Reddit Is Now the Most-Cited Source in AI Search. Here's Why That Should Terrify Every Other Publisher.
Reddit's CEO says LLMs "would not exist" without Reddit data. The numbers back him up. Here's what AI search's most-cited platform means for everyone else.
Reddit's CEO Steve Huffman recently made it clear: large language models "would not exist as we know them" without Reddit. Beyond the bold statement, there’s a number that should change how marketers view AI visibility. Citation-tracking firm Profound reports that Reddit is the most-cited platform across all major AI models.
It’s not just one of the most cited platforms. It’s the top one. Full stop.
If you’ve been wondering why ChatGPT, Perplexity, and Google’s AI Overviews often pull answers from a 20-year-old forum instead of your carefully optimised blog, here’s why. And this trend is only going to become more pronounced.
The "Modern Oil" Pitch Is Actually a Pricing Strategy
At Fast Company's Most Innovative Companies Summit, Huffman described Reddit's user-generated content as "modern oil" for AI. While it’s a catchy phrase, the real point is that this is Reddit’s business model.
Reddit struck data licensing deals with Google and OpenAI in 2024, when the AI industry was still figuring out the value of training data. Two years later, Huffman now says Reddit understands just how valuable its data is, and the company is "open for business" for future deals with more intentional terms.
In other words, the next round of licensing deals will be very different from the first. AI companies that didn’t sign early will soon learn the cost of coming in late.
The Two-Tier System Is Officially Live
Huffman’s approach is clear: there are two ways to access Reddit’s data at scale, and only one is cooperative.
The first option is to license the data. Google and OpenAI chose this route, gaining structured access, user protections, and a collaborative relationship.
The second option is legal action. Reddit has sued Anthropic in California Superior Court for unauthorized use of Reddit content. It has also filed a federal case against Perplexity in New York, along with three data-scraping firms, for DMCA anti-circumvention violations.
"Commercial use of our data requires commercial terms," Huffman said. This isn’t just a negotiation tactic; it’s now company policy.
This issue goes beyond Reddit. Every major content platform with user-generated conversations, such as Stack Overflow, Quora, Pinterest, and LinkedIn, is watching closely. The legal and pricing standards Reddit sets will shape content licensing for AI training over the next two years.
Why Reddit Beat Everyone Else to AI Citation Dominance
Reddit’s dominance in citations isn’t just because of its size. It’s about the format of its content.
Huffman put it plainly: "These models are quite simple. They're regurgitating on an absolutely massive scale what they've consumed elsewhere, and a large portion of that consumption is actually just the human conversation on Reddit because it's natural and it covers basically every topic imaginable."
AI answer engines often highlight Reddit threads because they have the kind of content large language models aim to create: a question, several human perspectives, upvotes as social proof, and natural language that doesn’t sound like marketing.
In contrast, typical brand blog posts, SEO-optimised listicles, or press releases don’t match this format. AI models designed to provide helpful, conversational answers prefer sources that already use this style.
For marketers, the past 18 months have delivered a tough lesson. You might spend months creating a detailed guide, only to see a short Reddit thread from 2019 appear higher in ChatGPT’s answers.
Reddit's Own AI Hedge
This is where things get more interesting. Reddit knows that providing data to external AI systems is risky: if people get Reddit answers from ChatGPT, they may stop visiting Reddit itself.
Huffman called this situation a paradox. Reddit’s solution is Reddit Answers, a search tool powered by large language models that reads posts and comments, then creates responses using direct user quotes and highlights different perspectives.
This is Reddit’s key strategy. Instead of competing with ChatGPT on synthesis, Reddit focuses on what only it can offer: real human disagreement, in real voices, on questions without a single right answer.
It’s still unclear if users will pick Reddit Answers over more polished AI tools. But Reddit Answers exists for a reason: to protect Reddit from being cut out of the process.
What This Means for Your AI Search Visibility Strategy
If you’re a marketer, founder, or content strategist aiming to appear in AI answer engines, there are three key takeaways from this interview.
First, Reddit has become a primary channel for visibility, not just a secondary one. Taking part in relevant subreddit conversations, genuinely and without obvious marketing, is now an AI search optimisation tactic. Because models cite Reddit more than any other platform, your presence in those threads becomes even more valuable.
Second, the cost of AI search tools is likely to increase. If Reddit’s next licensing deals match Huffman’s signals, input costs for ChatGPT, Gemini, and other major models will go up. This will likely lead to higher prices for the AI products, plug-ins, and APIs your team uses.
Third, the open web is becoming less accessible. Reddit’s move from "born of the open internet" to "commercial use requires commercial terms" reflects a larger trend. As AI companies stopped sharing open research and started building closed, commercial products, the data sources they used also became restricted. More platforms are likely to follow Reddit’s example.
The Bottom Line
Huffman’s "modern oil" phrase will grab attention, but the bigger story is the major changes happening behind the scenes. Reddit is now both the most-cited source in AI search and one of the few publishers ready to defend the value of its data in court.
For everyone else, the message is clearer than before. The conversational, community-driven internet is winning the AI citation battle, while the polished, brand-controlled web is falling behind. Platforms with community-driven content are about to see just how valuable that is.
The next round of licensing deals will tell us the number.
Featured Image: Google Gemini