Someone did what I wanted to do but never had the resources to execute myself.

A research team designed 602 experimental prompts, ran them across three platforms (ChatGPT, Google AI Overview, and Perplexity), collected 21,143 valid citations, extracted 72 features from each cited page, and precisely measured what makes a page superficially cited versus deeply absorbed by an AI model.

I've spent three months implementing Generative Engine Optimization on this blog with vanilla PHP. Every technical decision was based on informed intuition, standards documentation, and manual testing. When I read this study, I wanted to verify whether what I built aligns with what the data says works. The short answer: yes, almost everything. The long answer is this article.

What the study measured and why it matters

The research didn't just count how many times a domain appears in an AI response. It went deeper, measuring two separate layers.

The first layer is search: what types of prompts trigger web search on each platform, how many sources each consults, and which domains appear most frequently in the results.

The second layer is influence: of all cited pages, which ones were actually absorbed by the model to build its response, and which ones just appeared in the source list without contributing real content.

This distinction is fundamental. Appearing in the source list is not the same as shaping the response. The study quantifies this with an influence_score that combines citation frequency, position, paragraph coverage, and semantic similarity with the generated response.
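
The paper doesn't publish the exact formula, but the idea is easy to sketch. Here's a minimal illustration in PHP; the normalized inputs and equal weights are my assumptions, not the study's:

```php
<?php
// Illustrative sketch only. The study combines citation frequency, citation
// position, paragraph coverage and semantic similarity, but it does not
// publish weights or normalization; the equal weights below are an assumption.
function influence_score(
    float $citationFrequency,  // how often the page is cited in the response, normalized 0..1
    float $positionScore,      // 1.0 = cited at the top of the response, 0.0 = at the bottom
    float $paragraphCoverage,  // share of response paragraphs that draw on this page, 0..1
    float $semanticSimilarity  // similarity between page text and generated response, 0..1
): float {
    return 0.25 * $citationFrequency
         + 0.25 * $positionScore
         + 0.25 * $paragraphCoverage
         + 0.25 * $semanticSimilarity;
}

// A page cited often, early, across many paragraphs, and semantically close to the answer:
echo influence_score(0.8, 0.9, 0.7, 0.85); // ~0.81
```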

ChatGPT cites less but absorbs more

The most important finding in the study is this: ChatGPT averages 6.88 sources per prompt. Google averages 12.06. Perplexity averages 16.35. But the average influence per citation on ChatGPT is 0.2713, versus 0.0584 for Google and 0.0646 for Perplexity.

This means a single ChatGPT citation is worth 4.6 times more than a Google citation in terms of actual content absorption. ChatGPT searches less but reads deeper. Google and Perplexity search broadly but use each source superficially.

For my blog, this confirms the right strategy is not to appear everywhere but to appear where it matters. Every Generative Engine Optimization decision I implemented aims to maximize absorption depth, not appearance frequency.

Pages that AI absorbs average 1,943 words

The study divided cited pages into influence quartiles. The top 25% averages 1,943 words. The bottom 25% averages 170 words. That's an 11.4x difference.

But it's not just length. High-influence pages average 10.59 headings versus 0.85, 47 paragraphs versus 8, and 8.9x higher list density. These are pages structured as information containers that the model can decompose, extract from, and reorganize.

My posts average between 1,200 and 2,500 words. Each has 6 to 10 sections with H2 headings. Each section opens with a direct statement that can be extracted without additional context. This isn't coincidence. It's intentional design for citability, and these data validate it.

Definitions, numbers, comparisons, and steps: the four multipliers

The study measured the impact of specific content features on citation influence. The results are unambiguous.

Content with numbers and statistics has 61.55% higher influence. Content with clear definitions has 57.33% higher influence. Content with structured comparisons has 55.28% higher influence. Content with how-to steps has 41.20% higher influence.

And here's what nobody expects: content in Q&A format has 5.74% lower influence. Question-and-answer pages have no advantage. They actually have a disadvantage.

This destroys a common myth. Many content teams believe formatting everything as FAQ is the best strategy for AI. The data says the opposite. What works is content that defines concepts, presents numerical evidence, compares options, and offers concrete steps. Exactly what a well-written article already does.

What my blog already implements (and what this data confirms)

When I read the full study, I did a point-by-point verification against what I have implemented on shinobis.com.

Citable content structure. The study says semantic alignment is the strongest predictor of influence (correlation 0.43). Every post on this blog opens with a direct statement, not a narrative introduction. The excerpt field of each article automatically maps to the abstract property in the JSON-LD schema. LLMs read the abstract first to decide whether to process the rest.
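
In practice that mapping is a few lines of PHP. This is a simplified sketch, not the blog's actual code; the $post array and its keys are placeholders:

```php
<?php
// Simplified sketch: the post excerpt becomes the JSON-LD abstract.
// The $post array stands in for whatever the CMS actually stores.
$post = [
    'title'   => 'Example post title',
    'excerpt' => 'One-paragraph summary that an LLM can read before the full article.',
];

$jsonLd = [
    '@context' => 'https://schema.org',
    '@type'    => 'Article',
    'headline' => $post['title'],
    'abstract' => $post['excerpt'], // the excerpt doubles as the machine-readable summary
];

echo '<script type="application/ld+json">'
   . json_encode($jsonLd, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE)
   . '</script>';
```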

Automatic Knowledge Graph. The study confirms that pages with defined structure (clear headings, thematic segments, explicit relationships) are absorbed more deeply. My JSON-LD system automatically generates about, mentions, relatedLink, and citation entities for every post. This is exactly the semantic structure the study identifies as the decisive factor.
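
A trimmed-down sketch of what that output can look like. The linked entities are invented examples; only the property names come from the setup described above:

```php
<?php
// Illustrative only: the linked entities below are invented. Just the property
// names (about, mentions, citation, relatedLink) reflect what the generator
// described above emits.
$jsonLd = [
    '@context' => 'https://schema.org',
    '@type'    => 'Article',
    'headline' => 'Example post title',
    'about'    => [
        ['@type' => 'Thing', 'name' => 'Generative Engine Optimization'],
    ],
    'mentions' => [
        ['@type' => 'Thing', 'name' => 'JSON-LD'],
        ['@type' => 'SoftwareApplication', 'name' => 'ChatGPT'],
    ],
    'citation' => [
        ['@type' => 'CreativeWork', 'name' => 'GEO Citation Lab study'],
    ],
    // relatedLink is defined on WebPage, so it hangs off the page node.
    'mainEntityOfPage' => [
        '@type'       => 'WebPage',
        'relatedLink' => ['https://example.com/another-related-post'],
    ],
];

echo json_encode($jsonLd, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES);
```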

Markdown for Agents. The study shows models need clean content to process it efficiently. My server detects when an agent requests text/markdown and returns content without navigation, scripts, or layout. Just the article in pure Markdown. This reduces noise and increases the probability of deep absorption.
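
The negotiation itself fits in a handful of lines of vanilla PHP. A minimal sketch, assuming a pre-rendered Markdown copy of each post exists on disk; the path layout and slug handling are illustrative:

```php
<?php
// Minimal content-negotiation sketch. Assumes a pre-rendered Markdown copy of
// each post lives under content/; the path layout is illustrative.
$slug   = basename($_GET['post'] ?? 'index'); // basename() blocks path traversal
$accept = $_SERVER['HTTP_ACCEPT'] ?? '';

if (str_contains($accept, 'text/markdown')) {
    $file = __DIR__ . '/content/' . $slug . '.md';
    if (is_file($file)) {
        header('Content-Type: text/markdown; charset=utf-8');
        header('Vary: Accept'); // keep caches from mixing the HTML and Markdown variants
        readfile($file);        // just the article: no navigation, scripts, or layout
        exit;
    }
}

// Browsers and agents that did not ask for Markdown get the normal HTML template.
```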

Trilingual content. The study confirms that English accounts for between 82.90% and 95.07% of citations in samples with identifiable language. My blog publishes in Spanish, English, and Japanese. The English version competes for AI citations. Spanish and Japanese serve direct audiences and regional SEO.
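
The study says nothing about multilingual markup, but the standard way to tell crawlers that the three versions are the same article is hreflang alternates. A sketch, with invented URLs for one post:

```php
<?php
// Illustrative hreflang alternates for one trilingual post; the paths are invented.
$translations = [
    'es' => 'https://shinobis.com/es/ejemplo-de-post',
    'en' => 'https://shinobis.com/en/example-post',
    'ja' => 'https://shinobis.com/ja/example-post',
];

foreach ($translations as $lang => $url) {
    printf("<link rel=\"alternate\" hreflang=\"%s\" href=\"%s\">\n", $lang, htmlspecialchars($url));
}

// x-default tells crawlers which version to fall back to for any other language.
printf("<link rel=\"alternate\" hreflang=\"x-default\" href=\"%s\">\n", htmlspecialchars($translations['en']));
```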

What the study reveals about sites that actually get cited

There's one data point that contextualizes everything else. Across all three platforms, official websites, news outlets, and industry verticals represent between 79% and 87% of all citations. The rest is split between blogs, review sites, and others.

The 15 most cited domains include YouTube, Wikipedia, Reddit, Reuters, LinkedIn, the New York Times, and Forbes. These are domains with massive authority.

But the study also says something crucial: high frequency does not equal high influence. News outlets enter the candidate pool easily, but their average influence is lower than that of encyclopedia-style pages or structured explainers.

This is exactly my bet. I can't compete with Reuters on appearance frequency. But I can write pages with higher density of definitions, data, and structure than an average news article. And the data says that's what determines deep absorption.

A different strategy for each platform

The study reveals that the three platforms prioritize different factors.

ChatGPT prioritizes deep semantic relevance (correlation 0.537). It's a deep reader. It works best with pages that integrate definitions, evidence, and context like a well-argued essay.

Google prioritizes semantic alignment with the question and answer (correlation 0.579). It's more sensitive to titles and structure matching exactly what the user asked. Clear definitions are especially important.

Perplexity prioritizes broad coverage and decomposability into fragments (correlation 0.258 with heading count). It works best with modular pages covering multiple sub-questions.

My content is naturally optimized for ChatGPT and Google. Posts have deep structure with definitions and evidence (ChatGPT) and titles that match real questions (Google). For Perplexity, the structure with multiple thematic H2s already covers the modularity requirement.

The 1,000 to 3,000 word range is not arbitrary

The study segmented cited pages by length and measured average influence for each range. Pages under 100 words have an influence of 0.0546. Pages from 1,001 to 3,000 words have 0.1258. Pages over 3,000 have 0.1457.

Influence increases with length but with diminishing returns after 3,000 words. The maintenance cost of a 5,000-word article doesn't justify the marginal influence increase versus a 2,000-word one.

My posts are in the optimal range. Not by accident. I write between 1,200 and 2,500 words because that's the space where I can develop a topic with enough depth without diluting information density. Now I have data confirming that range is exactly where the cost-benefit ratio is best.

The gray area tactics these data bury

If you read the study alongside what I documented about gray area SEO tactics, the picture is clear.

Self-promotional listicles that SaaS companies mass-produce meet none of the high-influence criteria. They have no original definitions. They present no proprietary data. They make no honest comparisons. They offer no replicable steps. They are content designed to rank on Google, not to be absorbed by an AI model.

The study confirms it with data: opinion pages have the lowest influence of all content types. And listicles in which a company crowns itself the best option are exactly that: opinion disguised as analysis.

The infrastructure I built on this blog, the full stack of AI agent standards (llms.txt, Content Signals, Markdown for Agents, Agent Skills, automatic Knowledge Graph), is designed for the opposite: content that is verifiable, structured, and cited with context.

What I'm changing based on this data

Not everything I do is perfect according to the study. There are three adjustments I'll implement.

First, more numerical data in every post. The 61.55% influence increase for content with statistics is too high to ignore. I'll be more deliberate about including concrete figures in every article, not as decoration but as verifiable evidence.

Second, more explicit comparisons. The 55.28% increase is significant. My Midjourney vs DALL-E vs Stable Diffusion post already has this format. I need more posts with clear comparative structure.

Third, opening every section with a definition. The study shows definitions are the second strongest multiplier. I already do this in post titles but not always within each section. Every H2 should answer "what is this?" before explaining how it works or why it matters.

The study's conclusion in one sentence

The researchers close with this idea: in the AI search era, the most valuable content is not what best expresses an opinion but what can most easily be decomposed into definitions, numbers, comparisons, and steps, and reorganized as evidence in a response.

It's not writing for AI. It's writing like someone with real evidence, clear structure, and verifiable experience. Which is exactly what a good article has always been.

The difference is that now we have 602 prompts and 21,000 citations proving it with data.

The full research is available at GEO Citation Lab.