May 21, 2026 · 8 min read

llms.txt is a sign nobody reads — and llm.pdf is even worse

You spent an afternoon writing a beautiful /llms.txt. A clean Markdown index of your site, every important page summarized in one or two lines, ordered the way you would want an AI to understand your business. You uploaded it. You felt the small satisfaction of a job done — the digital equivalent of polishing the welcome mat before a guest arrives.

The guest is not coming.

Nearly two years after the proposal, the big AI labs have either publicly said they will not read /llms.txt, or simply never started. The live crawler logs show GPTBot, ClaudeBot, and PerplexityBot walking past the file and fetching your HTML pages anyway. There is no enforcement, no verification, no signal back. You wrote a memo and slid it under a door with nobody on the other side.

This post is the honest read on llms.txt, the even shakier "llm.pdf" idea that some agencies are now upselling, and — more usefully — on the few places where AI-readable files do earn their keep. So you can stop polishing the doormat and put your time where it pays.

A small paper note pinned by a piece of clear tape to a battered door handle. — A note taped to a door handle by aboodi vesakaran on Unsplash. The note is there. Whether anyone opens the door to read it is a different question — and it is the entire `llms.txt` story.

What llms.txt was supposed to be

The pitch is genuinely nice. In September 2024 the AI engineer Jeremy Howard proposed a simple file: /llms.txt, sitting at the root of your domain, written in Markdown, listing the most important pages of your site in plain language. A robots.txt for the LLM era — except instead of "stay out," it would say "here, this is the structured tour."

Two flavors of the file emerged:

llms.txt — a slim index. Sections, links, one-line descriptions.
llms-full.txt — the same idea, but with full content inlined so an LLM never has to crawl your site at all.
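To make the slim flavor concrete, here is a minimal Python sketch that renders an index in the shape the proposal describes: an H1 title, a blockquote summary, then H2 sections of links with one-line descriptions. The `Page` class, the `render_llms_txt` function, and the example.com site data are hypothetical names invented for illustration, not part of any official tooling.

```python
from dataclasses import dataclass


@dataclass
class Page:
    title: str
    url: str
    summary: str


def render_llms_txt(site_name: str, tagline: str, sections: dict[str, list[Page]]) -> str:
    """Render a slim llms.txt index: H1 title, blockquote summary, H2 link sections."""
    lines = [f"# {site_name}", "", f"> {tagline}", ""]
    for heading, pages in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for page in pages:
            lines.append(f"- [{page.title}]({page.url}): {page.summary}")
        lines.append("")
    # Collapse trailing blank lines to a single newline at end of file.
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site data, for illustration only.
example = render_llms_txt(
    "Example Store",
    "Hand-made ceramic mugs, shipped across the EU.",
    {
        "Policies": [
            Page("Shipping", "https://example.com/shipping", "Countries, costs, and delivery times."),
        ],
        "Products": [
            Page("Mugs", "https://example.com/mugs", "The full mug catalogue with prices."),
        ],
    },
)
print(example)
```

An llms-full.txt would follow the same outline but inline each page's content instead of linking to it.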

A handful of developer-tool companies adopted it within months — Anthropic, Stripe, Vercel, Cloudflare, Hugging Face. By early 2026 the file had been added to roughly 10–15% of indexed domains. The agencies caught wind. "AI SEO" became a service line. People started selling llms.txt packages the way they used to sell meta keywords.

That last sentence is the tell. We have been here before.

The awkward part: no major AI provider commits to reading it

Here is the cold list, as of Q1 2026:

Provider	Position on llms.txt
Google	Publicly said they do not support it and have no plan to. John Mueller compared it to the meta keywords tag — a dead protocol.
OpenAI	No public commitment. GPTBot crawls HTML directly.
Anthropic	Hosts its own llms.txt for its docs, but has not committed to reading third-party llms.txt in Claude training or web-search retrieval.
Perplexity	No commitment. PerplexityBot fetches the regular web.
Meta, Mistral	No statement.

That is every provider that matters. None has stood up and said, "Yes — write us a /llms.txt and we will use it to answer questions about your site."

When researchers actually look at server logs to see what the AI crawlers fetch, the answer is what you would expect from companies that never agreed to the protocol: they fetch HTML. The file at /llms.txt is requested at roughly the rate of any other obscure file at a domain root — which is to say, rarely, and almost never by a crawler that proceeds to act on its contents.

It is not that llms.txt is forbidden. It is that the systems you wanted to influence are not reading it. Writing it harder will not change that.

Why "inference-time" is the part that breaks the dream

Most of the llms.txt pitch leans on a fuzzy claim: "When ChatGPT answers a question about your business, it will use your /llms.txt."

That is not how any of the production assistants work today.

When ChatGPT, Claude, Gemini, or Perplexity needs fresh facts mid-answer, the call stack looks like this:

1. The assistant decides it needs the web.
2. It hands a query to a search engine: Bing, Google, Brave, or its own crawler index.
3. The search engine returns ranked web pages. These are normal HTML URLs.
4. The assistant fetches one or more of those pages and reads the rendered HTML.

Notice what is not in that chain: anyone going to yoursite.com/llms.txt. The search engine does not look there. The assistant does not look there. The only way your /llms.txt would matter is if someone in step 4 specifically asked for it — and no production system has been built to do that, because there has been no reason to.
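The chain above can be sketched as a toy simulation. This is not any vendor's actual implementation: `search`, `fetch`, and `answer` are hypothetical stand-ins, and the URLs are made up. The only point is to show which URLs such a pipeline would ever request.

```python
fetched: list[str] = []  # every URL the simulated assistant requests


def search(query: str) -> list[str]:
    """Stand-in for a search engine: returns ranked HTML page URLs (hypothetical)."""
    return ["https://yoursite.com/shipping", "https://yoursite.com/faq"]


def fetch(url: str) -> str:
    """Stand-in for an HTTP fetch: records the URL instead of touching the network."""
    fetched.append(url)
    return f"<html><body>content of {url}</body></html>"


def answer(question: str, needs_web: bool = True, top_k: int = 1) -> str:
    """Walk the four steps: decide, search, receive ranked URLs, fetch pages."""
    if not needs_web:  # step 1: the assistant decides whether it needs the web
        return "answered without the web"
    results = search(question)  # steps 2-3: query goes out, ranked HTML URLs come back
    pages = [fetch(url) for url in results[:top_k]]  # step 4: fetch and read the pages
    return f"answered from {len(pages)} page(s)"


print(answer("Do you ship to Germany?"))
print(fetched)
```

Nothing in this flow constructs a /llms.txt URL. For the file to matter, step 4 would need an extra, deliberate request for it.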

The web is not crawled twice. Building a parallel "LLM-friendly" lane only works if someone has agreed to drive in it. So far, none of the four cars on the road has.

The live crawler test (and what it shows)

There is a clean way to test this yourself, and it takes about thirty seconds. Grep your server logs for requests in which a known AI crawler (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) fetched /llms.txt. Then grep again for that same crawler fetching anything else. Do not bother grepping for Google-Extended: it is a robots.txt control token, not a user-agent string, so it never appears in access logs.

The ratio is brutal. The crawlers are on your site. They are not on your /llms.txt. They are on /, /products, /blog, your sitemap, the same surfaces you have always served. The file sits at the root, untouched, like a paper menu by a self-checkout terminal.
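If you would rather script the test than grep by hand, here is a minimal Python sketch. It assumes your server writes the common "combined" access-log format, and it matches crawlers by user-agent substring. The function name `count_crawler_hits` and the sample lines are invented for illustration. The sample is synthetic and proves nothing about real traffic, so run the function on your own log to get a real ratio.

```python
import re
from collections import Counter

# User-agent substrings to look for. Google-Extended is deliberately absent:
# it is a robots.txt token, not a user-agent string, so it never shows up in logs.
AI_CRAWLERS = ("GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot")

# Combined log format: ... "METHOD /path PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_crawler_hits(lines):
    """Count, per crawler, requests for /llms.txt versus requests for anything else."""
    hits = {name: Counter() for name in AI_CRAWLERS}
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match["path"].split("?", 1)[0]  # ignore query strings
        for name in AI_CRAWLERS:
            if name in match["agent"]:
                bucket = "llms.txt" if path == "/llms.txt" else "other"
                hits[name][bucket] += 1
    return hits


# Synthetic sample lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [21/May/2026:10:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.5 - - [21/May/2026:10:00:02 +0000] "GET /products?page=2 HTTP/1.1" 200 8192 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '198.51.100.7 - - [21/May/2026:10:01:00 +0000] "GET /blog HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.9 - - [21/May/2026:10:02:00 +0000] "GET /llms.txt HTTP/1.1" 200 612 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '192.0.2.44 - - [21/May/2026:10:03:00 +0000] "GET /llms.txt HTTP/1.1" 200 612 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

hits = count_crawler_hits(SAMPLE_LOG)
for name, counts in hits.items():
    print(f"{name}: llms.txt={counts['llms.txt']} other={counts['other']}")
```

To use it on real data, pass an open file instead of the sample list, for example `count_crawler_hits(open("access.log"))`, where the path is whatever your server actually writes to.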

A black and gray telephone booth half-buried in snow at the edge of a road. — Phone booth on snow by Lukas Schroeder on Unsplash. The infrastructure works. The line is open. There is just nobody on the other end calling.

"llm.pdf" — the upsell that should be a warning sign

A second idea has started circulating: instead of (or in addition to) llms.txt, publish an llm.pdf — a designed PDF "knowledge brief" of your business, optimized for an AI to read. Some agencies are selling this for four figures.

This idea is worse on every axis.

PDFs are the format crawlers process least well. Every search engine for the last twenty years has treated PDF as a second-class citizen — tokenized poorly, indexed shallowly, often skipped on bandwidth budgets. AI crawlers have inherited that bias.
There is no spec. Unlike llms.txt, there is no convention for where llm.pdf lives, no expected schema, no parsing contract. Nobody is "looking for" it.
It cannot be linked the way HTML is. PDFs do not feed the graph that search engines (and search-using LLMs) actually use to find pages.
It is brittle. Update your hours, your prices, your product list — now your llm.pdf is stale, and you have two canonical sources of truth on your domain disagreeing with each other.

The only reason llm.pdf exists is that it is an easier thing for an agency to sell than "go fix your schema markup." It is the snake-oil version of an already-doubtful idea.

Where AI-readable files actually earn their keep

To be fair: llms.txt is not entirely useless. There is a real, narrow lane where it works — and naming it makes the rest of this argument honest.

Developer tools that read at agent time. When a human running Cursor, GitHub Copilot, or a custom RAG pipeline points the agent at your docs, those tools do read /llms.txt if it is there. Cursor's "@docs" feature, several agentic IDEs, and most retrieval frameworks know about the convention and will prefer the curated index over a noisy crawl.
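What "reading the curated index" involves is not complicated. The sketch below is an assumption about the general shape of such a step, not the code of Cursor or any other named tool: it groups the link entries of an llms.txt file under their H2 headings so an agent can pick pages by section. `parse_llms_txt` and the sample document are hypothetical.

```python
import re

# Matches "- [Title](url)" with an optional ": note" suffix.
LINK = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$")


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group the link entries of an llms.txt file under their H2 section headings."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and (m := LINK.match(line)):
            sections[current].append(
                {"title": m["title"], "url": m["url"], "note": m["note"] or ""}
            )
    return sections


# Hypothetical docs index, for illustration only.
SAMPLE = """\
# Example Docs

> API reference and guides.

## Guides

- [Quickstart](https://example.com/docs/quickstart.md): Install and make a first call.
- [Auth](https://example.com/docs/auth.md)

## Reference

- [Changelog](https://example.com/changelog.md): Release history.
"""

index = parse_llms_txt(SAMPLE)
for section, links in index.items():
    print(section, [link["title"] for link in links])
```

A curated index like this hands the agent a short, ordered list of URLs, which is the whole advantage over crawling a docs site blind.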

That is why the early adopters — Anthropic, Stripe, Vercel, Hugging Face, Cloudflare — were all developer-tools companies. Their actual customers are people who paste a docs URL into an AI coding assistant. For that audience, llms.txt is a small, real quality-of-life upgrade.

If you are an e-commerce store, a marketing site, a SaaS product page, a brand, a law firm — that audience is not yours, and that lane does not help you. The agents reading your llms.txt are not the agents recommending you to shoppers.

What actually moves the needle for AI visibility

Here is the unglamorous list of things that genuinely affect whether AI systems can read and quote your site correctly. None of it is new. All of it is boring. All of it is what works.

Clean, semantic HTML. Headings in order. Lists where you mean lists. Product pages with the product first and the navigation second. LLMs read text content; if a person can skim your page in five seconds, an LLM can summarize it in one.
Schema.org JSON-LD. Product, Article, FAQPage, Organization, BreadcrumbList. This is the structured data Google has rewarded for a decade and that every AI search engine inherits. It is the real "LLM-readable" surface and nobody is upselling you on it because it is unsexy and works.
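A real sitemap and a real robots.txt. Yes, the boring ones. The major AI crawler operators say they honor robots.txt directives at the User-Agent level (GPTBot, ClaudeBot, etc.), so that is the place to block or allow them. Rate limiting is shakier: Crawl-delay is a non-standard directive that only some crawlers respect.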
Open Graph and Twitter card metadata. Used by every link-preview pipeline, which is increasingly how AI search results surface your site.
Content that contains the answer. If a shopper might ask "do you ship to Germany?", the words "we ship to Germany" should appear on a page on your site. Embeddings and search are very good at the obvious; they cannot conjure a fact that is not there.
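For the JSON-LD item in the list above, here is a minimal sketch of generating a schema.org Product block in Python. The vocabulary (`Product`, `Offer`, `price`, `priceCurrency`, `availability`) comes from schema.org. The function name `product_jsonld` and the example product are hypothetical, and a real page would usually carry more properties than this.

```python
import json


def product_jsonld(name: str, description: str, price: str, currency: str, url: str) -> str:
    """Build a <script type="application/ld+json"> tag describing one product."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "description": description,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }
    # Escape "</" so text inside the JSON cannot close the script tag early.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical product, for illustration only.
tag = product_jsonld(
    "Blue Mug",
    "Hand-made ceramic mug, 300 ml.",
    "18.00",
    "EUR",
    "https://example.com/mugs/blue",
)
print(tag)
```

The resulting tag goes in the HTML of the product page itself, so the structured data travels with the page that crawlers already fetch.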

Notice the shape: every item above is read by systems whose owners have publicly agreed to read it. That is the difference between a protocol and a wish.
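The robots.txt item is also easy to check before you deploy. This sketch uses Python's standard `urllib.robotparser` to confirm what a given set of rules says for each crawler token. The rules themselves are a hypothetical example, and the check only tells you what your file declares, not whether a particular crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse in memory, no network request

for agent in ("GPTBot", "ClaudeBot", "PerplexityBot"):
    for path in ("/products", "/private/report"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:14} {path:16} {verdict}")
```

With these rules, GPTBot is kept out of /private/ only, ClaudeBot is blocked everywhere, and any crawler without its own group falls through to the wildcard rule.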

A heavily graffitied wooden door with a single small official notice taped near the handle. — Door with notice by Anton Tseiko on Unsplash. A wall of competing signs. The official one is in there somewhere — but it does not win by being there, it wins by being in the place readers actually look.

Should you still publish an llms.txt?

If you have already published one, do not lose sleep — it is not actively harming you. Five minutes of upkeep per quarter, fine.

If you are thinking about publishing one, here is the calibrated answer:

You sell to developers and your product has docs? Yes. Worth half a day. Your customers' coding agents will read it.
You run an e-commerce store, marketing site, or service business? No. The audience you care about is AI search systems, and they do not read it. The time is better spent on schema and content.
An agency is selling you an llm.pdf? Walk away. Politely, but quickly.

The takeaway

llms.txt is a community proposal, not a standard. Nearly two years in, no major AI provider has committed to reading it.
Google has explicitly said it will not support it.
The live crawlers (GPTBot, ClaudeBot, PerplexityBot, OAI-SearchBot) fetch HTML, not /llms.txt.
At inference time, AI assistants reach the web through normal search engines and HTML pages — your llms.txt is not in that path.
llm.pdf is worse — no spec, no audience, poor format, and an obvious upsell.
The narrow exception: developer-tool docs, where IDE agents and RAG pipelines do read it. If that is not your business, it is not your file.
The real "AI-readable" surface is the one you already pay for and rarely tune: semantic HTML, schema.org JSON-LD, sitemaps, Open Graph, and content that contains the answer.

The sign is fine. The problem is the audience. Stop polishing the doormat and turn on the porch light where the actual visitors are walking by.