llms.txt Guide: Format, Example, and Why It Matters

llms.txt has been around since September 2024. That's well over a year. You should already have one. Most sites still don't. The ones that do usually have a stub that nobody has updated since launch.

Here's what llms.txt actually is, what it isn't, and how to write one that does real work, not a checkbox file you forget about.

What llms.txt is, in one paragraph

It's a markdown file at the root of your site (/llms.txt) that gives LLMs a curated, structured summary of your important content. The spec was proposed by Jeremy Howard. The format is intentionally simple: an H1 with your site name, a blockquote with a one-line description, optional paragraphs of context, and H2 sections that link to your key pages in markdown link format.

The point isn't to replace robots.txt (which gates crawler access). The point is to give an LLM a curated map of what you want it to read when it does come in.
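To make the structure concrete, here is a minimal sketch of how a consumer might read the format: H1 as the title, blockquote as the summary, H2 sections holding link lists. The type names and `parseLlmsTxt` are illustrative only and not part of the spec, and the parser ignores the optional context paragraphs.

```typescript
// Illustrative types; the spec does not define these names.
interface LlmsLink { title: string; url: string; description: string; }
interface LlmsSection { heading: string; links: LlmsLink[]; }
interface LlmsDoc { title: string; summary: string; sections: LlmsSection[]; }

// Matches "- [Title](url): optional description"
const LINK_ITEM = /^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$/;

function parseLlmsTxt(text: string): LlmsDoc {
  const doc: LlmsDoc = { title: "", summary: "", sections: [] };
  const summaryLines: string[] = [];
  let current: LlmsSection | null = null;
  let lastLink: LlmsLink | null = null;

  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trimEnd();
    const link = line.match(LINK_ITEM);
    if (line.startsWith("# ") && !doc.title) {
      doc.title = line.slice(2).trim(); // H1: site name
    } else if (line.startsWith("## ")) {
      current = { heading: line.slice(3).trim(), links: [] }; // H2: new section
      doc.sections.push(current);
      lastLink = null;
    } else if (line.startsWith(">") && !current) {
      summaryLines.push(line.replace(/^>\s?/, "").trim()); // blockquote before any H2
    } else if (link && current) {
      lastLink = { title: link[1], url: link[2], description: link[3].trim() };
      current.links.push(lastLink);
    } else if (lastLink && /^\s+\S/.test(line)) {
      // Indented line: the previous link's description wrapped onto a new line.
      lastLink.description = `${lastLink.description} ${line.trim()}`.trim();
    }
  }
  doc.summary = summaryLines.join(" ");
  return doc;
}
```

Calling `parseLlmsTxt(fileContents)` returns the title, the joined blockquote text, and one entry per H2 section with its links and descriptions.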

llms.txt vs llms-full.txt — they're different files

This trips up almost everyone.

llms.txt is the map. Short. Links to your important pages. Think of it as a sitemap for AI engines, written in narrative.
llms-full.txt is the territory. Long. Contains the actual text of your key pages, concatenated into one file. Think of it as a one-shot context document.

You don't need both. You probably want both. The map is for engines that crawl on demand (Perplexity, ChatGPT browsing). The full file is for engines that pre-fetch and cache (some agent harnesses, RAG pipelines, internal tooling).
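A rough sketch of the concatenation idea behind llms-full.txt follows. There is no agreed layout for the full file, so the per-page H2, the `Source:` line, and the `---` separators are assumptions, as are the `PageContent` type and the `buildLlmsFullTxt` name.

```typescript
// Hypothetical shape for a page whose text you already have as markdown.
interface PageContent {
  title: string;
  url: string;
  markdown: string;
}

// Joins the site name and every page body into one document.
// Layout (H2 per page, "Source:" line, "---" separators) is an assumption.
function buildLlmsFullTxt(siteName: string, pages: PageContent[]): string {
  const parts = pages.map((page) =>
    [`## ${page.title}`, `Source: ${page.url}`, "", page.markdown.trim()].join("\n"),
  );
  return [`# ${siteName}`, ...parts].join("\n\n---\n\n") + "\n";
}
```

Pages appear in the order you pass them, so the same curation order used in llms.txt can carry over to the full file.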

A real example: Booplex's llms.txt

This is the actual file at booplex.com/llms.txt as of writing. I'll annotate why each block is there.

# Booplex

> Booplex is Gabi Florea's personal site — an independent builder and SEO specialist. 
> Tools, war stories, and AI workflows for the people responsible for shipping things 
> that need to be found by humans, search engines, and AI engines.

The blockquote is what most AI engines pull as the description when citing the brand. One line that matches the AI summary meta tag is enough.

## About

- [About Gabi](https://booplex.com/about): The person behind Booplex — 10+ years of 
  building, breaking, and figuring out why good projects get buried.
- [What I work on now](https://booplex.com/now): What I'm building this month, 
  what shipped, what's on the back burner.
- [Work with me](https://booplex.com/work-with-me): How to reach me if you've got 
  a project that needs SEO, AI workflow, or app work.

Three pages, each with a description right after its link. Format matters: the LLM uses the description, not the URL, to decide if the page is relevant to the query.

## Brain Dumps (blog)

- [Brain Dumps index](https://booplex.com/blog): The blog. War stories, automation 
  recipes, technical SEO experiments.
- [How I fixed canonical URLs pointing to localhost](https://booplex.com/blog/how-i-fixed-canonical-urls-pointing-to-localhost-in-next-js): 
  The post that triggered a 5-week reindex recovery. Covers what broke, why, and 
  the Next.js fix.
- [GSC URL inspection canonical lies](https://booplex.com/blog/gsc-url-inspection-canonical-lies): 
  How to read the Inspection report when the declared canonical and Google-selected 
  canonical disagree.

The blog gets its own section.

Posts are listed by topic relevance, not chronologically. AI engines reading this for a query like "canonical url problems" will see both posts and the index.

## Tools

- [Canonical URL Checker](https://booplex.com/tools/canonical-checker): Paste any URL,
  see what canonical it actually serves. Free, no signup.

Tools get their own section. They're the highest-intent content: pure first-touch capture.

What goes in a good llms.txt vs a bad one

Good

Plain-language descriptions. Not SEO-stuffed. Not marketing copy. Just "this page is about X, useful for Y."
Curated, not exhaustive. If your site has 500 pages, pick the 30 that matter. The LLM doesn't need the rest.
Organized by topic, not by URL structure. Group like with like: all your blog posts in one section, tools in another, about pages in another.
Optional context paragraphs. A short paragraph explaining what kind of content the next section contains. AI engines parse this.
Updated. Touch it when you publish something significant. Not every post, but the high-value ones, yes.

Bad

Auto-generated from sitemap.xml. That defeats the curation point. Most sitemaps include every URL on the site, which is the opposite of what llms.txt wants.
Keyword stuffing in descriptions. AI engines downrank pages with obvious LLM-bait text. Write for a human first.
One-line link only, no description. The LLM has to guess what the page is about from the URL. Don't make it guess.
Stale. The file says you published 5 posts and your most recent one is from 2024.
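Two of the points above are mechanical enough to check in code: links with no description, and a file that has stopped being curated. The sketch below is a hypothetical lint, not an official tool; `lintLlmsTxt`, the warning shape, and the default cap of 30 links (borrowed from the rule of thumb above) are all assumptions.

```typescript
interface LintWarning {
  line?: number; // 1-based line number; omitted for whole-file warnings
  message: string;
}

// Matches "- [Title](url): optional description"
const LINK_LINE = /^\s*[-*]\s+\[([^\]]+)\]\(([^)\s]+)\)\s*:?\s*(.*)$/;

function lintLlmsTxt(text: string, maxLinks = 30): LintWarning[] {
  const warnings: LintWarning[] = [];
  const lines = text.split(/\r?\n/);
  let linkCount = 0;

  lines.forEach((line, index) => {
    const link = line.match(LINK_LINE);
    if (!link) return;
    linkCount += 1;
    // A description may sit on the same line or wrap onto an indented next line.
    const next = lines[index + 1] ?? "";
    const continues = /^\s+(?![-*]\s)\S/.test(next);
    if (link[3].trim() === "" && !continues) {
      warnings.push({ line: index + 1, message: `Link "${link[1]}" has no description` });
    }
  });

  if (linkCount > maxLinks) {
    warnings.push({ message: `${linkCount} links listed; consider curating down to about ${maxLinks}` });
  }
  return warnings;
}
```

An empty result does not mean the descriptions are good, only that they exist. Keyword stuffing and staleness still need a human read.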

How to write one in 20 minutes

1. Open a markdown file, name it llms.txt.
2. H1 = your brand name. It's the title, so keep it short.
3. Blockquote with one sentence describing what your site is.
4. Optional paragraph or two of context: who you serve, what you make, what makes you different.
5. H2 sections per topic. For each, list 3–10 pages with markdown links. Add a 1-line description after each link explaining what the page covers.
6. Save to your site's public root so it serves at /llms.txt.
7. Validate: load it in a browser, check the markdown parses, check the links work.

That's the entire process. The whole spec is intentionally lightweight.
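The "check the links work" part of validation can be scripted. This is a minimal sketch, assuming Node 18+ for the global `fetch`; `extractLinks`, `findBrokenLinks`, and the `Fetcher` type are hypothetical names, and some servers reject HEAD requests, so a GET fallback may be needed in practice.

```typescript
// Anything that resolves to an object with ok/status works, including fetch().
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

// Collects unique http(s) URLs from markdown links in the file.
function extractLinks(text: string): string[] {
  const pattern = /\[[^\]]+\]\((https?:\/\/[^)\s]+)\)/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    if (urls.indexOf(match[1]) === -1) urls.push(match[1]);
  }
  return urls;
}

// Returns a list of "url (reason)" strings for links that did not respond OK.
async function findBrokenLinks(
  text: string,
  fetcher: Fetcher = (url) => fetch(url, { method: "HEAD" }),
): Promise<string[]> {
  const results = await Promise.all(
    extractLinks(text).map(async (url) => {
      try {
        const res = await fetcher(url);
        return res.ok ? null : `${url} (${res.status})`;
      } catch {
        return `${url} (request failed)`;
      }
    }),
  );
  return results.filter((entry): entry is string => entry !== null);
}
```

Passing the fetcher in as a parameter keeps the check easy to run against a stub in tests or a staging host before the file goes live.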

Where to put it in Next.js (App Router)

The cleanest way: a route handler at app/llms.txt/route.ts that returns the markdown content with the right MIME type.

```typescript
// app/llms.txt/route.ts
import { NextResponse } from 'next/server';
// Hypothetical module path: point this at wherever your generator function lives.
import { buildLlmsTxt } from '@/lib/llms-txt';

// Regenerate the cached response at most once per hour.
export const revalidate = 3600;

export async function GET() {
  const content = await buildLlmsTxt(); // returns the llms.txt markdown as a string
  return new NextResponse(content, {
    headers: {
      'Content-Type': 'text/markdown; charset=utf-8',
      // Let browsers and CDNs cache for an hour as well.
      'Cache-Control': 'public, max-age=3600, must-revalidate',
    },
  });
}
```

If you want it static instead, drop an llms.txt file in /public. Next.js serves it as-is. The route handler approach is better if you generate the content from your CMS, because a new post then lands in the file automatically.
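What the `buildLlmsTxt()` generator looks like depends entirely on your CMS. As a sketch under stated assumptions: the `Post` shape, the `featured` flag, `fetchPosts`, and `SITE_URL` below are all placeholders, and the editorial flag is one possible way to keep the file curated rather than exhaustive.

```typescript
// All names here (Post, fetchPosts, SITE_URL) are placeholders for illustration.
interface Post {
  title: string;
  slug: string;
  summary: string; // one plain-language sentence, reused as the link description
  featured: boolean; // editorial flag: only curated posts make it into llms.txt
}

const SITE_URL = "https://example.com";

// Stand-in for a real CMS query.
async function fetchPosts(): Promise<Post[]> {
  return [
    { title: "Example post", slug: "example-post", summary: "What this post covers.", featured: true },
  ];
}

// Builds the llms.txt markdown: H1, blockquote, then one curated section.
async function buildLlmsTxt(loadPosts: () => Promise<Post[]> = fetchPosts): Promise<string> {
  const featured = (await loadPosts()).filter((post) => post.featured);
  const lines = [
    "# Example Site",
    "",
    "> One sentence describing what the site is.",
    "",
    "## Blog",
    "",
    ...featured.map((post) => `- [${post.title}](${SITE_URL}/blog/${post.slug}): ${post.summary}`),
  ];
  return lines.join("\n") + "\n";
}
```

Filtering on a flag someone sets by hand keeps the generated file from drifting into a sitemap dump, which is the failure mode worth avoiding.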

Does anyone actually read llms.txt?

This is the honest part. The answer in May 2026 is: some engines, sometimes, partially.

Perplexity: reads it when crawling for citations. Verified: I've seen the description text from my llms.txt blockquote quoted back in Perplexity citations.
ChatGPT browsing: appears to read it, but evidence is anecdotal. Inconsistent.
Anthropic (Claude): Anthropic publishes a list of supported sources. llms.txt is not officially in it as of writing. Claude does read pages, but I haven't confirmed it preferentially reads llms.txt first.
Google AI Overviews: Google has not committed to reading llms.txt. Google's signal is still your robots.txt + sitemap + on-page content.
Bing Copilot: no public commitment. Probably reads via the standard Bing crawler.

So: it's not a magic switch. But the cost of having one is 20 minutes and a single file, and the upside is that the engines that do read it have a much clearer entity description of you than they'd otherwise build from scraping.

What llms.txt won't fix

This part is important because I've seen too many "add llms.txt and watch the citations roll in" tutorials.

llms.txt is a discoverability and structure aid, not a ranking factor. It can't fix:

A site with no original content. If your pages are repackaged stuff from other sites, llms.txt won't help. AI engines look for source-of-truth signals first.
A site with weak entity markup. llms.txt is text; the Organization/Person schema markup is the structured equivalent. You need both.
A site that blocks AI crawlers via robots.txt. If you've blocked GPTBot or ClaudeBot, llms.txt is moot: they never come in.
A site with no dateModified signals. AI engines weight fresh content. If your post is 3 years old and you haven't updated it, that matters more than your llms.txt.

Why most SEOs are still ignoring it

Best guess: because there's no measurable ranking impact. SEOs are conditioned to optimize for things that show up in Google Search Console. llms.txt doesn't.

That's the wrong frame. The right frame is: AI engines are now a meaningful traffic and reputation channel. ChatGPT alone has 200M weekly users. Perplexity has 15M.

Those users are asking questions your site might answer. The cost of being legible to those engines is 20 minutes of work.

Don't wait for someone to publish a study correlating llms.txt with AI citations. By the time the study lands, the engines will have moved on to a different signal. Ship the file. Move on.

FAQ

Do I need llms.txt if I have a sitemap?

Yes. They do different things. sitemap.xml is for search crawlers to discover URLs. llms.txt is for LLMs to understand which URLs matter and what they're about. Same content, different shape, different audience.

llms.txt vs robots.txt: which one controls AI crawlers?

robots.txt controls access (whether the crawler can read the page at all). llms.txt provides structure (what the page is about, why it matters). You configure access in robots.txt, then provide curation in llms.txt for the engines that respect both.

What's the file size limit for llms.txt?

No formal limit, but practical recommendation: under 30KB. LLMs that read it on-the-fly have context windows to manage. If your llms.txt is huge, they'll truncate.

Does llms-full.txt help with AI Overviews?

Probably not directly. Google's AI Overviews currently rely on standard search index signals. llms-full.txt is more useful for engines that pre-fetch and cache content (agent frameworks, RAG pipelines) than for engines that do real-time SERP construction (Google).

Can I block specific AI crawlers but still publish llms.txt?

Yes, and you should, if that's your policy. The two systems are independent. You can block GPTBot in robots.txt and still publish llms.txt for the engines you do allow.
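To confirm which crawlers a robots.txt actually turns away, a rough check might look like the sketch below. It is deliberately simplified: it only detects a site-wide `Disallow: /` for the named agent (or the `*` group as a fallback) and ignores `Allow` rules and path-specific rules. `isBlockedByRobots` is a hypothetical helper, not a full robots.txt parser.

```typescript
interface RobotsGroup {
  agents: string[];
  disallowAll: boolean;
}

// True if the agent's group (or the "*" group when no specific group exists)
// contains "Disallow: /". Simplified on purpose; see the caveats above.
function isBlockedByRobots(robotsTxt: string, userAgent: string): boolean {
  const groups: RobotsGroup[] = [];
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (!current || !lastWasAgent) {
        current = { agents: [], disallowAll: false };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && value === "/" && current) current.disallowAll = true;
      lastWasAgent = false;
    }
  }

  const agent = userAgent.toLowerCase();
  const specific = groups.filter((group) => group.agents.indexOf(agent) !== -1);
  const matched = specific.length > 0 ? specific : groups.filter((group) => group.agents.indexOf("*") !== -1);
  return matched.some((group) => group.disallowAll);
}
```

For example, with a file that blocks only GPTBot, `isBlockedByRobots(file, "GPTBot")` is true while `isBlockedByRobots(file, "ClaudeBot")` is false, which matches the "block some, publish for the rest" setup described above.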

Does llms.txt help with ChatGPT citations?

Anecdotally, yes. ChatGPT's web-browsing tool appears to give weight to llms.txt's structure when summarizing a site. But the citations themselves come from the page content, not from llms.txt. So llms.txt is the door, and the page is the answer.
