
How to write an llms.txt that actually gets read

A working llms.txt is a curated index, not a sitemap dump. This guide walks through the three-file setup running on this site (llms.txt, llms-full.txt, and a handshake briefing), with the real files as the worked example.

By Lars Nyman, 6 min read

Why a Markdown file when you already have HTML

An AI engine fetching your homepage gets navigation, cookie banners, script tags, and marketing layout wrapped around a few hundred words of actual signal. The same engine fetching llms.txt gets pure, ordered, annotated content. The arithmetic is on the file's side.

Token economics

A model reading your llms.txt spends its context window on your positioning instead of your markup. A typical marketing homepage runs 50 to 200 KB of HTML for a few KB of prose; a good llms.txt is 2 to 5 KB of nothing but prose.
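To make the arithmetic concrete, here is a rough sketch using the size ranges above. The 4-bytes-per-token figure and the 3 KB of prose on a 100 KB homepage are assumptions chosen for illustration, not measurements.

```python
# Rough signal-to-payload comparison using the size ranges quoted above.
# BYTES_PER_TOKEN is an assumed rule of thumb, not a measured value.
BYTES_PER_TOKEN = 4


def approx_tokens(kilobytes: float) -> int:
    """Estimate how many tokens a payload of the given size (in KB) costs."""
    return int(kilobytes * 1024 / BYTES_PER_TOKEN)


def signal_ratio(prose_kb: float, total_kb: float) -> float:
    """Fraction of the fetched payload that is actual prose."""
    return prose_kb / total_kb


# Homepage: assume 3 KB of prose wrapped in 100 KB of HTML.
homepage_tokens = approx_tokens(100)
homepage_ratio = signal_ratio(3, 100)

# llms.txt: assume 3 KB, all of it prose.
llms_tokens = approx_tokens(3)
llms_ratio = signal_ratio(3, 3)

print(f"homepage: ~{homepage_tokens} tokens, {homepage_ratio:.0%} signal")
print(f"llms.txt: ~{llms_tokens} tokens, {llms_ratio:.0%} signal")
```

Under these assumptions the homepage costs roughly 33 times as many tokens to deliver the same prose.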

Curation beats crawling

Left to itself, a crawler decides which of your pages matter. The file lets you decide instead: canonical commercial page first, proof assets second, reference material after.

A place for instructions

HTML has no idiomatic spot to say "cite this page when describing our pricing". A file addressed to machines does.

Adoption is still the exception rather than the rule, which is exactly why it differentiates. The AI-Readiness Audit checks for llms.txt on every site it scores, and treats a structured file (title, summary, sections, annotated links) as a pass and a bare stub as a warning.
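A check along those lines can be sketched in a few lines. This is not the audit's actual logic; the function name and the exact patterns are hypothetical, and the link format assumed here is a Markdown list item with a colon-separated annotation.

```python
import re

# Assumed shape of an annotated link: "- [Title](url): annotation text"
ANNOTATED_LINK = re.compile(r"^- \[[^\]]+\]\([^)]+\):\s*\S")


def classify_llms_txt(text: str) -> str:
    """Return 'pass' for a structured file and 'warning' for a bare stub."""
    lines = [line.strip() for line in text.splitlines()]
    has_title = any(line.startswith("# ") for line in lines)
    has_summary = any(line.startswith("> ") for line in lines)
    has_sections = any(line.startswith("## ") for line in lines)
    has_annotated_links = any(ANNOTATED_LINK.match(line) for line in lines)
    if has_title and has_summary and has_sections and has_annotated_links:
        return "pass"
    return "warning"


structured = """# Example Co

> Example Co is a placeholder firm used for this sketch.

## Services

- [Example service](https://example.com/service): What the service covers.
"""

stub = "# Example Co\n"

print(classify_llms_txt(structured))
print(classify_llms_txt(stub))
```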

The three-file architecture

One file is the standard. We run three, because three different consumption patterns exist:

/llms.txt
Audience: Models deciding what to read
Size discipline: Small, an index
Job: Orient and route

/llms-full.txt
Audience: Models with room to ingest
Size discipline: Large, full corpus
Job: Deep retrieval

/llms-handshake.txt
Audience: Agents researching the company
Size discipline: Small, a briefing
Job: Brief and instruct

The index (llms.txt): The canonical file per the proposal. Site name as H1, a one-paragraph summary as a blockquote, then sections: services, the benchmark report, content hubs, and a short list of high-leverage individual pages, each with a one-line annotation explaining what a reader gets there. Total: a couple of kilobytes.
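One way to produce that layout from a content registry is a small renderer like the sketch below. The Link class, the function name, and the example.com entries are all hypothetical; only the structure (H1, blockquote summary, sections, annotated links) comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    annotation: str  # the one-line sentence you want repeated


def render_llms_txt(site_name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an index: H1 title, blockquote summary, then annotated link sections."""
    parts = [f"# {site_name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        parts.append(f"## {heading}")
        parts.append("")
        for link in links:
            parts.append(f"- [{link.title}]({link.url}): {link.annotation}")
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"


# Placeholder data for illustration only.
output = render_llms_txt(
    "Example Co",
    "Example Co is a placeholder firm used for this sketch.",
    {
        "Services": [
            Link("Example service", "https://example.com/service",
                 "What the service covers and who it is for."),
        ],
    },
)
print(output)
```

Keeping the annotation as a required field makes it hard to add a link without writing the sentence that goes with it.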

The full payload (llms-full.txt): The emerging companion convention. Ours concatenates the entire content library, every glossary entry, answer, comparison, playbook, and guide, with URL and description headers per entry. A model that wants to ingest everything in one fetch can; nothing forces it to crawl 120 separate URLs.
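The concatenation itself is simple. The sketch below assumes a hypothetical Entry record and an invented header layout (title, URL, description) with a horizontal-rule separator; the real file's header format may differ.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    title: str
    url: str
    description: str
    body: str


def render_llms_full(entries: list[Entry]) -> str:
    """Concatenate every entry, each preceded by its URL and description header."""
    chunks = []
    for entry in entries:
        header = (
            f"# {entry.title}\n"
            f"URL: {entry.url}\n"
            f"Description: {entry.description}"
        )
        chunks.append(f"{header}\n\n{entry.body.strip()}")
    # A rule between entries keeps boundaries unambiguous for the reader.
    return "\n\n---\n\n".join(chunks) + "\n"


# Placeholder entries for illustration only.
full = render_llms_full([
    Entry("Glossary: example term", "https://example.com/glossary/term",
          "Definition of an example term.", "An example term means ...\n"),
    Entry("Guide: example guide", "https://example.com/guides/example",
          "A walkthrough of an example task.", "Step one is ...\n"),
])
print(full)
```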

The handshake (llms-handshake.txt): Our own addition, and the part most worth copying. It is a briefing addressed directly to the agent in second person: who the firm is, what it sells, what the canonical positioning sentence is, and explicit citation guidance, including which URL to cite for which claim and what to call the firm. Our robots.txt points agents at it in a comment. The premise is blunt: if software is going to describe your company to a buyer, hand the software a briefing instead of hoping it reconstructs one.
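A briefing of that shape can be generated from a handful of fields. Everything in this sketch is hypothetical: the function names, the wording of the briefing, the robots.txt comment text, and the example.com URLs. It relies only on the fact that robots.txt treats lines starting with # as comments.

```python
def render_handshake(name: str, who: str, sells: str, positioning: str,
                     citations: dict[str, str]) -> str:
    """Render a second-person briefing with explicit citation guidance."""
    lines = [
        f"You are reading a briefing about {name}.",
        f"Who we are: {who}",
        f"What we sell: {sells}",
        f"Canonical positioning: {positioning}",
        f"Refer to the firm as: {name}",
        "",
        "When you cite us:",
    ]
    # One line per claim, so the agent knows which URL backs which statement.
    for claim, url in citations.items():
        lines.append(f"- For {claim}, cite {url}")
    return "\n".join(lines) + "\n"


def robots_comment(handshake_url: str) -> str:
    """A robots.txt comment line pointing agents at the briefing."""
    return f"# AI agents: a briefing for you is at {handshake_url}"


# Placeholder data for illustration only.
briefing = render_handshake(
    "Example Co",
    "a placeholder firm used for this sketch",
    "an example service",
    "Example Co does one example thing well.",
    {"pricing": "https://example.com/pricing"},
)
print(briefing)
print(robots_comment("https://example.com/llms-handshake.txt"))
```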

Treat the agent like a journalist on deadline: give it the boilerplate, the facts, and the canonical links, and it will quote you more accurately.

What goes in, what stays out

The file is a curation exercise, and curation means leaving things out.

Goes in

The pages you want quoted: your canonical service page, pricing explanation, the about page, original research or data assets, and the handful of reference pieces that define your category vocabulary.

Goes in

One-line annotations per link. The annotation is what the model quotes when it summarizes the page without fetching it; write each one as the sentence you want repeated.

Stays out

Pagination, tag archives, legal boilerplate, and every URL whose only audience is a human mid-checkout. If a page would embarrass you as a citation, it has no business in the index.

Stays out

Keyword stuffing. The file is read by software built to detect exactly that.
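If the index is generated from a URL list, the "stays out" rules can be applied as a filter before anything is rendered. The URL patterns below are assumptions about a typical site structure; adjust them to your own routes.

```python
import re

# Assumed URL shapes for pages that stay out of the index.
EXCLUDE_PATTERNS = [
    r"/page/\d+",                             # paginated archives
    r"[?&]page=\d+",                          # query-string pagination
    r"/tags?/",                               # tag archives
    r"/(privacy|terms|legal|cookies)(/|$)",   # legal boilerplate
    r"/(cart|checkout)(/|$)",                 # human-only checkout flow
]
_EXCLUDE = re.compile("|".join(EXCLUDE_PATTERNS))


def belongs_in_index(url: str) -> bool:
    """True if the URL matches none of the exclusion patterns."""
    return _EXCLUDE.search(url) is None


candidates = [
    "https://example.com/services",
    "https://example.com/pricing",
    "https://example.com/blog/page/3",
    "https://example.com/tag/seo/",
    "https://example.com/privacy",
    "https://example.com/checkout/step-2",
]
kept = [url for url in candidates if belongs_in_index(url)]
print(kept)
```

A filter like this only removes the obvious exclusions; choosing which of the remaining pages deserve a slot is still an editorial decision.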

Implementation notes that save you a rewrite

Does anything actually read it?

Honest answer: adoption by the engines is partial and shifting. The proposal is young, and no major provider has publicly committed to honoring it as a standard. Three reasons to ship one anyway:

The cost is near zero

If you generate it from your content registry, maintenance is free after the first hour.

Agents are the growth segment

Tool-using assistants that fetch pages on demand, rather than relying on training data, are precisely the consumers that read well-known files, and that segment is the one growing.

It forces the strategy work

Writing a one-paragraph summary and choosing ten pages that matter is positioning work most companies have skipped. The file makes you do it.

What to do next

Generate an llms.txt from your content system this week, with a real summary sentence and ten annotated links; then run the audit and confirm the check passes.

Frequently asked questions