Search engines built their world around robots.txt. AI engines need something different. The llms.txt convention is the proposed answer: a single plain-text file that tells models what your site is, what matters on it, and where to find the clean version. This guide covers the file itself, where it sits, what goes inside it, and how to keep it current.
A Map Written for Models, Not Crawlers
The llms.txt file is a Markdown document that lives at the root of your domain and tells large language models what your site contains and where the important parts sit. It was proposed in 2024 by Jeremy Howard of Answer.AI as a way to hand AI systems a curated index instead of leaving them to parse cluttered HTML. Where robots.txt tells crawlers what they may not touch, llms.txt tells models what they should read first.
The problem it solves is context. A modern web page carries navigation, scripts, ad slots, cookie banners, and tracking code wrapped around a few hundred words of actual content. A model trying to summarize or cite that page has to wade through markup that means nothing to a reader. The file gives the model a clean, prioritized list of links and short descriptions, so it spends its limited context window on substance rather than scaffolding.
The format is deliberately simple. It uses standard Markdown, starts with an H1 title, allows a short blockquote summary, and groups links under H2 headings. No new syntax, no schema dialect, no JSON wrapper. Anyone who can write a README can write a valid file, and any model that can read Markdown can parse it without special tooling.
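The structural rules above are simple enough to check by hand, but a small script can catch slips before you publish. The following is a minimal sketch in Python, not an official validator: the function name `check_llms_txt` and the exact wording of the problems it reports are assumptions made for illustration, and it only looks at the H1, blockquote, and H2 ordering described above.

```python
import re

# Matches a Markdown list item that starts with a link: "- [title](url)".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)]+\)")


def check_llms_txt(text: str) -> list[str]:
    """Return structural problems found; an empty list means the basic shape looks right."""
    problems = []
    # Work on non-blank lines only, so spacing between blocks does not matter.
    content = [line.rstrip() for line in text.splitlines() if line.strip()]

    # The file should open with the H1 title.
    if not content or not content[0].startswith("# "):
        problems.append("first non-blank line is not an H1 title")

    # Only one H1 is expected ("## " does not match "# ").
    if sum(1 for line in content if line.startswith("# ")) > 1:
        problems.append("more than one H1 heading")

    # Position of the first H2, or the end of the file if there is none.
    first_h2 = next(
        (i for i, line in enumerate(content) if line.startswith("## ")),
        len(content),
    )

    # The blockquote summary is optional, but it belongs before the sections.
    if any(line.startswith(">") and i > first_h2 for i, line in enumerate(content)):
        problems.append("blockquote appears after the first H2 section")

    # Links should be grouped under an H2 heading.
    if any(LINK_ITEM.match(line) and i < first_h2 for i, line in enumerate(content)):
        problems.append("link listed before any H2 section")

    return problems


SAMPLE = """# Example Co

> Example Co sells widgets to small workshops.

## Docs

- [Quick start](https://example.com/start.md): Install and first run
"""
```

Calling `check_llms_txt(SAMPLE)` returns an empty list, while a file that opens with a bare line instead of an H1 gets flagged.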
“robots.txt manages permission. llms.txt manages attention.”
The core distinction
Location and the Optional Full Version
The file belongs at the root of your domain, served at /llms.txt, the same level as robots.txt and sitemap.xml. A model or tool looking for it checks that exact path first, so placing it in a subfolder defeats the purpose. It must be served as plain text with a 200 status and remain publicly reachable, with no login wall in front of it.
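The location and serving rules can be checked with a few lines of standard-library Python. This is a rough sketch under stated assumptions: the helper names (`llms_txt_url`, `serving_problems`, `check_site`) and the User-Agent string are invented for this example, and treating `text/markdown` as acceptable alongside `text/plain` is a judgment call rather than something the convention spells out.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumption: a Markdown media type is treated as acceptable alongside plain text.
ACCEPTED_TYPES = {"text/plain", "text/markdown"}


def llms_txt_url(site: str) -> str:
    """Build the root-level URL, whatever path the input URL carries."""
    return urljoin(site, "/llms.txt")


def serving_problems(status: int, content_type: str) -> list[str]:
    """Compare a response's status and Content-Type against the serving rules."""
    problems = []
    if status != 200:
        problems.append(f"expected status 200, got {status}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected media type: {media_type or 'none'}")
    return problems


def check_site(site: str, timeout: float = 10.0) -> list[str]:
    """Fetch /llms.txt without credentials and report any serving problems."""
    request = Request(llms_txt_url(site), headers={"User-Agent": "llms-txt-check/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return serving_problems(response.status, response.headers.get("Content-Type", ""))
    except HTTPError as err:
        # urlopen raises on 4xx and 5xx responses; report them instead of crashing.
        return serving_problems(err.code, err.headers.get("Content-Type", ""))
```

A redirect to a login page usually ends in an HTML response, so the media-type check tends to catch a login wall even when the final status is 200.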
The standard also defines a companion file, llms-full.txt, served at the same root level. The short file is an index of links and descriptions. The full file expands those links into the actual content inline, giving a model the complete text in one fetch when the use case calls for it. Documentation sites and product references often publish both, while a smaller brand site may only need the index version.
Serving the file is one decision. Pointing to clean targets is another. The links inside should resolve to readable pages, and many sites pair each HTML URL with a Markdown equivalent by appending .md to the path. That gives models a stripped version of the same page, free of the interface chrome, which is the whole point of the exercise.
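The .md pairing is easy to compute once you settle on a rule. Here is a minimal sketch, assuming the hypothetical helper name `markdown_twin`. Appending .md comes from the pattern described above. The handling of directory-style URLs, which appends index.html.md, follows a suggestion in the llms.txt proposal as best understood here, so treat it as an assumption and match whatever your site actually serves.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_twin(url: str) -> str:
    """Return the URL where a Markdown version of the page is expected to live."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if path.endswith("/"):
        # Assumption: URLs without a file name map to index.html.md.
        path += "index.html.md"
    elif not path.endswith(".md"):
        # The common case: append .md to the existing path.
        path += ".md"
    # Query string and fragment are carried over unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

With this rule, `https://example.com/docs/pricing` maps to `https://example.com/docs/pricing.md`, and a URL that already ends in .md is returned as it is.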
“Place it at the root or the tools looking for it never find it.”
The location rule
Structure, Sections, and Sensible Choices
A working file opens with the project or brand name as an H1, then a blockquote that states in one or two sentences what the site is and who it serves. After that come H2 sections that group related links, each link followed by a short description of what the reader finds there. A common pattern is a primary section for core pages and an Optional section for material a model can skip when context is tight.
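The layout described above is regular enough to generate from structured data. The sketch below is one way to do it in Python. The `Link` and `Section` classes and the `render_llms_txt` function are hypothetical names chosen for this example, and pushing the Optional section to the end is a design choice made here, not a rule quoted from the proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    description: str


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(name: str, summary: str, sections: list[Section]) -> str:
    """Render an H1 title, a blockquote summary, and H2 sections of described links."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    # Stable sort: every section keeps its order, except that "Optional" moves last.
    ordered = sorted(sections, key=lambda section: section.heading == "Optional")
    for section in ordered:
        lines.append(f"## {section.heading}")
        lines.append("")
        for link in section.links:
            lines.append(f"- [{link.title}]({link.url}): {link.description}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


example = render_llms_txt(
    "Example Co",
    "Example Co sells widgets to small workshops.",
    [
        Section("Optional", [Link("Changelog", "https://example.com/changelog.md", "Release history by version")]),
        Section("Docs", [Link("Quick start", "https://example.com/start.md", "Install and first run")]),
    ],
)
```

Keeping the entries in data rather than hand-edited text also makes it easier to regenerate the file whenever the list of core pages changes.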
Decide what earns a place by asking what you want a model to repeat back. Product pages, documentation, pricing, key guides, and authoritative reference material belong in the file. Login screens, cart pages, duplicate tag archives, and thin utility pages do not. The file is an editorial choice, not a dump of your sitemap, and a tighter file usually reads better than an exhaustive one.
Descriptions carry real weight. A bare URL tells a model nothing about relevance, while a single clear line, such as “Explains token pricing and what each output costs,” helps the model decide what to pull. Write these the way you would write a helpful link label, specific and plain, with no marketing padding around them.
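A quick lint can flag entries whose descriptions are missing or too thin to be useful. The following is a sketch only: the function name `links_missing_descriptions` and the three-word threshold are arbitrary assumptions for illustration, and a word count says nothing about whether a description is actually specific.

```python
import re

# A list item that starts with a Markdown link; whatever follows is the description.
LINK_LINE = re.compile(r"^\s*[-*]\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?P<rest>.*)$")


def links_missing_descriptions(text: str, min_words: int = 3) -> list[str]:
    """Return the URLs of link entries whose description has fewer than min_words words."""
    flagged = []
    for line in text.splitlines():
        match = LINK_LINE.match(line)
        if not match:
            continue  # headings, blank lines, and prose are ignored
        # Strip the separator (colon, dash, spaces) between the link and its description.
        description = match.group("rest").lstrip(" :-").strip()
        if len(description.split()) < min_words:
            flagged.append(match.group("url"))
    return flagged


DRAFT = """## Docs
- [Pricing](https://example.com/pricing.md): Token pricing and what each output costs
- [Blog](https://example.com/blog.md)
- [API](https://example.com/api.md): Reference
"""
```

Run against `DRAFT`, the function flags the Blog and API entries and leaves the Pricing entry alone.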
“Treat the file as a curated reading list, not a backup of your sitemap.”
The editorial principle
How Engines Use It and How to Keep It Honest
Adoption is uneven, and that is worth stating plainly. The file is a published convention, not a ratified web standard, and no major engine has confirmed that it weights llms.txt in ranking. What it does today is give compliant tools, agents, and retrieval systems a clean entry point, and several documentation platforms and AI tools already fetch it. Treat it as low-cost infrastructure that helps where it is read, not a guaranteed visibility lever.
Maintenance is the part most sites get wrong. A file that lists pages you deleted six months ago sends a model to dead links and undercuts the trust the file is meant to build. Tie regeneration to your publishing flow so new core pages get added and removed ones get dropped. A file you write once and forget decays into a liability. Citably’s llms.txt Generator, the lowest-cost Deploy output, builds the file from your live site and keeps the structure correct while you control which links earn a place.
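If you maintain the file yourself, the add-and-drop step can be sketched as a pure function that your publishing flow calls after each deploy. Everything here is an assumption for illustration: the name `sync_with_site`, and the idea that your CMS or build step can supply one set of URLs that are currently live and another set of pages you consider core. It is not a description of how any particular generator works.

```python
import re

# Captures the URL of a list item that starts with a Markdown link.
LINK_URL = re.compile(r"^\s*[-*]\s*\[[^\]]+\]\(([^)\s]+)\)")


def sync_with_site(
    text: str, live_urls: set[str], core_urls: set[str]
) -> tuple[str, list[str], list[str]]:
    """Drop entries that point at pages no longer live and report core pages not yet listed.

    Returns (updated_text, removed_urls, missing_core_urls).
    """
    kept, removed, listed = [], [], set()
    for line in text.splitlines():
        match = LINK_URL.match(line)
        if match and match.group(1) not in live_urls:
            removed.append(match.group(1))  # dead link: leave the line out
            continue
        if match:
            listed.add(match.group(1))
        kept.append(line)
    # Core pages still need a human-written description, so they are only reported.
    missing = sorted(core_urls - listed)
    return "\n".join(kept) + "\n", removed, missing


CURRENT_FILE = """# Example Co

## Docs

- [Start](https://example.com/start.md): Install guide for new users
- [Old](https://example.com/old.md): Retired product page
"""
```

The function removes stale entries automatically but only reports missing core pages, which leaves the editorial decision and the description to a person.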
A clean file helps models find and read your content, but it does not by itself decide whether they cite you. Answer Engine Optimization is the broader practice, covering the structure of your pages, the schema you publish, the clarity of your answers, and the authority around them. Make pages answerable first, publish llms.txt so models can find them, then measure whether engines actually pull and cite the result. The file is the groundwork. The visibility data is the verdict.
“A file you write once and never touch decays into a liability.”
The upkeep truth
A clean llms.txt file gives AI engines a clear path to your best pages, but a published file and a cited brand are not the same thing. The next step is seeing which prompts pull you in and which still hand the answer to someone else.
