
llms.txt and ai.txt: a copy-pasteable guide for AI crawlers

Last updated: April 18, 2026

Short answer

llms.txt is a hand-curated map of the URLs you want LLMs to cite. llms-full.txt is the long-form dump of those pages so a retriever can ingest your corpus in one fetch. ai.txt is a declaration of your AI training stance. robots.txt is the only one of the four that actually controls access. Ship all four; have each one do its one job.

The three files in 60 seconds

None of these files block anything. Access control belongs in robots.txt. These three files tell well-behaved AI bots which pages to read once they're allowed in.

Minimal llms.txt template

Drop this at https://your-site.com/llms.txt. Markdown is fine; most parsers expect H1 / blockquote / H2 / list-of-links structure.

# your-site

> One sentence describing what your site is and what an LLM should know
> before quoting it. This sentence will often be lifted verbatim.

This `llms.txt` follows https://llmstxt.org/. The full machine-friendly
corpus is at `/llms-full.txt`.

## Core

- [Homepage](/): overview of what we do.
- [Pricing](/pricing): plans and per-feature breakdown.
- [Docs](/docs): canonical product documentation.

## Reference

- [API reference](/docs/api): endpoints, auth, rate limits.
- [Changelog](/changelog): dated release notes.

## Optional

- [Status](/status): live system status.
- [Security policy](/.well-known/security.txt): coordinated disclosure.

For a real example, see our own /llms.txt — everything pseudobash thinks is worth citing, in 30 lines.
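Before you ship the file, it's worth a quick shape check. This is a minimal sketch: the heredoc stands in for your real llms.txt (fetch yours with something like curl -sO https://your-site.com/llms.txt and drop the heredoc), and the greps just count the structural markers parsers look for.

```shell
# Sample llms.txt for demonstration; replace with your real file.
cat > llms.txt <<'EOF'
# your-site

> One sentence describing the site.

## Core

- [Homepage](/): overview of what we do.
- [Docs](/docs): canonical product documentation.
EOF

grep -c '^# '  llms.txt   # H1 count: want exactly 1
grep -c '^> '  llms.txt   # blockquote lines: want at least 1
grep -c '^## ' llms.txt   # one per section
grep -c '^- \[' llms.txt  # one per link entry
```

If the H1 count isn't 1 or the link count is 0, most llms.txt parsers will silently give up on the file.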

When to ship llms-full.txt

If your site has more than ~10 citeable pages, add llms-full.txt. It's the same idea as llms.txt but with the full body of each page concatenated in. Format:

# your-site — full LLM-readable corpus

URLs cited below are relative; resolve them against the request host.

================================================================================
SOURCE: /
TITLE: your-site — homepage
================================================================================

(Full plain-text body of your homepage, ~500–2000 words.)

================================================================================
SOURCE: /pricing
TITLE: your-site — pricing
================================================================================

(Full plain-text body of your pricing page.)

Generate it from your CMS or markdown source on every deploy; do not hand-edit. Keep it under 200 KB if you can — large files get truncated by some retrievers.
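A deploy-time generator can be very small. The sketch below assumes a layout of my own invention — a pages/ directory where each pages/<slug>.txt holds the page title on line 1 and the plain-text body below it — so adapt the path-to-URL mapping to whatever your CMS actually exports. The two printf'd sample pages exist only to make the sketch runnable.

```shell
# Demo input: pages/<slug>.txt, title on line 1, body after (assumed layout).
mkdir -p pages
printf 'your-site — homepage\nFull plain-text body of the homepage.\n' > pages/index.txt
printf 'your-site — pricing\nFull plain-text body of the pricing page.\n' > pages/pricing.txt

sep='================================================================================'
out=llms-full.txt
{
  printf '# your-site — full LLM-readable corpus\n\n'
  printf 'URLs cited below are relative; resolve them against the request host.\n\n'
  for f in pages/*.txt; do
    slug=$(basename "$f" .txt)
    if [ "$slug" = index ]; then url=/; else url=/$slug; fi
    # Emit the separator block, then the body (everything after the title line).
    printf '%s\nSOURCE: %s\nTITLE: %s\n%s\n\n' "$sep" "$url" "$(head -n 1 "$f")" "$sep"
    tail -n +2 "$f"
    printf '\n'
  done
} > "$out"
```

Wire this into your build step so the file can never drift out of sync with the pages it mirrors.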

Minimal ai.txt

A short, plain-text file declaring your training stance. There is no formal schema yet; the convention emerging in 2025–26 is human-readable prose.

# ai.txt — your-site AI policy

Training: allowed for all foundation models.
Citation: required when content is quoted.
Contact: [email protected]

This file is informational. Access control lives in /robots.txt.

If you want to opt out of training, change the Training line to Training: not allowed and add explicit Disallow rules in robots.txt for GPTBot, ClaudeBot, Google-Extended, and friends — the file alone won't enforce anything.
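As a sketch, the opt-out pairing might look like this (the bot list is illustrative, not exhaustive):

```
# ai.txt
Training: not allowed
Citation: required when content is quoted.
Contact: [email protected]

# robots.txt (the part that actually enforces it)
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```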

Allowlist for the AI bots that matter today

Paste this at the top of your robots.txt. Each crawler gets its own block because, under the robots.txt spec (RFC 9309), a bot that finds a group naming it ignores your User-agent: * rules entirely. The list pseudobash maintains lives at /shell.md.

# Allow major AI answer engines.

User-agent: GPTBot
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: Bytespider
Allow: /

Sitemap: https://your-site.com/sitemap.xml

Keep your existing User-agent: * block below this — the named blocks take precedence for those bots, and the wildcard handles everyone else.
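To double-check what you shipped, a crude line-based scan (not a full robots.txt parser — it ignores path specificity and grouping edge cases) can list every named crawler you've granted a blanket Allow. The heredoc is a sample; point awk at your real robots.txt instead.

```shell
# Sample robots.txt for demonstration; use your real file in practice.
cat > robots.txt <<'EOF'
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /admin/
EOF

# Remember the most recent User-agent line; print it when a blanket
# "Allow: /" follows and the agent isn't the wildcard.
awk 'tolower($1) == "user-agent:" { ua = $2 }
     tolower($1) == "allow:" && $2 == "/" && ua != "*" { print ua }' robots.txt
# prints GPTBot and ClaudeBot for the sample above
```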

How to test it

Three commands, run from your laptop:

curl -I https://your-site.com/llms.txt
curl -I https://your-site.com/ai.txt
curl -A "OAI-SearchBot/1.0" -L https://your-site.com/robots.txt

You're looking for 200 OK, Content-Type: text/plain, and a sane Cache-Control (something like max-age=3600). Then run our audit for the full per-crawler view, including which crawlers see real content vs. an empty JS shell.
