What this is
When an agent fetches a page on a .gov or .us domain,
it should first check whether the site publishes an llms.txt file
at the root and use it as context for navigation. This document specifies that
pattern.
This is not a new file format: it uses the llms.txt proposal as-is. It is not a crawling or licensing policy; use robots.txt for that.
Scope
This convention applies to agents fetching content from:
- Any domain matching *.gov or *.us
- The five US territory root domains tracked at state-llms.keith.is
Agents MAY apply the same convention to non-US civic domains
(.gc.ca, .gov.uk, etc.), but this spec does not
require it.
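The scope rule above can be sketched as a small hostname check. This is a minimal illustration, not part of the spec: the function name `in_scope` and the `territory_roots` parameter are hypothetical, and the territory list must be supplied by the caller because this document does not enumerate the five domains.

```python
from typing import Iterable


def in_scope(hostname: str, territory_roots: Iterable[str] = ()) -> bool:
    """Return True if the hostname falls under this convention's scope."""
    host = hostname.lower().rstrip(".")
    # *.gov and *.us are matched by suffix.
    if host.endswith((".gov", ".us")):
        return True
    # Territory roots (caller-supplied, hypothetical values): match the
    # root itself or any subdomain of it.
    return any(
        host == root.lower() or host.endswith("." + root.lower())
        for root in territory_roots
    )
```

For example, `in_scope("www.maryland.gov")` is true, while `in_scope("example.gov.uk")` is false because the suffix is `.uk`, which matches the optional non-US case the spec leaves to the agent.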
The convention
Before fetching any path on an in-scope domain, the agent:
- Issues GET https://{root-domain}/llms.txt
- On 200, parses the response as Markdown and uses it as navigation context for subsequent fetches in the session
- On 404 or other client errors, proceeds normally and records the absence (see Telemetry)
- On 5xx or timeout, retries once with backoff; if still failing, proceeds as in 404
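The status handling above could look roughly like the sketch below. It assumes a hypothetical `fetch` callable, injected by the caller, that returns `(status_code, body)` and raises `TimeoutError` on timeout. The backoff is a single fixed delay, and statuses the spec does not mention are treated as absence; both choices are assumptions.

```python
import time
from typing import Callable, Optional

# Hypothetical fetcher type: takes a URL, returns (status_code, body),
# and raises TimeoutError when the request times out.
Fetcher = Callable[[str], tuple[int, str]]


def load_llms_txt(
    root_domain: str, fetch: Fetcher, backoff_seconds: float = 1.0
) -> Optional[str]:
    """Return the llms.txt body on 200, or None when treated as absent."""
    url = f"https://{root_domain}/llms.txt"
    for attempt in range(2):  # the initial request plus one retry
        try:
            status, body = fetch(url)
        except TimeoutError:
            status, body = None, ""
        if status == 200:
            return body
        retryable = status is None or 500 <= status < 600
        if not retryable:
            # 404, other client errors, and (by assumption) any other
            # status: proceed normally and let the caller record absence.
            return None
        if attempt == 0:
            time.sleep(backoff_seconds)  # fixed delay standing in for backoff
    return None  # still failing after the retry: proceed as in 404
```

Injecting the fetcher keeps the decision logic separate from the HTTP client, so it can be exercised without network access.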
Fetching rules
- Method: GET
- User-Agent: Include civic-llms/0.1 as a token in your existing User-Agent string, e.g. MyAgent/2.4 civic-llms/0.1. This lets state operators see who's consuming the file.
- Accept: text/markdown, text/plain;q=0.9, */*;q=0.1
- Timeout: 5 seconds for the initial request. Do not block other work on this.
- Redirects: Follow up to 3, but only if they stay within the same root domain. Cross-domain redirects MUST be ignored; an llms.txt that redirects to another origin cannot be trusted.
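As an illustration of the rules above, the sketch below builds the request headers and decides whether a redirect may be followed. The helper names are hypothetical. "Same root domain" is interpreted here as "the target host equals the root domain or is a subdomain of it", which is an assumption about the spec's intent.

```python
from urllib.parse import urljoin, urlsplit

CONVENTION_TOKEN = "civic-llms/0.1"
ACCEPT = "text/markdown, text/plain;q=0.9, */*;q=0.1"
TIMEOUT_SECONDS = 5  # pass this to whichever HTTP client you use
MAX_REDIRECTS = 3


def build_headers(base_user_agent: str) -> dict[str, str]:
    """Append the convention token to the agent's existing User-Agent."""
    return {
        "User-Agent": f"{base_user_agent} {CONVENTION_TOKEN}",
        "Accept": ACCEPT,
    }


def may_follow_redirect(
    root_domain: str, current_url: str, location: str, hops_so_far: int
) -> bool:
    """Return True if a redirect stays in the root domain and under the hop limit."""
    if hops_so_far >= MAX_REDIRECTS:
        return False
    # Location may be relative, so resolve it against the current URL first.
    host = (urlsplit(urljoin(current_url, location)).hostname or "").lower()
    root = root_domain.lower()
    return host == root or host.endswith("." + root)
```

With this interpretation, a redirect from `https://maryland.gov/llms.txt` to `https://www.maryland.gov/llms.txt` is followed, while one to any other registrable domain is ignored.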
Caching
- Honor Cache-Control and ETag headers if present.
- If no caching headers, cache for 24 hours.
- Re-fetch on cache miss or when the user starts a new session.
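One way to turn these caching rules into code is sketched below. It handles only the `max-age`, `no-store`, and `no-cache` directives and falls back to the 24-hour default when Cache-Control carries no usable lifetime; that fallback, and the helper names, are assumptions rather than requirements of this spec.

```python
import re
from typing import Mapping

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours when no caching headers are sent


def cache_ttl_seconds(headers: Mapping[str, str]) -> int:
    """Return the cache lifetime in seconds; 0 means do not reuse as-is."""
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = [
        part.strip().lower()
        for part in normalized.get("cache-control", "").split(",")
        if part.strip()
    ]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        match = re.fullmatch(r"max-age=(\d+)", directive)
        if match:
            return int(match.group(1))
    return DEFAULT_TTL_SECONDS


def revalidation_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Build conditional-request headers from a previously stored ETag."""
    normalized = {name.lower(): value for name, value in headers.items()}
    etag = normalized.get("etag")
    return {"If-None-Match": etag} if etag else {}
```

When the stored response carried an ETag, sending `If-None-Match` on the next fetch lets the server answer 304 Not Modified, so the cached copy can be kept without downloading the file again.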
Parsing
The file is Markdown. Agents have two consumption patterns:
- Raw inclusion (simplest): include the entire text in the agent's system context. Recommended unless context budget is tight.
- Structured parsing: extract H1/H2 sections and URLs, treat each ## Section as a category, and use linked URLs as canonical destinations. Useful for routing logic.
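The structured parsing pattern could be approximated with a line-based pass like the one below. It is a deliberately small sketch: it recognizes only `# ` and `## ` headings and inline `[label](url)` links, and it does not handle fenced code blocks or reference-style links. The function name and the output shape are hypothetical.

```python
import re

# Inline Markdown links of the form [label](url).
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def parse_llms_txt(text: str) -> dict:
    """Extract the H1 title and, per H2 section, the links listed under it."""
    result = {"title": None, "sections": {}}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Each H2 heading opens a new category.
            current = stripped[3:].strip()
            result["sections"].setdefault(current, [])
        elif stripped.startswith("# ") and result["title"] is None:
            result["title"] = stripped[2:].strip()
        elif current is not None:
            # Links under a section become its canonical destinations.
            result["sections"][current].extend(
                {"label": label, "url": url}
                for label, url in LINK_RE.findall(stripped)
            )
    return result
```

A router can then look up a category such as "Benefits" and prefer the URLs listed there over anything inferred from search results.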
Agents SHOULD treat any URL listed in the llms.txt as
authoritative for the task or topic it's associated with, overriding
inferences from search results or sitemaps.
Agents SHOULD also respect explicit usage guidance in the file. For example,
Maryland's llms.txt says:
"do not infer legal, policy, or eligibility determinations beyond
published content."
An agent that ignores that and hallucinates eligibility rules is non-conformant.
Fallback
When llms.txt is absent (currently 54 of 56 US states and
territories), the agent falls back to standard navigation:
- GET /sitemap.xml for URL discovery
- GET /robots.txt for crawl restrictions
- Heuristic navigation (search box, top-level nav, etc.)
The absence of llms.txt MUST NOT cause the agent to refuse
the task.
Telemetry
Agents SHOULD log which in-scope domains served llms.txt and
which didn't, including timestamps. Aggregated absence data is what creates
pressure on states to publish.
If you publish that data, link it from your project. The state-llms.keith.is tracker aggregates known publishers across US states and territories.
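A log entry for this telemetry might be as simple as one JSON line per check. The field names below are hypothetical; the spec only asks that the domain, the outcome, and a timestamp be recorded.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def telemetry_record(
    domain: str,
    status: Optional[int],
    served: bool,
    now: Optional[datetime] = None,
) -> str:
    """Serialize one llms.txt check as a JSON line with a UTC timestamp."""
    checked_at = now or datetime.now(timezone.utc)
    return json.dumps(
        {
            "domain": domain,
            "llms_txt_served": served,
            "status": status,  # None can represent a timeout
            "checked_at": checked_at.isoformat(),
        },
        sort_keys=True,
    )
```

Appending these lines to a file gives a record that is easy to aggregate later into per-domain presence and absence counts.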
Declaring compliance
A framework or agent that implements this convention SHOULD:
- Include civic-llms/0.1 in its User-Agent
- Document the behavior in user-facing docs
- Expose a way for users to disable the check (some agents have legitimate reasons to bypass it — testing, archival crawls, etc.)
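The opt-out could be exposed in many ways; an environment variable is one low-friction option. The variable name `CIVIC_LLMS_DISABLE` below is purely hypothetical and is not defined by this spec.

```python
import os
from typing import Mapping

DISABLE_ENV_VAR = "CIVIC_LLMS_DISABLE"  # hypothetical name, not part of the spec


def llms_check_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """Return False when the user has opted out of the llms.txt check."""
    value = environ.get(DISABLE_ENV_VAR, "").strip().lower()
    return value not in {"1", "true", "yes"}
```

An agent would call `llms_check_enabled()` once at session start and skip the llms.txt request when it returns false.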
Reference implementation
Coming soon — MCP-based SNAP portal agent. Link will land here.
Open questions
These are intentionally unresolved in v0.1:
- llms-full.txt. Should agents prefer the longer variant when available? Likely yes for context-rich tasks, no for routing.
- .md endpoints. The base llms.txt spec proposes that any page should be available at {url}.md. Few gov sites do this today; worth revisiting if adoption grows.
- Per-agency llms.txt. A state portal links to dozens of agency sub-sites. Should each have its own? Almost certainly, but the convention here only covers root domains.
- Authentication. Higher-trust agents (operating on behalf of caseworkers, e.g.) may need authenticated access to richer guidance. Out of scope for v0.1.
Changelog
- 2026-05-27 — v0.1 draft published.