**TL;DR.** `llms.txt` is the AEO (Answer Engine Optimization) equivalent of `robots.txt` + `sitemap.xml` combined — a Markdown file at your root that tells AI engines what your site is about and where to find structured information. In 2026, ecommerce stores without llms.txt are leaving citation opportunities on the table.

## What llms.txt is

A plain Markdown file served at `https://yourdomain.com/llms.txt`. It is a community proposal published by Jeremy Howard (fast.ai) in 2024 and picked up opportunistically by AI companies such as Anthropic and Perplexity. No standards body backs it, and no engine is obliged to read it.

The structure:

```markdown
# Brand Name

> A 1-2 sentence summary of what the brand does and who it serves.

## About

Brief 200-word description of the brand.

## Products

- [Product category 1](https://example.com/categories/cat-1): short description
- [Product category 2](https://example.com/categories/cat-2): short description

## Documentation

- [Help center](https://example.com/help)
- [API docs](https://example.com/docs)

## Pricing

- [Pricing page](https://example.com/pricing)

## Optional

- [Blog](https://example.com/blog)
- [Comparison pages](https://example.com/compare)
- [Glossary](https://example.com/glossary)
```

That's it. Plain Markdown, well under 5KB.
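If the file is generated rather than hand-written, the layout above can be rendered from a small data structure. The following is a minimal sketch, not a reference implementation: the `Link` and `Section` classes and the `render_llms_txt` function are hypothetical names invented for this example, and link notes use the `: notes` separator from the llms.txt proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional short description shown after the link


@dataclass
class Section:
    heading: str
    links: list[Link] = field(default_factory=list)


def render_llms_txt(brand: str, summary: str, about: str, sections: list[Section]) -> str:
    """Render H1, blockquote summary, an About paragraph, then one H2 per link section."""
    lines = [f"# {brand}", "", f"> {summary}", "", "## About", "", about, ""]
    for section in sections:
        lines += [f"## {section.heading}", ""]
        for link in section.links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
        lines.append("")
    # Exactly one trailing newline, whatever the last section looked like.
    return "\n".join(lines).rstrip() + "\n"
```

Calling `render_llms_txt(...)` with your own sections and writing the result to the web root keeps the file in sync with the catalog. Checking `len(text.encode("utf-8"))` against the 5KB guideline is a cheap guard in a build step.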

## Why AI engines care

Crawling a 50,000-page ecommerce site is expensive. AI engines that index for retrieval (Perplexity, ChatGPT Search) and AI engines that train (Anthropic, OpenAI) both benefit from a curated entry point.

llms.txt gives them:

1. A canonical brand description in the engine's preferred format (Markdown).
2. A map of high-priority sections, not buried links.
3. Pointers to deeper content (llms-full.txt, Markdown twins).

Pages listed in llms.txt and pages with Markdown twins appear more likely to be cited in AI answers. Estimates circulating in 2025–2026 put the uplift at 3–5x citation probability versus pages without these surfaces; treat that range as an estimate, not a controlled measurement.

## llms-full.txt for deep retrieval

`llms.txt` is a sitemap-style index. `llms-full.txt` is the long-form companion that contains the actual Markdown body of every public document.

```markdown
# Brand Name — Full Knowledge Base

## Section: Brand Overview

[200-word brand description]

## Section: Pricing

[Full pricing matrix as Markdown table]

## Section: FAQ

### Q: How does pricing work?
A: ...

### Q: ...

## Section: Comparisons

### Comparison: Brand vs Shopify

[Full text of the comparison page]

### Comparison: Brand vs BigCommerce

[Full text]

## Section: Glossary

### AEO
AEO (Answer Engine Optimization) is...

### llms.txt
A Markdown file at...
```

Target size: 50–500KB. Larger is fine; AI engines fetch it lazily.

The pattern is: the engine fetches llms.txt to understand your brand, then optionally fetches llms-full.txt for retrieval at answer time.
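One way to assemble such a file is to concatenate existing Markdown sources, grouped by section. The sketch below assumes a hypothetical layout of `content/<section>/<slug>.md`; the function name `build_llms_full` and that directory layout are illustrative assumptions, not part of the llms.txt proposal.

```python
from pathlib import Path


def build_llms_full(title: str, content_dir: Path) -> str:
    """Concatenate every .md file found one level below content_dir, one H2 per folder."""
    parts = [f"# {title}", ""]
    for section_dir in sorted(p for p in content_dir.iterdir() if p.is_dir()):
        docs = sorted(section_dir.glob("*.md"))
        if not docs:
            continue  # skip folders without Markdown documents
        parts += [f"## Section: {section_dir.name.title()}", ""]
        for doc in docs:
            parts += [doc.read_text(encoding="utf-8").strip(), ""]
    return "\n".join(parts).rstrip() + "\n"
```

The sketch copies each document verbatim, so documents that start with their own `#` or `##` headings will break the outline. A real generator would probably demote headings by a level or two before concatenating.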

## Markdown twins

For every public article (blog post, guide, comparison, customer story, glossary entry), publish a Markdown twin at the same path with a `.md` suffix:

| HTML URL                                         | Markdown twin                                       |
| ------------------------------------------------ | --------------------------------------------------- |
| `/blog/inp-optimization-2026`                     | `/blog/inp-optimization-2026.md`                    |
| `/compare/ordiko-vs-shopify`                      | `/compare/ordiko-vs-shopify.md`                     |
| `/guides/migrate-from-shopify-to-ordiko`           | `/guides/migrate-from-shopify-to-ordiko.md`          |
| `/glossary/aeo`                                   | `/glossary/aeo.md`                                  |

Reference the Markdown version in your HTML:

```html
<link rel="alternate" type="text/markdown" href="/blog/inp-optimization-2026.md" />
```

The Markdown version should be the same content as the HTML page minus navigation chrome — just the article body. Use proper Markdown headings, lists, tables, and code fences.
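The path convention is simple enough to derive in a template helper. This is a small sketch assuming the "same path plus `.md`" scheme from the table; `markdown_twin_url` and `alternate_link_tag` are hypothetical helper names.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def markdown_twin_url(page_url: str) -> str:
    """Return the twin URL: same path plus .md, with query string and fragment dropped."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path:
        raise ValueError("the site root has no Markdown twin in this scheme")
    return urlunsplit((parts.scheme, parts.netloc, path + ".md", "", ""))


def alternate_link_tag(page_url: str) -> str:
    """Build the <link rel="alternate"> tag that points at the page's Markdown twin."""
    href = escape(markdown_twin_url(page_url), quote=True)
    return f'<link rel="alternate" type="text/markdown" href="{href}" />'
```

For example, `alternate_link_tag("/compare/ordiko-vs-shopify")` yields a tag whose `href` is `/compare/ordiko-vs-shopify.md`. Dropping the query string keeps tracking parameters out of the twin URL.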

## AI crawler policy in robots.txt

Explicitly allow the engines you want to be cited by:

```
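# Named AI crawlers. A crawler obeys only the most specific group that
# matches it, so the private-path rules are repeated here: rules under
# "User-agent: *" do not apply to a bot that has its own group.
User-agent: GPTBot
User-agent: ChatGPT-User
User-agent: OAI-SearchBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: PerplexityBot
User-agent: Google-Extended
User-agent: CCBot
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /api
Disallow: /admin
Allow: /

# All other crawlers: deny private/sensitive paths
User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account
Disallow: /api
Disallow: /admin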
```

The user-agent names:

| User-Agent          | Engine             | Purpose                           |
| ------------------- | ------------------ | --------------------------------- |
| `GPTBot`            | OpenAI             | Training data collection           |
| `ChatGPT-User`      | OpenAI             | ChatGPT browsing on user request   |
| `OAI-SearchBot`     | OpenAI             | ChatGPT Search index               |
| `ClaudeBot`         | Anthropic          | Training data collection           |
| `anthropic-ai`      | Anthropic          | Older agent name, kept for compatibility |
| `PerplexityBot`     | Perplexity         | Perplexity's index                  |
| `Google-Extended`   | Google             | Control token for Gemini training and grounding (not a separate crawler; allowed by default) |
| `CCBot`             | Common Crawl       | Open dataset (used by many AIs)    |
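A policy like the one above can be sanity-checked before deployment with Python's standard-library `urllib.robotparser`. The sketch below uses a shortened, illustrative policy rather than the full file. Two things are worth knowing: a crawler follows only the most specific group that names it, so private paths must be disallowed inside the named-bot group as well as under `*`; and Python's parser applies rules in file order, whereas RFC 9309 prefers the longest match, so the `Disallow` lines are placed before `Allow: /` to get the same verdict under both.

```python
from urllib.robotparser import RobotFileParser

# Shortened example policy (assumption: same shape as the site's real robots.txt).
ROBOTS_TXT = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: PerplexityBot
Disallow: /cart
Disallow: /checkout
Allow: /

User-agent: *
Disallow: /cart
Disallow: /checkout
"""


def load_policy(text: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


policy = load_policy(ROBOTS_TXT)
for agent in ("GPTBot", "ClaudeBot", "PerplexityBot", "SomeOtherBot"):
    for path in ("/blog/inp-optimization-2026", "/cart"):
        verdict = "allowed" if policy.can_fetch(agent, path) else "blocked"
        print(f"{agent:15} {path:30} {verdict}")
```

Running the same loop against the production file in CI catches the common mistake of giving a bot its own `Allow: /` group and unintentionally opening `/cart` or `/checkout` to it.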

## Content patterns that get cited

AI engines parse content as text and prefer:

1. **Direct definitional answers in the first 60 words.** Lead with the answer.
2. **H2 questions, H3 sub-questions.** Hierarchical headings = parseable structure.
3. **Markdown tables for comparable data.** AI engines understand tables well.
4. **Numbered statistics with year.** "In 2026, 73% of merchants..." is more citable than "Most merchants...".
5. **Cite your sources.** Include `[Source](url)` links — AI engines weight cited content higher.
6. **FAQ blocks.** Question-and-answer format is heavily favored by retrieval systems.

Avoid:

- Buried answers (anecdote, then context, then the conclusion): AI engines often quote the opening and skip the rest.
- Long paragraph walls of text without structure.
- "Click to expand" hidden content that crawlers can't see.
- Heavy reliance on images without alt text.
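Some of the patterns above are structural enough to check automatically before publishing. The following is a rough heuristic sketch, not a measure of how any engine actually scores content: the function name `lint_for_citability` and the specific checks are assumptions made for illustration, with the 60-word limit taken from the first item in the list.

```python
import re


def lint_for_citability(markdown: str, lead_word_limit: int = 60) -> list[str]:
    """Return warnings for missing structural patterns. Heuristics only."""
    warnings = []
    lines = markdown.splitlines()

    # Treat the first non-heading paragraph as the lead.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", markdown) if p.strip()]
    lead = next((p for p in paragraphs if not p.startswith("#")), "")
    if len(lead.split()) > lead_word_limit:
        warnings.append(f"lead paragraph exceeds {lead_word_limit} words")

    # At least one H2 phrased as a question.
    h2_headings = [line[3:].strip() for line in lines if line.startswith("## ")]
    if h2_headings and not any(h.endswith("?") for h in h2_headings):
        warnings.append("no H2 heading is phrased as a question")

    # At least one Markdown table row (a line that starts and ends with a pipe).
    if not any(re.match(r"^\s*\|.*\|\s*$", line) for line in lines):
        warnings.append("no Markdown table found")

    # At least one outbound source link.
    if not re.search(r"\[[^\]]+\]\(https?://[^)]+\)", markdown):
        warnings.append("no source links found")

    return warnings
```

The checks are deliberately crude. For instance, a `#` comment inside a fenced code block would be misread as a heading, so treat the output as a prompt for review and not as a gate.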

## Monitoring AI traffic

In your server logs, count requests by user agent:

```bash
grep -oE "(GPTBot|ChatGPT-User|OAI-SearchBot|ClaudeBot|PerplexityBot|CCBot)" access.log | sort | uniq -c
```

You should see requests from these bots once they start crawling your site, including fetches of `/llms.txt` itself. Volume typically grows over weeks as engines crawl and re-crawl.
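To see which pages each bot requests, and not only how often it visits, the same count can be broken down by path. This sketch assumes the server writes the common "combined" access log format; the `AI_BOTS` tuple and the `count_ai_hits` function are illustrative names, and the regular expression would need adjusting for other log layouts.

```python
import re
from collections import Counter

AI_BOTS = ("GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Combined log format: ... "METHOD /path HTTP/x" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def count_ai_hits(lines):
    """Count (bot, path) pairs for requests whose user agent names a known AI crawler."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # not a GET/HEAD request, or not in combined format
        for bot in AI_BOTS:
            if bot in match["ua"]:
                hits[(bot, match["path"])] += 1
                break
    return hits
```

Pass it an open file, for example `count_ai_hits(open("access.log", encoding="utf-8", errors="replace"))`, and call `.most_common(20)` on the result to list the most-requested pages per bot. User-agent strings can be spoofed, so these counts are an indication, not proof, of who is crawling.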

To monitor citations:

- Search your brand name on Perplexity, ChatGPT, Claude, Gemini. Note which pages are cited.
- Use brand-monitoring tools (Brand24, Mention) that increasingly track AI mentions.
- Set up a quarterly review: query "best [your category] platform" on each AI engine and document who gets cited.

## How Ordiko handles AEO

Ordiko ships:

- `/llms.txt` auto-generated per store and per apex.
- `/llms-full.txt` concatenating the full Markdown body of every marketing doc.
- Markdown twins of every blog post, guide, comparison, customer story, glossary entry.
- AI crawler allow rules on marketing routes; disallow on cart/checkout/account.
- Citable content templates: TL;DR lead, H2 questions, tables, FAQ blocks, numbered stats.

Zero configuration required.

## FAQ

**Is llms.txt an official standard?**
It's a community standard proposed by Jeremy Howard in 2024 and adopted by Anthropic, Perplexity, and others. There's no W3C or IETF spec. AI engines fetch it opportunistically when discovering a domain. Treat it as best practice, not strict compliance.

**Will llms.txt make my site rank higher on Google?**
Not directly. Google's classic SERP doesn't read llms.txt. The signal benefits AI search citation (Perplexity, ChatGPT Search, Claude) and Google AI Overviews indirectly via cleaner content semantics.

**Should I block AI crawlers to protect my content?**
For most ecommerce stores, no. Blocking GPTBot/ClaudeBot/PerplexityBot from your marketing routes makes it much less likely that you are cited when users ask AI engines about your category. The trade-off of allowing them is that AI engines may train on your content; for ecommerce, where the goal is to be found, that's a feature, not a bug.

**How does Ordiko handle llms.txt?**
Ordiko auto-generates llms.txt and llms-full.txt per store from the catalog and content collection, with cache invalidation tied to content mutations. Markdown twins are emitted at /blog/[slug].md, /compare/[slug].md, etc. Configuration is zero-touch.
