llms.txt

/ 18 Thermidor 233
4 minutes / 795 words
in short: generate llms.txt files automatically in Hugo

Imagine my shock and surprise when I discovered that, after I submit resumes now, LLM research agents quickly show up in my access logs, poring over my portfolio page. As time moves ever forward, I suspect I’ll be dealing with AI agents more frequently, especially in the job market.

I’m not thrilled about this, but I can’t do anything about it - so, if you can’t beat them, join them.

My website is already fairly straightforward HTML and CSS, but in my own testing, models still seem to get confused about content on my site - there’s a lot to parse, even on a simple site.

Luckily, a new standard is taking shape for organizing web content for LLMs into simple “.txt” files. If you haven’t seen them before, they’re essentially Markdown versions of page content: no styling, no JavaScript.

This article explains how to generate these files auto-magically from existing content, for Hugo SSG sites.

The Flow

We want to be able to:

  1. Provide clean, plain-text versions of all content
  2. Allow ourselves to write custom versions when needed
  3. Auto-generate LLM-optimized content in all other cases
  4. Create a master index of all LLM-ready content
  5. Do as little manually as possible.

If you’re not familiar with Hugo’s template system, it’s incredibly powerful, and can do a lot more than just lay out HTML files. We’re going to use it today to generate llms.txt files automatically, while letting our custom content take precedence wherever it exists.

Important pages like the work portfolio benefit from custom llms.txt files containing hand-crafted, LLM-optimized content. These files live alongside the main content:

content/
  info/
    work/
      index.en.md     # Human-readable page
      llms.txt        # Custom LLM version

Custom files provide complete control over LLM-visible content, which lets us:

  • Remove visual elements that don’t translate to text
  • Restructure information for better LLM comprehension
  • Include additional context or explanations
  • Format data in LLM-friendly structures

Pages without custom LLM files automatically generate clean text versions using a custom output format. The implementation works as follows:

Hugo Configuration (config.toml):

The custom content generation system requires three key configuration sections:

# Custom output formats
[mediaTypes]
  [mediaTypes."text/plain"]
    suffixes = ["txt"]
  [mediaTypes."text/markdown"]
    suffixes = ["md"]

[outputFormats]
  [outputFormats.llmsfull]
    mediaType = "text/plain"
    baseName = "llms-full"
    isPlainText = true
    notAlternative = true
  [outputFormats.llms]
    mediaType = "text/plain"
    baseName = "llms"
    isPlainText = true
    notAlternative = true

[outputs]
  home = ["HTML", "RSS", "llmsfull"]
  page = ["HTML", "llms"]

Configuration Breakdown:

  • mediaTypes: Defines the MIME types Hugo recognizes. The text/plain type with txt suffix enables plain text file generation.

  • outputFormats: Creates two custom output formats:

    • llms: Generates individual llms.txt files for each page
    • llmsfull: Creates the comprehensive site index as llms-full.txt

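    The isPlainText = true flag tells Hugo to render these templates with Go’s text/template package rather than html/template, so the output is not HTML-escaped, while notAlternative = true excludes the format from each page’s AlternativeOutputFormats list, which themes typically use to emit <link rel="alternate"> tags.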

  • outputs: Specifies which formats to generate:

    • home: The homepage generates HTML, RSS, and the full LLM index
    • page: Individual pages generate HTML and LLM-friendly versions
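To make the breakdown a little more concrete, here is a minimal Python sketch of where each format should land in the published site. It assumes Hugo's default pretty URLs, where every page is a directory; the `published_path` helper is hypothetical and only mirrors the `baseName` and suffix values from the config above.

```python
from pathlib import PurePosixPath


def published_path(page_url: str, base_name: str, suffix: str = "txt") -> str:
    """Join a page URL with an output format's baseName and suffix.

    Hypothetical helper: assumes pretty URLs, where each page is a directory.
    """
    return str(PurePosixPath("/") / page_url.strip("/") / f"{base_name}.{suffix}")


# page = ["HTML", "llms"]: each regular page gets an llms.txt beside its index.html
print(published_path("/info/work/", "llms"))

# home = ["HTML", "RSS", "llmsfull"]: the homepage gets a single llms-full.txt
print(published_path("/", "llms-full"))
```

The first call yields /info/work/llms.txt and the second /llms-full.txt, which is the same shape as the URLs used later in the index template.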

Template (layouts/_default/single.llms.txt):

{{- if .File -}}
{{- $customLlmsFile := printf "%sllms.txt" .File.Dir -}}
{{- if not (fileExists (printf "content/%s" $customLlmsFile)) -}}
# {{ .Title }}

{{ if .Params.description }}{{ .Params.description }}

{{ end }}{{ if .Date }}Published: {{ .Date.Format "January 2, 2006" }}

{{ end }}{{ .RawContent }}
{{- end -}}
{{- end -}}

This template only generates content when no custom llms.txt file exists, ensuring the custom versions always take precedence.

The crown jewel is llms-full.txt, a comprehensive index of all LLM-friendly content on the site, organized by content type:

Template (layouts/index.llmsfull.txt):

# LLM-Friendly Content Index

# Info
{{- range (where .Site.RegularPages "Type" "info") }}
- [{{ .Title }}]({{ .Permalink }}llms.txt)
{{- end }}

# Posts
{{- range (where .Site.RegularPages "Type" "posts") }}
- [{{ .Title }}]({{ .Permalink }}llms.txt)
{{- end }}

# Projects
{{- range (where .Site.RegularPages "Type" "projects") }}
- [{{ .Title }}]({{ .Permalink }}llms.txt)
{{- end }}

# IndieWeb Notes
{{- range (where .Site.RegularPages "Type" "indieweb") }}
{{- if eq .Params.kind "note" }}
- [{{ .Title }}]({{ .Permalink }}llms.txt)
{{- end }}
{{- end }}

Smart Content Detection

Because every file-backed page gets an llms.txt, either custom or auto-generated, the link can be shown unconditionally. On individual pages, the template only checks that the page has a source file, then displays the link in the page header:

{{/* Always show llms.txt link - either custom or auto-generated */}}
{{ if .File }}
<a href="{{ .Permalink }}llms.txt" class="llms-link">
  <span class="llms-letter">l</span><span class="llms-letter">l</span>...
</a>
{{ end }}

The Result

Every page now has an LLM-friendly version accessible at /page-url/llms.txt, with a comprehensive index at /llms-full.txt. The system automatically maintains itself as new content gets added, while providing full control over the most important pages.
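If you want to check that claim against your own build, something like the following Python sketch could do it. It reads the generated index, maps each linked URL onto the publish directory, and reports any llms.txt that is missing or blank. The function names are my own, and it assumes the site is served from the domain root (no subpath in baseURL) and that the index uses the `- [title](url)` line format from the template above.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

# Matches the "- [Title](https://example.org/path/llms.txt)" lines in the index.
LINK = re.compile(r"^- \[(?P<title>.*)\]\((?P<url>\S+)\)\s*$")


def index_links(index_text: str) -> list[str]:
    """Return the URL of every '- [title](url)' line in llms-full.txt."""
    return [m["url"] for line in index_text.splitlines() if (m := LINK.match(line))]


def missing_or_empty(public_dir: Path) -> list[str]:
    """List indexed llms.txt URLs whose file is absent or blank in the build."""
    index = (public_dir / "llms-full.txt").read_text(encoding="utf-8")
    problems = []
    for url in index_links(index):
        # Map the URL path onto the publish directory (ignores any baseURL subpath).
        target = public_dir / urlparse(url).path.lstrip("/")
        if not target.is_file() or not target.read_text(encoding="utf-8").strip():
            problems.append(url)
    return problems
```

After running `hugo`, calling `missing_or_empty(Path("public"))` should return an empty list if everything in the index resolves. A blank file is flagged too, since that is what a page would end up with if the template's guard suppressed output and no custom file were published in its place.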

The complete implementation is available in the site’s source code, and the system generates this very post in LLM-friendly format automatically. The LLM version of this post and complete site index demonstrate the system in action.

