This document describes how the Union.ai documentation platform works, including local development, production builds, the Cloudflare Pages deployment pipeline, LLM documentation generation, and CI checks.
The docs system is split across three repositories:
- unionai-docs: the parent repository containing version-specific content and configuration. Files that differ between the main (v2) and v1 branches live here: content/, data/, linkmap/, include/, api-packages.toml, makefile.inc, and CI workflows (.github/).
- unionai-docs-infra (this repo): shared build infrastructure, imported as a git submodule at unionai-docs-infra/ in the parent unionai-docs repo. This includes Hugo configuration (hugo.toml, hugo.site.toml, hugo.ver.toml, config.*.toml), layouts, themes, static assets (static/), Python tools (tools/), shell scripts (scripts/), Makefiles, and redirect data. The contents are identical across both production branches.
A thin top-level Makefile in unionai-docs forwards all build targets to unionai-docs-infra/Makefile.
A third repository, unionai-examples, contains example code referenced by the documentation. It is imported as a git submodule at unionai-examples/.
- Requirements
- Local development
- Managing tutorial pages
- Production builds
- Cloudflare Pages deployment
- Redirect management
- LLM documentation pipeline
- CI checks on pull requests
- Hugo (>= 0.145.0)

  brew install hugo

- Python (>= 3.8) for build tools (API generator, LLM doc builder, shortcode processor).

- Local configuration file

  Copy the sample configuration and customize it:

  cp unionai-docs-infra/hugo.local.toml~sample hugo.local.toml

  Review hugo.local.toml before starting development. See Controlling the development environment for available settings.
Start the development server:
make dev
This launches the site at localhost:1313 in development mode with hot reloading. Edit content files and the browser refreshes automatically.
The development environment gives you live preview and variant-aware rendering. You can see content from all variants at once, highlight the active variant's content, and identify pages missing from a variant.
Change how the development environment works by setting values in hugo.local.toml:
| Setting | Description |
|---|---|
| variant | The current variant to display. Change this, save, and the browser refreshes automatically with the new variant. |
| show_inactive | If true, shows all content that did not match the variant. Useful for seeing all variant sections at once. |
| highlight_active | If true, highlights the current content for the variant. |
| highlight_keys | If true, highlights replacement keys and their values. |
Variants are flavors of the site (flyte, byoc, selfmanaged). During development, render any variant by setting it in hugo.local.toml:
variant = "byoc"

To show content from other variants alongside the active one:

show_inactive = true

To highlight the active variant's content (to distinguish it from common content):

highlight_active = true

Content may be hidden due to {{< variant ... >}} blocks. To see what's missing, adjust the variant show/hide settings in development mode.

For a production-like view:

show_inactive = false
highlight_active = false

For full developer visibility:

show_inactive = true
highlight_active = true

The developer site shows in red any pages missing from the variant. For a page to exist in a variant, it must be listed in the variants: frontmatter at the top of the file. Clicking on a red page gives you the path you need to add.
See Contributing docs and examples for authoring guidelines.
Tutorials are maintained in the unionai-examples repository and imported as a git submodule in the unionai-examples directory.
To initialize the submodule on a fresh clone:
make init-examples
To update the submodule to the latest main branch:
make update-examples
make dist
This is the main production build command. It performs the following steps:
- Converts Jupyter notebooks from unionai-examples to markdown
- Runs make update-redirects to detect moved pages and update redirects.csv
- Builds all three Hugo variants (flyte, byoc, selfmanaged) into the dist/ directory
- Generates LLM-optimized documentation (llms-full.txt) for each variant
- Regenerates API reference documentation from the latest SDK packages
make dist is the single command that regenerates everything. If CI checks are failing, running make dist locally and committing the changed files will usually fix them.
Serve the dist/ directory with a local web server:
make serve PORT=4444
If no port is specified, the server defaults to port 9000. Open http://localhost:<port> to view the site as it would appear at its official URL.
The production site is deployed via Cloudflare Pages.
Configure your Cloudflare Pages project with these settings:
| Setting | Value |
|---|---|
| Framework preset | None (Custom/Static site) |
| Build command | chmod +x build.sh && ./build.sh |
| Build output directory | dist |
| Root directory | / |
Set these in the Cloudflare Pages dashboard:
- PYTHON_VERSION: 3.9 (or higher)
- NODE_VERSION: 18 (or higher)
- The build.sh script installs Python dependencies using pip3
- Runs make dist, which builds all documentation variants
- The Python processor (process_shortcodes.py) converts Hugo shortcodes to markdown
- Output is generated in the dist/ directory for Cloudflare Pages to serve
When content pages are moved or renamed, redirects.csv tracks the old-to-new URL mappings. These are deployed to Cloudflare as a Bulk Redirect List, so old URLs automatically redirect to the new locations.
Each row in redirects.csv has seven columns:
| Column | Description |
|---|---|
| 1 | Source URL |
| 2 | Target URL |
| 3 | HTTP status code (e.g., 302) |
| 4 | Include subdomains (TRUE/FALSE) |
| 5 | Subpath matching (TRUE/FALSE) |
| 6 | Preserve query string (TRUE/FALSE) |
| 7 | Preserve path suffix (TRUE/FALSE) |
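As an illustration of the seven-column layout above, here is a minimal Python sketch that parses rows into typed records. It assumes the file has no header row and that the flags are written exactly as TRUE or FALSE; the RedirectRow class, the helper names, and the sample URLs are hypothetical and are not taken from the repository's tools.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class RedirectRow:
    """One row of redirects.csv, in column order (hypothetical helper type)."""
    source_url: str
    target_url: str
    status_code: int
    include_subdomains: bool
    subpath_matching: bool
    preserve_query_string: bool
    preserve_path_suffix: bool


def parse_bool(value: str) -> bool:
    """Interpret the TRUE/FALSE flags used in columns 4-7."""
    return value.strip().upper() == "TRUE"


def parse_redirects(text: str) -> List[RedirectRow]:
    """Parse CSV text into RedirectRow objects, rejecting malformed rows."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        if len(record) != 7:
            raise ValueError(f"expected 7 columns, got {len(record)}: {record}")
        source, target, status, *flags = record
        rows.append(RedirectRow(source, target, int(status), *map(parse_bool, flags)))
    return rows


# Hypothetical sample row; real entries use the documentation site's URLs.
SAMPLE = "example.com/docs/old-page/,https://example.com/docs/new-page/,302,FALSE,FALSE,TRUE,TRUE\n"

if __name__ == "__main__":
    for row in parse_redirects(SAMPLE):
        print(row)
```

Rejecting rows with the wrong column count up front keeps a malformed line from silently shifting flags into the wrong fields.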
The detect_moved_pages.py script scans git history for file renames under content/ and generates redirect entries for every variant. Run it with:
make update-redirects
This is also called automatically by make dist.
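To make the rename-detection idea concrete, the sketch below extracts rename pairs from text in the format that git log --name-status prints (a status field such as R100, then the old and new paths, separated by tabs). It parses a sample string rather than invoking git, and it is not the logic of detect_moved_pages.py itself, which may work differently; the function name and sample paths are hypothetical.

```python
from typing import List, Tuple


def parse_renames(name_status: str, prefix: str = "content/") -> List[Tuple[str, str]]:
    """Return (old_path, new_path) pairs for renames where both paths are under prefix."""
    renames = []
    for line in name_status.splitlines():
        parts = line.split("\t")
        # Rename records look like "R<score>\t<old path>\t<new path>".
        if len(parts) == 3 and parts[0].startswith("R"):
            _, old, new = parts
            if old.startswith(prefix) and new.startswith(prefix):
                renames.append((old, new))
    return renames


# Hypothetical output, e.g. from: git log --name-status --diff-filter=R
SAMPLE = (
    "R100\tcontent/user-guide/old-name.md\tcontent/user-guide/new-name.md\n"
    "M\tcontent/user-guide/other.md\n"
    "R090\tREADME.md\tdocs/README.md\n"
)

if __name__ == "__main__":
    for old, new in parse_renames(SAMPLE):
        print(f"{old} -> {new}")
```

Each pair would then need to be mapped from a file path to the corresponding site URLs before it becomes a redirects.csv entry; that mapping is specific to the repository and is not shown here.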
Redirects are deployed to Cloudflare automatically via GitHub Actions when redirects.csv is modified on the main branch. The deploy_redirects.py script reads the CSV, converts it to the Cloudflare API format, and replaces all items in the Bulk Redirect List via a single PUT request.
The workflow can also be triggered manually from the Actions tab in GitHub.
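The CSV-to-API conversion can be pictured roughly as follows. This is a sketch under assumptions, not the code in deploy_redirects.py: the field names reflect my understanding of the redirect item shape in Cloudflare's Lists API and should be checked against the current API reference, and the function names and sample row are hypothetical. The sketch only builds the JSON body and makes no network calls.

```python
import json
from typing import Dict, List


def to_list_item(row: List[str]) -> Dict[str, dict]:
    """Convert one seven-column CSV row into a redirect list item (assumed shape)."""
    source, target, status, subdomains, subpath, query, suffix = row
    return {
        "redirect": {
            "source_url": source,
            "target_url": target,
            "status_code": int(status),
            "include_subdomains": subdomains == "TRUE",
            "subpath_matching": subpath == "TRUE",
            "preserve_query_string": query == "TRUE",
            "preserve_path_suffix": suffix == "TRUE",
        }
    }


def build_payload(rows: List[List[str]]) -> str:
    """Serialize all rows as one JSON array, suitable as the body of a single request."""
    return json.dumps([to_list_item(row) for row in rows])


# Hypothetical sample row.
SAMPLE_ROW = ["example.com/docs/old/", "https://example.com/docs/new/", "302",
              "FALSE", "FALSE", "TRUE", "TRUE"]

if __name__ == "__main__":
    print(build_payload([SAMPLE_ROW]))
```

Sending the whole array in one request matches the replace-everything behavior described above: the CSV remains the single source of truth, and entries deleted from it disappear from the list on the next deploy.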
For local deployment (requires environment variables CLOUDFLARE_API_TOKEN, CLOUDFLARE_ACCOUNT_ID, CLOUDFLARE_LIST_ID):
make deploy-redirects
For a dry run that parses the CSV without making API calls:
python3 tools/redirect_generator/deploy_redirects.py --dry-run
The build generates LLM-optimized documentation at four levels of granularity, designed for AI coding agents and AI search engines:
| File | Scope | Description |
|---|---|---|
| page.md | Per page | Clean Markdown version of every page, with links to other page.md files |
| section.md | Per section | Single-file bundle of all pages in a section (where enabled) |
| llms.txt | Per variant | Page index with H2/H3 headings, grouped by section |
| llms-full.txt | Per variant | Entire documentation as one file with hierarchical link references |
dist/docs/llms.txt # Root discovery: lists versions
dist/docs/v2/llms.txt # Version discovery: lists variants
dist/docs/v2/{variant}/
├── llms.txt # Page index with headings
├── llms-full.txt # Full consolidated doc
├── page.md # Root page
├── user-guide/
│ ├── page.md # User Guide landing page
│ ├── task-configuration/
│ │ ├── page.md # Section landing page
│ │ ├── section.md # Section bundle (all pages concatenated)
│ │ ├── resources/
│ │ │ └── page.md
│ │ ├── caching/
│ │ │ └── page.md
│ │ └── ...
│ └── ...
└── ...
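The per-variant llms.txt index lists the H2/H3 headings of each page. As a rough illustration of how such headings could be collected from a page.md file, here is a minimal sketch; it is not the project's implementation, the function name is hypothetical, and it assumes ATX-style headings and triple-backtick code fences.

```python
import re
from typing import List, Tuple

# Matches "## Title" and "### Title" only (not "#" or "####").
HEADING = re.compile(r"^(#{2,3})\s+(.+?)\s*$")
FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown


def extract_headings(markdown: str) -> List[Tuple[int, str]]:
    """Return (level, title) for every H2/H3 heading outside fenced code blocks."""
    headings = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggle on opening and closing fences
            continue
        if in_fence:
            continue
        match = HEADING.match(line)
        if match:
            headings.append((len(match.group(1)), match.group(2)))
    return headings


SAMPLE = "\n".join([
    "# Caching",
    "## Cache versions",
    FENCE,
    "## not a heading, just code",
    FENCE,
    "### Overrides",
    "#### Too deep for the index",
])

if __name__ == "__main__":
    for level, title in extract_headings(SAMPLE):
        print("  " * (level - 2) + "- " + title)
```

Skipping fenced blocks matters because shell comments and Markdown samples inside code often start with "##" and would otherwise pollute the index.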
The make llm-docs target (called automatically by make dist) runs two scripts in sequence:
Stage 1: process_shortcodes.py generates page.md files

- Reads Hugo's Markdown output from tmp-md/ (Hugo builds this alongside HTML via the MD output format).
- Resolves all shortcodes: {{< variant >}}, {{< code >}}, {{< tabs >}}, {{< note >}}, {{< key >}}, {{< llm-bundle-note >}}, etc.
- Writes the result as page.md alongside each index.html in dist/.
- Converts all internal links to point to other page.md files using relative paths.

Stage 2: build_llm_docs.py generates bundles and indexes

- Lookup tables: Traverses all page.md files depth-first via ## Subpages links, building a lookup table mapping file paths and anchors to hierarchical titles (e.g. "user-guide/task-configuration/resources/page.md" → "Configure tasks > Resources").
- llms-full.txt: Processes all pages, converting internal page.md links to hierarchical bold references (e.g. **Configure tasks > Resources**).
- Subpage enhancement: Adds H2/H3 headings to ## Subpages listings in page.md files.
- Section bundles: Generates section.md for sections with llm_readable_bundle: true.
- Link absolutization: Converts all relative links in page.md files to absolute URLs (https://www.union.ai/docs/...).
- llms.txt: Creates the page index with headings and bundle references.
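The link absolutization step in Stage 2 can be sketched with the standard library's urljoin. This is an illustrative approximation rather than the code in build_llm_docs.py: the base URL below is a hypothetical placeholder, and the choice to leave external links and same-page anchors untouched is an assumption.

```python
import re
from urllib.parse import urljoin

# Inline Markdown links of the form [text](target); targets with spaces are ignored.
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def absolutize(markdown: str, page_url: str) -> str:
    """Rewrite relative link targets as absolute URLs resolved against page_url."""
    def repl(match: "re.Match") -> str:
        text, target = match.group(1), match.group(2)
        # Assumption: external links and same-page anchors stay as they are.
        if target.startswith(("http://", "https://", "mailto:", "#")):
            return match.group(0)
        return f"[{text}]({urljoin(page_url, target)})"

    return LINK.sub(repl, markdown)


# Hypothetical absolute URL of the page being processed.
PAGE_URL = "https://example.com/docs/v2/byoc/user-guide/task-configuration/caching/page.md"

if __name__ == "__main__":
    print(absolutize("See [Resources](../resources/page.md).", PAGE_URL))
```

Resolving against the page's own URL, rather than the site root, is what lets ../ segments collapse correctly.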
To enable a section.md bundle for a documentation section, two things are required in the section's _index.md:
- Frontmatter: llm_readable_bundle: true
- Body: {{< llm-bundle-note >}} shortcode (renders a note pointing to the bundle)
A CI check (check-llm-bundle-notes) verifies these are always in sync.
In section bundles, links to pages within the section become hierarchical bold references, while links to pages outside the section become absolute URLs.
Link conversion in llms-full.txt:
- Cross-page: [Resources](../resources/page.md) → **Configure tasks > Resources**
- Anchor: [Caching](../caching/page.md#cache-versions) → **Configure tasks > Caching > Cache versions**
- Same-page: [Image building](#image-building) → **Container images > Image building**
- External links preserved unchanged
Hierarchy optimization: Strips the Documentation > {Variant} prefix automatically.
Error handling: Missing files log warnings; broken links fall back to link text with context. A link-issues.txt report is written per variant.
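The link conversion rules above can be sketched as a lookup-table substitution. This is a simplified illustration, not the actual build_llm_docs.py logic: the TITLES table and function name are hypothetical, and the fallback for broken links keeps only the link text, without the additional context or the link-issues.txt reporting that the real tool provides.

```python
import posixpath
import re
from typing import Dict

LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")

# Hypothetical lookup table: "path" or "path#anchor" -> hierarchical title.
TITLES = {
    "user-guide/task-configuration/resources/page.md": "Configure tasks > Resources",
    "user-guide/task-configuration/caching/page.md#cache-versions":
        "Configure tasks > Caching > Cache versions",
}


def to_references(markdown: str, current_page: str, titles: Dict[str, str]) -> str:
    """Replace internal page.md links with bold hierarchical references."""
    base_dir = posixpath.dirname(current_page)

    def repl(match: "re.Match") -> str:
        text, target = match.group(1), match.group(2)
        if "://" in target or target.startswith("mailto:"):
            return match.group(0)  # external links are preserved unchanged
        path, _, anchor = target.partition("#")
        # Same-page links have an empty path and resolve to the current page.
        resolved = posixpath.normpath(posixpath.join(base_dir, path)) if path else current_page
        key = f"{resolved}#{anchor}" if anchor else resolved
        title = titles.get(key)
        # Simplified fallback for broken links: keep only the link text.
        return f"**{title}**" if title else text

    return LINK.sub(repl, markdown)


if __name__ == "__main__":
    page = "user-guide/task-configuration/caching/page.md"
    print(to_references("See [Resources](../resources/page.md).", page, TITLES))
```

Keying the table on the normalized path means the same target is recognized regardless of which page links to it.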
LLM documentation regenerates automatically as part of make dist. To regenerate only the LLM files:
make llm-docs
New pages are included automatically if linked via ## Subpages in their parent's Hugo output. New variants are detected automatically.
Every push triggers five checks. Four are GitHub Actions workflows; one is a Cloudflare Pages build preview.
What it checks: Whether the committed API reference docs match what the latest SDK versions would generate.
Why it fails: The upstream flyte-sdk or plugin packages released a new version and the generated API docs in content/api-reference/ are stale.
How to fix:
make update-api-docs

Then commit the changed files in content/api-reference/ and linkmap/flytesdk-linkmap.json.
What it checks: That all images referenced in content files actually exist in the repository.
Why it fails: A content file references an image that doesn't exist, was deleted, or was moved without updating the reference.
How to fix: Ensure the image file exists at the path referenced in the markdown. Run make check-images locally to see which references are broken.
What it checks: That generated markdown from Jupyter notebooks is up to date with the source notebooks in unionai-examples.
Why it fails: A notebook in the examples submodule was updated but the generated markdown wasn't regenerated.
How to fix:
make update-examples # pull latest notebooks
make dist # regenerates everything including notebook markdown

Then commit the changed files.
What it checks: That redirects.csv includes entries for all file renames detected in git history.
Why it fails: A content file was renamed or moved but the corresponding redirect entries weren't added to redirects.csv.
How to fix:
make update-redirects

Then commit the updated redirects.csv.
What it checks: That all internal links in content files resolve to existing pages.
Why it fails: A link points to a page that doesn't exist, was moved, or has a typo in the path. Note that links to section pages must use the /_index suffix (e.g., [Foo](./foo/_index) not [Foo](./foo)).
How to fix: Run make check-links locally to see which links are broken. Fix the links in the source files. Patterns can be excluded via .link-checker-exclude in the repository root (regex patterns matched against source_file:link_url).
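For a sense of how the exclusion patterns might be applied, here is a small sketch. The source only states that patterns are regexes matched against source_file:link_url; the use of re.search (match anywhere) and the skipping of blank lines and # comment lines are assumptions, and the function names and sample pattern are hypothetical.

```python
import re


def load_patterns(lines):
    """Compile one regex per line. Skipping blank and '#' lines is an assumption."""
    stripped = (line.strip() for line in lines)
    return [re.compile(s) for s in stripped if s and not s.startswith("#")]


def is_excluded(source_file, link_url, patterns):
    """Check the combined 'source_file:link_url' string against every pattern."""
    subject = f"{source_file}:{link_url}"
    # Assumption: a pattern may match anywhere in the subject (search, not fullmatch).
    return any(pattern.search(subject) for pattern in patterns)


# Hypothetical contents of .link-checker-exclude.
SAMPLE_LINES = [
    "# generated pages may link to legacy content",
    "",
    r"content/api-reference/.*:\.\./legacy/",
]

if __name__ == "__main__":
    patterns = load_patterns(SAMPLE_LINES)
    print(is_excluded("content/api-reference/foo.md", "../legacy/bar", patterns))
```

Because the file path and the link are joined with a colon, one pattern can exclude a link everywhere, in a single file, or for a whole directory of source files.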
What it checks: That generated content files (API docs, Jupyter notebook conversions, redirects) are up to date with their sources.
Why it fails: An upstream source changed (SDK release, notebook update, file rename) but the generated files weren't regenerated.
How to fix:
make dist

Then commit the changed files. This single command regenerates all generated content.
What it checks: That llm_readable_bundle: true in frontmatter and the {{< llm-bundle-note >}} shortcode in the page body are always in sync for section _index.md files.
Why it fails: A section has llm_readable_bundle: true but is missing the shortcode, or vice versa.
How to fix: Either add the missing {{< llm-bundle-note >}} shortcode to the page body, or add llm_readable_bundle: true to the frontmatter. Both must be present together, or neither.
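The both-or-neither rule can be expressed as a short check. The sketch below is an approximation of what check-llm-bundle-notes verifies, not its actual code; it assumes YAML frontmatter delimited by --- lines, and the function name is hypothetical.

```python
import re

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
BUNDLE_FLAG = re.compile(r"^llm_readable_bundle:\s*true\s*$", re.MULTILINE)
SHORTCODE = re.compile(r"\{\{<\s*llm-bundle-note\s*>\}\}")


def bundle_note_in_sync(index_md: str) -> bool:
    """True when the frontmatter flag and the body shortcode are both present or both absent."""
    match = FRONTMATTER.match(index_md)
    frontmatter = match.group(1) if match else ""
    body = index_md[match.end():] if match else index_md
    has_flag = bool(BUNDLE_FLAG.search(frontmatter))
    has_note = bool(SHORTCODE.search(body))
    return has_flag == has_note


GOOD = (
    "---\ntitle: Task configuration\nllm_readable_bundle: true\n---\n\n"
    "# Task configuration\n\n{{< llm-bundle-note >}}\n"
)
MISSING_NOTE = "---\nllm_readable_bundle: true\n---\n\nBody without the shortcode.\n"

if __name__ == "__main__":
    print(bundle_note_in_sync(GOOD), bundle_note_in_sync(MISSING_NOTE))
```

Comparing the two booleans for equality covers both failure directions at once: a flag without the shortcode and a shortcode without the flag.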
What it checks: Builds a deploy preview of the site.
How to use: Click the "Details" link to view a preview of your changes. This is not a pass/fail check — it just provides a preview URL.
Running make dist locally regenerates everything: API docs, redirects, and notebook conversions. It's the single command that covers all the generated-file checks. Commit any changed files afterward.