kalv25/latexport

latexport

A workflow for converting LaTeX documents to accessible HTML with PDF output.

Overview

This project converts .tex files into web-ready HTML pages using LaTeXML, while also generating PDF versions via pdflatex. The HTML output is customised with additional CSS, JavaScript, and accessibility enhancements.

Project Structure

latexport/
├── static/            # Shared CSS/JS assets (source of truth)
│   ├── css/
│   │   └── custom.css
│   └── js/
│       ├── custom.js
│       └── mathjax-config.js
├── latexml/           # Custom LaTeXML bindings (.ltxml); all loaded automatically
│   ├── amsmath-compat.ltxml
│   └── emph-in-math.ltxml
├── output/            # Generated output (seeded from static/ on each run)
│   ├── css/           # Copied from static/css/
│   ├── js/            # Copied from static/js/
│   ├── index.html     # Generated by latexport-index
│   └── {document}/    # Per-document output
│       ├── index.html
│       └── {document}.pdf
├── templates/         # HTML templates
├── main.py            # Main processing script
├── create_main_index.py  # Index page generator
├── embed_assets.py    # Self-contained HTML bundler
└── config.py          # Configuration settings

Prerequisites

Python 3.12+ and uv

Install uv, which manages the Python version and dependencies:

curl -LsSf https://astral.sh/uv/install.sh | sh

LaTeXML

LaTeXML converts .tex files to HTML5.

macOS:

brew install latexml

Ubuntu / Debian:

sudo apt install latexml

Other: see the LaTeXML installation docs.

TeX distribution (pdflatex)

A TeX distribution provides pdflatex, used to produce PDF output.

macOS:

brew install --cask mactex-no-gui

Ubuntu / Debian:

sudo apt install texlive-latex-base

Already have TeX Live? Install only pdflatex via tlmgr:

tlmgr install pdftex

bibtex is included with most TeX distributions. For biber (used with biblatex):

tlmgr install biber

Both are optional — latexport auto-detects whether they are needed based on the source file.

Installation

# Clone the repository
git clone <repository-url>
cd latexport

# Install Python dependencies and register CLI commands
uv sync
uv pip install -e .

Usage

1. Process LaTeX Files

Convert one or more .tex files to HTML and PDF:

# Process a single file
uv run latexport tex_src/example.tex

# Process multiple files
uv run latexport tex_src/file1.tex tex_src/file2.tex

# Write output to a custom directory instead of output/
uv run latexport -o ./public tex_src/example.tex

# Override the output subdirectory name (single file only)
uv run latexport --name lecture-notes tex_src/example.tex
# → output goes to output/lecture-notes/ instead of output/example/

# Dry run (preview without changes)
uv run latexport -n tex_src/example.tex

This will:

  • Seed the output directory with shared assets from static/
  • Auto-detect whether bibliography processing (bibtex/biber) is needed
  • If \cite commands are present: run bibtex/biber before LaTeXML so citations resolve in HTML
  • Generate HTML at {output}/{stem}/index.html (via LaTeXML, with all latexml/*.ltxml bindings)
  • Generate PDF at {output}/{stem}/{stem}.pdf (via pdflatex, with bibtex/biber if needed)
  • Clean up auxiliary files (.aux, .log, .out, .bbl, .blg, .bcf, .run.xml)
  • Remove empty subdirectories left by pdflatex's \include handling
  • Inject custom CSS and JavaScript references
  • Replace QED symbols with accessible HTML
  • Consolidate local CSS files to the shared css/ folder

2. Generate Main Index Page

Create an index page listing all documents:

# Use the default output directory (from config.py)
uv run latexport-index

# Use a custom output directory
uv run latexport-index -o examples/output

This scans the output directory for index.html files and generates a main index with links to each document (and PDF if available).
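
As a rough sketch of the scan described above, the following function collects one list item per document folder. The function name `collect_links`, the one-level-deep glob, and the rule "link the first PDF found" are assumptions for illustration; create_main_index.py may work differently.

```python
from html import escape
from pathlib import Path


def collect_links(root_dir: Path, pattern: str = "index.html") -> list[str]:
    """Build one <li> per document folder directly under root_dir."""
    items = []
    # "*/index.html" skips the main index at the root itself.
    for page in sorted(root_dir.glob(f"*/{pattern}")):
        doc_dir = page.parent
        name = escape(doc_dir.name)
        item = f'<li><a href="{name}/{pattern}">{name}</a>'
        # Link the first PDF in the folder, if there is one.
        pdfs = sorted(doc_dir.glob("*.pdf"))
        if pdfs:
            item += f' (<a href="{name}/{escape(pdfs[0].name)}">PDF</a>)'
        items.append(item + "</li>")
    return items
```

The resulting strings could be joined with newlines and passed to the template as the `{links}` placeholder.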

3. Clean Up Log Files

Remove latexml.log files left behind by LaTeXML:

uv run latexport-clean

This removes latexml.log from the current directory and recursively from the output directory. During a normal latexport run these are cleaned up automatically; latexport-clean handles any leftovers from previous runs.

4. Bundle a Self-Contained HTML File

Inline all CSS and JS into a single portable file:

# Bundle with all assets inlined (CSS + JS) — default behaviour
uv run embed_assets.py output/example/index.html

# Bundle but skip remote assets (they remain as external references)
uv run embed_assets.py --skip-remote output/example/index.html

# Bundle CSS only — leave <script src> tags untouched
uv run embed_assets.py --skip-js output/example/index.html

# Write the bundled file to a custom path
uv run embed_assets.py output/example/index.html dist/standalone.html
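
To illustrate what inlining means here, the sketch below replaces local stylesheet links with `<style>` blocks. It is not the code in embed_assets.py: the function name `inline_css` and the regex-based approach are assumptions, it handles CSS only, and it always leaves remote URLs untouched (comparable to running with --skip-remote).

```python
import re
from pathlib import Path

LINK_TAG_RE = re.compile(r"<link\b[^>]*>", re.IGNORECASE)
HREF_RE = re.compile(r'\bhref="([^"]+)"', re.IGNORECASE)


def inline_css(html: str, base_dir: Path) -> str:
    """Replace local <link rel="stylesheet"> tags with inline <style> blocks."""

    def replace(match: re.Match[str]) -> str:
        tag = match.group(0)
        href = HREF_RE.search(tag)
        if 'rel="stylesheet"' not in tag.lower() or href is None:
            return tag  # not a stylesheet link: keep as-is
        target = href.group(1)
        if target.startswith(("http://", "https://", "//")):
            return tag  # remote asset: left as an external reference
        css_path = base_dir / target
        if not css_path.is_file():
            return tag  # missing file: keep the original tag
        css = css_path.read_text(encoding="utf-8")
        return f"<style>\n{css}\n</style>"

    return LINK_TAG_RE.sub(replace, html)
```

A regex is enough for the predictable markup of generated pages; a real bundler handling arbitrary HTML would more safely use an HTML parser.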

Configuration

Edit config.py to customise paths and settings:

OUTPUT_DIR = Path("./output")    # Root directory for generated output
STATIC_DIR = Path("./static")    # Shared CSS/JS source; copied into output on each run
LATEXML_DIR = Path(__file__).parent / "latexml"  # LaTeXML binding files (absolute path)
SRC_QED_SYMBOL = "∎"             # QED symbol to replace in HTML
ENCODING = "utf-8"               # File encoding

# Index generator settings
ROOT_DIR = OUTPUT_DIR
PATTERN = "index.html"
TEMPLATE_PATH = Path("templates/main_index_template.html")
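
As an illustration of how SRC_QED_SYMBOL might be used, the sketch below wraps each QED symbol in labelled markup. The function name `replace_qed`, the `qed` class, and the exact ARIA attributes are assumptions; the markup latexport actually emits is not shown in this README.

```python
from html import escape

SRC_QED_SYMBOL = "∎"  # same value as in config.py


def replace_qed(
    html: str, symbol: str = SRC_QED_SYMBOL, label: str = "End of proof"
) -> str:
    """Wrap each QED symbol so assistive technology can announce a label."""
    accessible = (
        f'<span class="qed" role="img" aria-label="{escape(label)}">{symbol}</span>'
    )
    # Note: running this twice on the same HTML would wrap the symbol twice.
    return html.replace(symbol, accessible)
```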

Examples

Live demos are published at https://kalv25.github.io/latexport/.

Single standalone file — testmath.tex

Source: latex3/latex2e — © American Mathematical Society / LaTeX Project, LPPL 1.3c.

A self-contained file with no \include dependencies. The --name option overrides the stem so that the output folder gets a descriptive name instead of the generic testmath.

uv run latexport \
  -o examples/output \
  --name latex2e-testmath \
  examples/tex_src/testmath.tex

Output:

examples/output/latex2e-testmath/index.html
examples/output/latex2e-testmath/testmath.pdf

Live: https://kalv25.github.io/latexport/latex2e-testmath/


Multi-part document — hermish-proofs-notes/main.tex

Source: hermish/proofs-notes — CS70 lecture notes by Hermish Mehta.

A document split across multiple files via \include. latexport creates the required subdirectories for pdflatex, then removes them once they are empty after aux file cleanup.

uv run latexport \
  -o examples/output \
  --name hermish-proofs-notes \
  examples/tex_src/hermish-proofs-notes/main.tex

Output:

examples/output/hermish-proofs-notes/index.html
examples/output/hermish-proofs-notes/main.pdf

Live: https://kalv25.github.io/latexport/hermish-proofs-notes/
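
The subdirectory handling for \include can be sketched as below. pdflatex writes a separate .aux file for each \include{dir/file} and does not create dir/ inside the output directory itself, so the directories have to exist beforehand. The helper names are hypothetical and the regex is a simplification (it ignores commented-out lines); latexport's own implementation may differ.

```python
import re
from pathlib import Path

# Requires "{" right after "include", so \includegraphics does not match.
INCLUDE_RE = re.compile(r"\\include\{([^}]+)\}")


def include_subdirs(tex: str) -> set[Path]:
    """Return the relative directories used in \\include{dir/file} arguments."""
    dirs = set()
    for arg in INCLUDE_RE.findall(tex):
        parent = Path(arg.strip()).parent
        if parent != Path("."):
            dirs.add(parent)
    return dirs


def prepare_dirs(tex: str, out_dir: Path) -> None:
    """Create the subdirectories pdflatex needs for per-file .aux output."""
    for rel in include_subdirs(tex):
        (out_dir / rel).mkdir(parents=True, exist_ok=True)


def remove_empty_dirs(out_dir: Path) -> None:
    """Delete empty subdirectories, deepest first; non-empty ones are kept."""
    for path in sorted(out_dir.rglob("*"), reverse=True):
        if path.is_dir() and not any(path.iterdir()):
            path.rmdir()
```

`prepare_dirs` would run before pdflatex and `remove_empty_dirs` after the auxiliary files have been deleted.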


Generate the main index

After converting one or more documents, build the navigable index page:

uv run latexport-index -o examples/output

This scans examples/output/ and writes examples/output/index.html with links to each document (and its PDF where available).


Typical Workflow

  1. Write LaTeX — Create/edit .tex files in tex_src/
  2. Convert to HTML/PDF — Run uv run latexport tex_src/yourfile.tex
  3. Regenerate index — Run uv run latexport-index
  4. Deploy — Upload output/ to your web server

Customisation

Custom CSS

Edit static/css/custom.css. This file is automatically copied into the output directory and injected into every processed HTML file.

Custom JavaScript

Edit files in static/js/. The following are automatically injected:

  • custom.js — Page-width slider, MathJax toggle, go-to-top button
  • mathjax-config.js — MathJax configuration

Localising the toolbar (custom.js)

All user-visible strings in the toolbar are read from window.latexportI18n. To override them for another language, add a <script> block before custom.js loads:

<script>
  window.latexportI18n = {
      widthLabel:     'Breite',
      widthAriaLabel: 'Seitenbreite in ch-Einheiten',
      mathOn:         'Formel ✓',
      mathOff:        'Formel ✗',
      mathAriaOn:     'MathJax-Darstellung ein',
      mathAriaOff:    'MathJax-Darstellung aus',
      goToTopAria:    'Zum Seitenanfang',
  };
</script>

Only the keys you want to change need to be provided; omitted keys fall back to the English defaults.

LaTeXML Bindings

Custom LaTeXML behaviour is defined in .ltxml files inside latexml/. These are Perl modules loaded via --preload on every latexmlc invocation. All .ltxml files in latexml/ are loaded automatically (alphabetical order) — no changes to main.py needed when adding new ones.

Currently included:

  • amsmath-compat.ltxml — no-op stubs for amsmath internal commands (e.g. \ctagsplit@true) that would otherwise cause "undefined macro" errors.
  • emph-in-math.ltxml — redefines \emph{…} as \mathit{…} inside math environments, \textit{…} elsewhere.

To add a new binding, simply create a .ltxml file in latexml/.
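
The automatic loading described above could look roughly like this. The function name `build_latexmlc_command` is hypothetical, and passing each binding to --preload as a full path is an assumption; main.py may assemble the command differently.

```python
from pathlib import Path


def build_latexmlc_command(
    tex_file: Path, dest: Path, bindings_dir: Path
) -> list[str]:
    """Assemble a latexmlc call that preloads every binding in bindings_dir."""
    command = ["latexmlc", "--format=html5", f"--destination={dest}"]
    # Sorting gives the alphabetical load order.
    for binding in sorted(bindings_dir.glob("*.ltxml")):
        command.append(f"--preload={binding}")
    command.append(str(tex_file))
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`. Because the bindings are discovered with a glob, dropping a new .ltxml file into the folder is enough to have it picked up.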

Index Template

Edit templates/main_index_template.html. The template uses Python str.format-style placeholders:

Placeholder        Default          Description
{lang}             en               <html lang> attribute
{title}            Documents        <title> and <meta name="description">
{description}      Document index   Meta description content
{heading}          Documents        <h1> text
{contents_label}   Contents         <h3> section label
{links}            (generated)      Rendered <li> elements (filled automatically)

To generate the index in another language, pass keyword arguments to create_main_index_page:

create_main_index_page(
    root_dir=Path("output"),
    lang="de",
    title="Dokumente",
    description="Dokumentenindex",
    heading="Dokumente",
    contents_label="Inhalt",
)
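
For readers editing the template, this small sketch shows how str.format-style placeholders are filled. The TEMPLATE string is a stand-in written for this example, not the contents of templates/main_index_template.html. One practical consequence of str.format is that literal braces, for example in inline CSS, must be doubled.

```python
# Stand-in template; the real one lives in templates/main_index_template.html.
TEMPLATE = """<!DOCTYPE html>
<html lang="{lang}">
<head>
  <title>{title}</title>
  <style>body {{ max-width: 70ch; }}</style>
</head>
<body>
  <h1>{heading}</h1>
  <ul>
{links}
  </ul>
</body>
</html>"""

page = TEMPLATE.format(
    lang="de",
    title="Dokumente",
    heading="Dokumente",
    links='    <li><a href="example/index.html">example</a></li>',
)
```

In the output, `{{` and `}}` become single braces, while each named placeholder is replaced by its keyword argument.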

Caveats

SVG Dark Mode

SVG images (e.g., diagrams generated by TikZ) use a simple CSS filter to invert colours in dark mode. This works well for simple black-and-white diagrams but may produce unexpected results when multiple colours are used. Always test your documents in dark mode to verify SVG rendering.

LaTeXML Conversion Limitations

LaTeXML does not support all LaTeX packages and document structures. Known cases where HTML conversion fails or produces degraded output:

Multi-part documents — Projects where the root .tex file relies on a custom build system, non-standard \include chaining, or shared preamble files split across multiple directories may not convert correctly. LaTeXML resolves includes relative to --sourcedirectory; files outside that tree are not found.

memoir class — Documents using the memoir document class are not reliably converted. LaTeXML has limited support for memoir's extended sectioning, captioning, and page-layout commands. For example, the UiO Introduction to LaTeX repository uses memoir and fails to produce usable HTML output.

In these cases pdflatex still produces a correct PDF; only the HTML output is affected. Consider restructuring such documents to use a standard class (article, report, book) for full LaTeXML compatibility.

Contributing

Contributions are welcome — see CONTRIBUTING.md for setup instructions, code style, and how to submit a pull request.

License

MIT — see LICENSE.
