Modern syntax highlighting for Python 3.14t
```python
from rosettes import highlight

html = highlight("def hello(): print('world')", "python")
```

- O(n) guaranteed — Hand-written state machines, no regex backtracking
- Zero ReDoS — No exploitable patterns, safe for untrusted input
- Thread-safe — Immutable state, optimized for Python 3.14t free-threading
- Pygments compatible — Drop-in CSS class compatibility
- 55 languages — Python, JavaScript, Rust, Go, and 51 more
```bash
pip install rosettes
```

Requires Python 3.14+.
| Function | Description |
|---|---|
| `highlight(code, lang)` | Generate HTML with syntax highlighting |
| `tokenize(code, lang)` | Get raw tokens for custom processing |
| `highlight_many(items)` | Parallel highlighting for multiple blocks |
| `list_languages()` | List all 55 supported languages |
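The only function in this table not demonstrated later is `list_languages()`. Assuming it returns an iterable of language identifier strings (the exact return type isn't documented here), a quick look at what's available might be:

```python
from rosettes import list_languages

# Assumed: list_languages() returns an iterable of language identifier strings.
for lang in sorted(list_languages()):
    print(lang)
```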
| Feature | Description | Docs |
|---|---|---|
| Basic Highlighting | `highlight()` and `tokenize()` functions | Highlighting → |
| Parallel Processing | `highlight_many()` for multi-core systems | Parallel → |
| Line Highlighting | Highlight specific lines, add line numbers | Lines → |
| CSS Styling | Semantic or Pygments-compatible classes | Styling → |
| Custom Formatters | Build terminal, LaTeX, or custom output | Extending → |
📚 Full documentation: lbliii.github.io/rosettes
Basic Highlighting — Generate HTML from code
```python
from rosettes import highlight
# Basic highlighting
html = highlight("def foo(): pass", "python")
# <div class="rosettes" data-language="python">...</div>
# A multi-line example snippet to highlight
code = "def greet(name):\n    message = f'Hello, {name}!'\n    print(message)\n    return message\n"

# With line numbers
html = highlight(code, "python", show_linenos=True)
# Highlight specific lines
html = highlight(code, "python", hl_lines={2, 3, 4})
```
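`highlight()` returns an HTML fragment, so embedding it is plain string composition. A minimal sketch, assuming you supply your own stylesheet ("theme.css" below is a placeholder, not a file Rosettes ships):

```python
from pathlib import Path
from rosettes import highlight

snippet = "def hello():\n    print('world')\n"
fragment = highlight(snippet, "python", show_linenos=True)

# Wrap the fragment in a bare-bones page; "theme.css" stands in for
# whatever stylesheet defines the syntax-* classes.
page = (
    "<!doctype html>\n"
    "<html>\n"
    '  <head><link rel="stylesheet" href="theme.css"></head>\n'
    f"  <body>{fragment}</body>\n"
    "</html>\n"
)
Path("snippet.html").write_text(page)
```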
Parallel Processing — Speed up multiple blocks
For 8+ code blocks, use `highlight_many()` for parallel processing:
```python
from rosettes import highlight_many
blocks = [
("def foo(): pass", "python"),
("const x = 1;", "javascript"),
("fn main() {}", "rust"),
]
# Highlight in parallel
results = highlight_many(blocks)
```

On Python 3.14t with free-threading, this provides a 1.5-2x speedup for 50+ blocks.
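Assuming the results come back as one HTML string per input, in input order (not spelled out above), pairing them back up with their sources is straightforward:

```python
from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
]

# Assumed: highlight_many() yields one HTML string per (code, lang) pair,
# in the same order as the input list.
for (code, lang), html in zip(blocks, highlight_many(blocks)):
    print(lang, len(html))
```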
Tokenization — Raw tokens for custom processing
```python
from rosettes import tokenize
tokens = tokenize("x = 42", "python")
for token in tokens:
print(f"{token.type.name}: {token.value!r}")
# NAME: 'x'
# WHITESPACE: ' '
# OPERATOR: '='
# WHITESPACE: ' '
# NUMBER_INTEGER: '42'
```
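Since `tokenize()` hands back plain token objects, custom processing is ordinary Python. For example, a rough token-frequency count, relying only on `token.type` being an enum with a `.name` attribute as the output above suggests:

```python
from collections import Counter
from rosettes import tokenize

# Tally token categories in a snippet; uses only token.type.name,
# as shown in the example output above.
counts = Counter(tok.type.name for tok in tokenize("x = x + 1  # bump", "python"))
print(counts.most_common(5))
```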
CSS Class Styles — Semantic or Pygments
Semantic (default) — Readable, self-documenting:
```python
html = highlight(code, "python")
# <span class="syntax-keyword">def</span>
# <span class="syntax-function">hello</span>.syntax-keyword { color: #ff79c6; }
.syntax-function { color: #50fa7b; }
.syntax-string { color: #f1fa8c; }
```

Pygments-compatible — Use existing themes:
```python
html = highlight(code, "python", css_class_style="pygments")
# <span class="k">def</span>
# <span class="nf">hello</span>55 languages with full syntax support
55 languages with full syntax support
| Category | Languages |
|---|---|
| Core | Python, JavaScript, TypeScript, JSON, YAML, TOML, Bash, HTML, CSS, Diff |
| Systems | C, C++, Rust, Go, Zig |
| JVM | Java, Kotlin, Scala, Groovy, Clojure |
| Apple | Swift |
| Scripting | Ruby, Perl, PHP, Lua, R, PowerShell |
| Functional | Haskell, Elixir |
| Data/Query | SQL, CSV, GraphQL |
| Markup | Markdown, XML |
| Config | INI, Nginx, Dockerfile, Makefile, HCL |
| Schema | Protobuf |
| Modern | Dart, Julia, Nim, Gleam, V |
| AI/ML | Mojo, Triton, CUDA, Stan |
| Other | PKL, CUE, Tree, Kida, Jinja, Plaintext |
State Machine Lexers — O(n) guaranteed
Every lexer is a hand-written finite state machine:

```
┌─────────────────────────────────────────────────────────────┐
│ State Machine Lexer │
│ │
│ ┌─────────┐ char ┌─────────┐ char ┌─────────┐ │
│ │ INITIAL │ ────────► │ STRING │ ────────► │ ESCAPE │ │
│ │ STATE │ │ STATE │ │ STATE │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │
│ │ emit │ emit │ emit │
│ ▼ ▼ ▼ │
│ [Token] [Token] [Token] │
└─────────────────────────────────────────────────────────────┘
```

Key properties:
- Single character lookahead (O(n) guaranteed)
- No backtracking (no ReDoS possible)
- Immutable state (thread-safe)
- Local variables only (no shared mutable state)
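To make the shape of such a lexer concrete, here is a toy single-pass tokenizer in the same spirit. It is purely illustrative and not Rosettes' internal code: each character is inspected once, states are just branches, and nothing is ever re-scanned.

```python
# Toy state-machine lexer: names, integers, whitespace, and single-char operators.
# Illustrative only; not Rosettes' implementation.
def toy_tokenize(source: str) -> list[tuple[str, str]]:
    tokens: list[tuple[str, str]] = []
    i, n = 0, len(source)
    while i < n:
        start = i
        ch = source[i]
        if ch.isspace():
            while i < n and source[i].isspace():
                i += 1
            tokens.append(("WHITESPACE", source[start:i]))
        elif ch.isdigit():
            while i < n and source[i].isdigit():
                i += 1
            tokens.append(("NUMBER_INTEGER", source[start:i]))
        elif ch.isalpha() or ch == "_":
            while i < n and (source[i].isalnum() or source[i] == "_"):
                i += 1
            tokens.append(("NAME", source[start:i]))
        else:
            tokens.append(("OPERATOR", ch))
            i += 1
    return tokens


print(toy_tokenize("x = 42"))
# [('NAME', 'x'), ('WHITESPACE', ' '), ('OPERATOR', '='), ('WHITESPACE', ' '), ('NUMBER_INTEGER', '42')]
```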
Thread Safety — Free-threading ready
All public APIs are thread-safe:
- Lexers use only local variables during tokenization
- Formatter state is immutable
- Registry uses `functools.cache` for memoization
- Module declares itself safe for free-threading (PEP 703)
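Given those properties, calling the API from multiple threads needs no locking on the caller's side. A minimal concurrency sketch using only the standard library:

```python
from concurrent.futures import ThreadPoolExecutor
from rosettes import highlight

snippets = [
    ("def f(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]

# On a free-threaded (3.14t) build these calls can run in parallel;
# on a GIL build they still interleave safely.
with ThreadPoolExecutor() as pool:
    pages = list(pool.map(lambda item: highlight(*item), snippets))
```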
Benchmarked against Pygments on a 10,000-line Python file:
| Operation | Rosettes | Pygments | Speedup |
|---|---|---|---|
| Tokenize | 12ms | 45ms | 3.75x |
| Highlight | 18ms | 52ms | 2.89x |
| Parallel (8 blocks) | 22ms | 48ms | 2.18x |
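Numbers like these depend heavily on hardware and input, so it is worth timing your own workload. A rough sketch (the file path is a placeholder):

```python
import time
from pathlib import Path
from rosettes import highlight

source = Path("your_module.py").read_text()  # placeholder path

start = time.perf_counter()
highlight(source, "python")
print(f"highlight: {(time.perf_counter() - start) * 1000:.1f} ms")
```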
| Section | Description |
|---|---|
| Get Started | Installation and quickstart |
| Highlighting | Core highlighting APIs |
| Styling | CSS classes and themes |
| Reference | Complete API documentation |
| About | Architecture and design |
```bash
git clone https://github.com/lbliii/rosettes.git
cd rosettes
uv sync --group dev
pytest
```

MIT License — see LICENSE for details.