lbliii/rosettes

⌾⌾⌾ Rosettes

Modern syntax highlighting for Python 3.14t

from rosettes import highlight

html = highlight("def hello(): print('world')", "python")

Why Rosettes?

  • O(n) guaranteed — Hand-written state machines, no regex backtracking
  • Zero ReDoS — No exploitable patterns, safe for untrusted input
  • Thread-safe — Immutable state, optimized for Python 3.14t free-threading
  • Pygments compatible — Drop-in CSS class compatibility
  • 55 languages — Python, JavaScript, Rust, Go, and 51 more

Installation

pip install rosettes

Requires Python 3.14+


Quick Start

Function                 Description
highlight(code, lang)    Generate HTML with syntax highlighting
tokenize(code, lang)     Get raw tokens for custom processing
highlight_many(items)    Parallel highlighting for multiple blocks
list_languages()         List all 55 supported languages

Features

Feature               Description                                  Docs
Basic Highlighting    highlight() and tokenize() functions         Highlighting →
Parallel Processing   highlight_many() for multi-core systems      Parallel →
Line Highlighting     Highlight specific lines, add line numbers   Lines →
CSS Styling           Semantic or Pygments-compatible classes      Styling →
Custom Formatters     Build terminal, LaTeX, or custom output      Extending →

📚 Full documentation: lbliii.github.io/rosettes


Usage

Basic Highlighting — Generate HTML from code
from rosettes import highlight

# Basic highlighting
html = highlight("def foo(): pass", "python")
# <div class="rosettes" data-language="python">...</div>

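# Sample multi-line source used by the examples below
code = "def foo():\n    x = 1\n    y = 2\n    return x + y\n"

# With line numbers
html = highlight(code, "python", show_linenos=True)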

# Highlight specific lines
html = highlight(code, "python", hl_lines={2, 3, 4})
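
To make the two options concrete, here is a minimal stdlib-only sketch of how line numbers and highlighted lines could be rendered. It does not use Rosettes: `render_lines`, the `line`, `hll`, and `lineno` class names, and the 1-based line numbering are all assumptions made for illustration, not the library's actual markup.

```python
from html import escape


def render_lines(code: str, hl_lines: frozenset[int] = frozenset(),
                 show_linenos: bool = False) -> str:
    """Wrap each source line in a span (hypothetical markup, 1-based lines)."""
    out = []
    for number, line in enumerate(code.splitlines(), start=1):
        # Lines listed in hl_lines get an extra class a stylesheet can target.
        classes = "line hll" if number in hl_lines else "line"
        prefix = f'<span class="lineno">{number}</span>' if show_linenos else ""
        out.append(f'<span class="{classes}">{prefix}{escape(line)}</span>')
    return "\n".join(out)


sample = "a = 1\nb = 2\nc = a < b"
rendered = render_lines(sample, hl_lines=frozenset({2}), show_linenos=True)
print(rendered)
```

Only the second line carries the extra class here, so a single CSS rule is enough to give highlighted lines a background colour.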

Parallel Processing — Speed up multiple blocks

For 8+ code blocks, use highlight_many() for parallel processing:

from rosettes import highlight_many

blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]

# Highlight in parallel
results = highlight_many(blocks)

On free-threaded Python 3.14t, this provides a 1.5-2x speedup for 50+ blocks.
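
The idea behind batch highlighting can be sketched with the standard library alone. The block below is an illustration under stated assumptions, not Rosettes' implementation: `fake_highlight` is a stand-in for a real highlighter and `highlight_blocks` is a hypothetical name. It shows a thread pool mapping over (code, language) pairs while preserving input order.

```python
from concurrent.futures import ThreadPoolExecutor
from html import escape


def fake_highlight(code: str, lang: str) -> str:
    """Stand-in for a real highlighter: escapes the code and tags the language."""
    return f'<div data-language="{lang}">{escape(code)}</div>'


def highlight_blocks(blocks, max_workers=4):
    """Highlight (code, lang) pairs on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(lambda pair: fake_highlight(*pair), blocks))


blocks = [
    ("def foo(): pass", "python"),
    ("const x = 1;", "javascript"),
    ("fn main() {}", "rust"),
]
results = highlight_blocks(blocks)
```

On a build with the GIL, threads take turns on CPU-bound work, which is why the speedup quoted above is tied to the free-threaded interpreter.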

Tokenization — Raw tokens for custom processing
from rosettes import tokenize

tokens = tokenize("x = 42", "python")
for token in tokens:
    print(f"{token.type.name}: {token.value!r}")
# NAME: 'x'
# WHITESPACE: ' '
# OPERATOR: '='
# WHITESPACE: ' '
# NUMBER_INTEGER: '42'
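
As an example of custom processing, the sketch below turns a token stream into an ANSI-coloured terminal string. To stay self-contained it does not import Rosettes: `TokenType` and `Token` are stand-ins that only mirror the `.type.name` and `.value` attributes shown above, and the `ANSI` colour table and `to_ansi` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TokenType(Enum):
    """Stand-in token types: only the names shown in the output above."""
    NAME = auto()
    WHITESPACE = auto()
    OPERATOR = auto()
    NUMBER_INTEGER = auto()


@dataclass(frozen=True)
class Token:
    """Stand-in token exposing the .type and .value attributes used above."""
    type: TokenType
    value: str


# Hypothetical colour choices: ANSI SGR codes keyed by token type name.
ANSI = {"NAME": "36", "OPERATOR": "33", "NUMBER_INTEGER": "35"}


def to_ansi(tokens) -> str:
    """Render tokens as a coloured terminal string; unknown types stay plain."""
    parts = []
    for token in tokens:
        code = ANSI.get(token.type.name)
        parts.append(f"\x1b[{code}m{token.value}\x1b[0m" if code else token.value)
    return "".join(parts)


tokens = [
    Token(TokenType.NAME, "x"),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.OPERATOR, "="),
    Token(TokenType.WHITESPACE, " "),
    Token(TokenType.NUMBER_INTEGER, "42"),
]
colored = to_ansi(tokens)
print(colored)
```

Because the formatter only reads `type.name` and `value`, the same loop should work on any token objects that expose those two attributes.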

CSS Class Styles — Semantic or Pygments

Semantic (default) — Readable, self-documenting:

html = highlight(code, "python")
# <span class="syntax-keyword">def</span>
# <span class="syntax-function">hello</span>

.syntax-keyword { color: #ff79c6; }
.syntax-function { color: #50fa7b; }
.syntax-string { color: #f1fa8c; }

Pygments-compatible — Use existing themes:

html = highlight(code, "python", css_class_style="pygments")
# <span class="k">def</span>
# <span class="nf">hello</span>
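
One way to picture the two styles is a per-style lookup table consulted when each span is written. The sketch below is an assumption, not Rosettes' formatter: the token type names (`KEYWORD`, `NAME_FUNCTION`, `STRING`), the `CLASS_NAMES` table, and `span` are hypothetical. The class names for keywords and functions are taken from the examples above, and `s` is the Pygments short name for strings.

```python
from html import escape

# Hypothetical lookup: token type name -> CSS class, one table per style.
CLASS_NAMES = {
    "semantic": {
        "KEYWORD": "syntax-keyword",
        "NAME_FUNCTION": "syntax-function",
        "STRING": "syntax-string",
    },
    "pygments": {
        "KEYWORD": "k",
        "NAME_FUNCTION": "nf",
        "STRING": "s",
    },
}


def span(token_type: str, text: str, css_class_style: str = "semantic") -> str:
    """Wrap escaped text in a span whose class depends on the chosen style."""
    css = CLASS_NAMES[css_class_style][token_type]
    return f'<span class="{css}">{escape(text)}</span>'


print(span("KEYWORD", "def"))
print(span("NAME_FUNCTION", "hello", css_class_style="pygments"))
```

Keeping both tables keyed by the same token type names means switching styles changes only the class attribute, never the document structure.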

Supported Languages

55 languages with full syntax support
Category      Languages
Core          Python, JavaScript, TypeScript, JSON, YAML, TOML, Bash, HTML, CSS, Diff
Systems       C, C++, Rust, Go, Zig
JVM           Java, Kotlin, Scala, Groovy, Clojure
Apple         Swift
Scripting     Ruby, Perl, PHP, Lua, R, PowerShell
Functional    Haskell, Elixir
Data/Query    SQL, CSV, GraphQL
Markup        Markdown, XML
Config        INI, Nginx, Dockerfile, Makefile, HCL
Schema        Protobuf
Modern        Dart, Julia, Nim, Gleam, V
AI/ML         Mojo, Triton, CUDA, Stan
Other         PKL, CUE, Tree, Kida, Jinja, Plaintext

Architecture

State Machine Lexers — O(n) guaranteed

Every lexer is a hand-written finite state machine:

┌─────────────────────────────────────────────────────────────┐
│                    State Machine Lexer                       │
│                                                              │
│  ┌─────────┐   char    ┌─────────┐   char    ┌─────────┐   │
│  │ INITIAL │ ────────► │ STRING  │ ────────► │ ESCAPE  │   │
│  │ STATE   │           │ STATE   │           │ STATE   │   │
│  └─────────┘           └─────────┘           └─────────┘   │
│      │                      │                     │         │
│      │ emit                 │ emit                │ emit    │
│      ▼                      ▼                     ▼         │
│  [Token]               [Token]               [Token]        │
└─────────────────────────────────────────────────────────────┘

Key properties:

  • Single character lookahead (O(n) guaranteed)
  • No backtracking (no ReDoS possible)
  • Immutable state (thread-safe)
  • Local variables only (no shared mutable state)
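
The properties above can be illustrated with a toy lexer. This is a minimal sketch for a made-up mini-grammar (names, integers, whitespace, double-quoted strings with backslash escapes, single-character operators), not one of Rosettes' lexers. The `lex` function and the `STRING` kind are assumptions; the other kind names are borrowed from the tokenization example.

```python
def lex(source: str):
    """Yield (kind, text) pairs in one left-to-right pass over source."""
    pos, length = 0, len(source)  # local variables only, no shared state
    while pos < length:
        start, char = pos, source[pos]
        if char.isspace():
            while pos < length and source[pos].isspace():
                pos += 1
            kind = "WHITESPACE"
        elif char.isalpha() or char == "_":
            while pos < length and (source[pos].isalnum() or source[pos] == "_"):
                pos += 1
            kind = "NAME"
        elif char.isdigit():
            while pos < length and source[pos].isdigit():
                pos += 1
            kind = "NUMBER_INTEGER"
        elif char == '"':
            pos += 1  # enter the STRING state
            while pos < length and source[pos] != '"':
                # ESCAPE state: a backslash also consumes the next character
                pos += 2 if source[pos] == "\\" else 1
            pos = min(pos + 1, length)  # consume the closing quote if present
            kind = "STRING"
        else:
            pos += 1
            kind = "OPERATOR"
        yield kind, source[start:pos]


for kind, text in lex('greeting = "hi \\"there\\""'):
    print(f"{kind}: {text!r}")
```

The position only ever moves forward and every branch advances it by at least one character, so the total work is linear in the input length and there is nothing to backtrack over.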

Thread Safety — Free-threading ready

All public APIs are thread-safe:

  • Lexers use only local variables during tokenization
  • Formatter state is immutable
  • Registry uses functools.cache for memoization
  • Module declares itself safe for free-threading (PEP 703)
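
A memoized registry of the kind described above can be sketched with `functools.cache`. The names `PlainLexer` and `get_lexer` are hypothetical and the lexer class is a placeholder; the point is only that repeated lookups from many threads return the same cached object.

```python
import threading
from functools import cache


class PlainLexer:
    """Hypothetical stateless lexer: nothing is mutated after creation."""

    def __init__(self, name: str) -> None:
        self.name = name


@cache
def get_lexer(name: str) -> PlainLexer:
    """Create one lexer per language name and reuse it on later lookups."""
    return PlainLexer(name)


first = get_lexer("python")  # warm the cache before starting threads
seen = []


def worker() -> None:
    seen.append(get_lexer("python"))  # list.append is thread-safe in CPython


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(all(lexer is first for lexer in seen))
```

The cache is warmed first because `functools.cache` may run the wrapped function more than once if several threads miss at the same moment. That is harmless when the cached object is immutable, but it would make the identity check above racy.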

Performance

Benchmarked against Pygments on a 10,000-line Python file:

Operation             Rosettes   Pygments   Speedup
Tokenize              12ms       45ms       3.75x
Highlight             18ms       52ms       2.89x
Parallel (8 blocks)   22ms       48ms       2.18x
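
The Speedup column is the Pygments time divided by the Rosettes time. The sketch below recomputes it from the table and includes a small timing helper; `best_ms` and `speedup` are hypothetical names, and the README does not say how its own measurements were taken, so this is not the project's benchmark harness.

```python
import time


def best_ms(fn, *args, repeat=5):
    """Return the fastest of `repeat` timed calls, in milliseconds."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        fn(*args)
        best = min(best, (time.perf_counter() - start) * 1000)
    return best


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline time to candidate time, rounded to two decimals."""
    return round(baseline_ms / candidate_ms, 2)


# (Rosettes ms, Pygments ms) pairs copied from the table above.
table = {
    "Tokenize": (12, 45),
    "Highlight": (18, 52),
    "Parallel (8 blocks)": (22, 48),
}
speedups = {op: speedup(pyg, ros) for op, (ros, pyg) in table.items()}
print(speedups)
```

Taking the minimum over several runs is one common way to reduce noise from other processes; a mean or median would be an equally reasonable choice.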

Documentation

📚 lbliii.github.io/rosettes

Section        Description
Get Started    Installation and quickstart
Highlighting   Core highlighting APIs
Styling        CSS classes and themes
Reference      Complete API documentation
About          Architecture and design

Development

git clone https://github.com/lbliii/rosettes.git
cd rosettes
uv sync --group dev
pytest

License

MIT License — see LICENSE for details.

About

⌾⌾⌾ Rosettes — ReDoS-safe syntax highlighter for Python 3.14+ with free-threading.
