High-performance YAML 1.2.2 parser for Python and Node.js, powered by Rust.
Drop-in replacement for PyYAML and js-yaml. It matches or beats PyYAML's C extension on small/medium files, runs 2-4x faster than pure-Python PyYAML, and is 1.2-1.4x faster than js-yaml. Full YAML 1.2.2 Core Schema compliance, comprehensive linting, and multi-threaded parallel processing.
> [!IMPORTANT]
> **YAML 1.2.2 Compliance** — Unlike PyYAML (YAML 1.1), fast-yaml follows the modern YAML 1.2.2 specification. This means `yes`/`no`/`on`/`off` are strings, not booleans.
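For example (using `safe_load`, shown in the Quick Start below):

```python
import fast_yaml

fast_yaml.safe_load("enabled: yes")  # {'enabled': 'yes'}, a string, not True
```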
```bash
# Python
pip install fastyaml-rs

# Node.js
npm install fastyaml-rs

# CLI
cargo install fast-yaml-cli
```

> [!WARNING]
> Requires Rust 1.88+, Python 3.10+ or Node.js 20+.
**Build from source**
```bash
git clone https://github.com/bug-ops/fast-yaml.git
cd fast-yaml

# Python
uv sync && uv run maturin develop

# Node.js
cd nodejs && npm install && npm run build
```

fast-yaml is organized as a modular Rust workspace with clear separation of concerns:
```text
fast-yaml/
├── crates/fast-yaml-core/      # Parser + Emitter + Streaming Formatter
├── crates/fast-yaml-linter/    # Linter + Diagnostic Formatters
├── crates/fast-yaml-parallel/  # Multi-threaded processing
├── python/                     # PyO3 bindings
└── nodejs/                     # NAPI-RS bindings
```
| Component | Location | Input | Output | Use Case |
|---|---|---|---|---|
| Parser | `fast-yaml-core` | YAML text | `Value` (DOM) | Deserialize YAML to data structures |
| Emitter | `fast-yaml-core` | `Value` (DOM) | YAML text | Serialize data structures to YAML |
| Streaming Formatter | `fast-yaml-core` | Parser events | YAML text | Format YAML without building a DOM |
| Linter | `fast-yaml-linter` | YAML text | `Vec<Diagnostic>` | Validate YAML against rules |
| Parallel Processor | `fast-yaml-parallel` | YAML files/streams | `BatchResult` | Parallel processing at document and file level |
> [!TIP]
> **Parser vs Streaming Formatter:** the Parser builds a full DOM (use it for data manipulation); the Streaming Formatter processes events directly (use it for formatting/conversion).
> [!TIP]
> **Linter vs Diagnostic Formatter:** the Linter validates YAML and produces diagnostics; the Diagnostic Formatter renders them for display (rustc-style text, JSON, SARIF).
| Type | API | Use Case |
|---|---|---|
| Document-level | `parse_parallel()` | Parse multi-document YAML streams |
| File-level | `process_files()`, `FileProcessor` | Process multiple files in parallel |
| CLI batch mode | `fy format -j 8 dir/` | Format directories with parallel workers |
> [!NOTE]
> All parallelism is now unified in the `fast-yaml-parallel` crate. CLI batch mode and the FFI bindings use this single implementation.
```python
import fast_yaml

data = fast_yaml.safe_load("""
name: fast-yaml
features: [fast, safe, yaml-1.2.2]
""")

yaml_str = fast_yaml.safe_dump(data)
```

> [!TIP]
> Migrating from PyYAML? Just change your import: `import fast_yaml as yaml`
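For the `safe_*` API, that alias swap is the entire migration:

```python
import fast_yaml as yaml  # was: import yaml

data = yaml.safe_load("a: 1")  # same call signature as PyYAML
print(yaml.safe_dump(data))
```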
```javascript
import { safeLoad, safeDump } from 'fastyaml-rs';

const data = safeLoad(`name: fast-yaml`);
const yamlStr = safeDump(data);
```

```bash
# Single file operations
fy parse config.yaml          # Validate syntax
fy format -i config.yaml      # Format in-place
fy convert json config.yaml   # YAML → JSON
fy lint config.yaml           # Lint with diagnostics

# Batch mode (directories, globs, multiple files)
fy format -i src/             # Format entire directory
fy format -i "**/*.yaml"      # Format with glob pattern
fy format -i -j 8 project/    # Parallel processing (8 workers)
fy lint --exclude "tests/**" .  # Lint all except tests
```

> [!TIP]
> Batch mode activates automatically for directories, globs, or multiple files. It supports parallel processing and include/exclude patterns, and respects `.gitignore`.
- **High Performance** — Matches PyYAML C on small/medium files, 2-4x faster than pure Python
- **YAML 1.2.2** — Full Core Schema compliance
- **Drop-in API** — Compatible with PyYAML/js-yaml
- **Batch Processing** — Multi-file operations with parallel workers, glob patterns, `.gitignore` support
- **Linting** — Rich diagnostics with line/column tracking
- **Parallel** — Multi-threaded processing for large files
- **Safe** — Memory-safe Rust with minimal `unsafe` (FFI boundaries only, explicitly documented)
> [!TIP]
> Parallel processing provides a 3-6x speedup on 4-8 core systems for multi-document files.
**Feature details**
```python
from fast_yaml._core.lint import lint

diagnostics = lint("key: value\nkey: duplicate")
for diag in diagnostics:
    print(f"{diag.severity}: {diag.message} at line {diag.span.start.line}")
```

```python
from fast_yaml._core.parallel import parse_parallel, ParallelConfig

# Parse ONE file with MULTIPLE documents in parallel
multi_doc_yaml = "---\nfoo: 1\n---\nbar: 2\n---\nbaz: 3"
config = ParallelConfig(thread_count=4, max_input_size=100*1024*1024)
docs = parse_parallel(multi_doc_yaml, config)  # 3 documents parsed in parallel
```

> [!NOTE]
> This is document-level parallelism (parsing documents inside one file). For file-level parallelism (processing multiple files), use CLI batch mode: `fy format -i -j 8 directory/`
> [!NOTE]
> Three separate benchmark suites: Python API (vs PyYAML), Node.js API (vs js-yaml), and CLI batch mode (vs yamlfmt).

> [!NOTE]
> Process startup overhead (~15 ms for Python, ~20-25 ms for Node.js) affects the small-file benchmarks. In long-running servers (persistent processes), speedups would be 2-4x higher.

> [!TIP]
> Batch mode is where fast-yaml excels with parallel processing. Use `-j` to specify the worker count.
**Benchmark results**
Parse (loading):
| File Size | fast-yaml | PyYAML (C) | PyYAML (pure) | vs C | vs pure |
|---|---|---|---|---|---|
| Small (502B) | 15.5 ms | 20.2 ms | 20.8 ms | 1.30x | 1.34x |
| Medium (44KB) | 26.3 ms | 26.4 ms | 61.2 ms | 1.00x | 2.33x |
| Large (449KB) | 130.3 ms | 79.3 ms | 429.6 ms | 0.61x | 3.30x |
Dump (serialization):
| File Size | fast-yaml | PyYAML (C) | PyYAML (pure) | vs C | vs pure |
|---|---|---|---|---|---|
| Small (502B) | 15.7 ms | 20.8 ms | 21.2 ms | 1.33x | 1.35x |
| Medium (44KB) | 31.6 ms | 31.7 ms | 82.7 ms | 1.00x | 2.62x |
| Large (449KB) | 177.6 ms | 131.1 ms | 653.8 ms | 0.74x | 3.68x |
Key findings:
- Small/Medium files: fast-yaml matches or beats PyYAML C (1.0-1.3x speedup)
- Pure Python: fast-yaml consistently 1.3-3.7x faster across all sizes
- Large files: PyYAML C optimized for single large files; use fast-yaml's parallel mode for multi-document streams
Full benchmarks: `benches/comparison`
Parse (loading):
| File Size | fast-yaml | js-yaml | Speedup |
|---|---|---|---|
| Small (502B) | 24.4 ms | 28.1 ms | 1.15x |
| Medium (44KB) | 26.2 ms | 31.9 ms | 1.22x |
| Large (449KB) | 40.4 ms | 48.3 ms | 1.20x |
Dump (serialization):
| File Size | fast-yaml | js-yaml | Speedup |
|---|---|---|---|
| Small (502B) | 24.1 ms | 29.3 ms | 1.22x |
| Medium (44KB) | 27.1 ms | 34.9 ms | 1.29x |
| Large (449KB) | 50.7 ms | 72.1 ms | 1.42x |
Key findings:
- Consistent advantage: fast-yaml 1.15-1.42x faster across all scenarios
- Best performance: Large file dump operations (1.42x speedup)
- V8 JIT competitive: js-yaml benefits from TurboFan optimization, so the speedup here is smaller than against pure-Python PyYAML
- Real-world servers: In persistent processes without startup overhead, expect 2-4x speedup
| File Size | fast-yaml | yamlfmt | Result |
|---|---|---|---|
| Small (502 bytes) | 1.7 ms | 3.1 ms | 1.80x faster ✓ |
| Medium (45 KB) | 2.5 ms | 2.9 ms | 1.19x faster ✓ |
| Large (460 KB) | 8.4 ms | 2.9 ms | yamlfmt 2.88x faster |
| Workload | fast-yaml (parallel) | yamlfmt (sequential) | Speedup |
|---|---|---|---|
| 50 files (26 KB) | 4.3 ms | 10.3 ms | 2.40x faster ✓ |
| 200 files (204 KB) | 8.0 ms | 52.7 ms | 6.63x faster ✓ |
| 500 files (1 MB) | 15.5 ms | 244.7 ms | 15.77x faster ⚡ |
| 1000 files (1 MB) | 23.4 ms | 323.4 ms | 13.80x faster ⚡ |
Key takeaway: Batch mode with parallel workers provides 6-15x speedup on multi-file operations, making it ideal for formatting entire codebases.
```bash
# Run benchmarks
bash benches/comparison/scripts/run_python_benchmark.sh   # Python API
bash benches/comparison/scripts/run_nodejs_benchmark.sh   # Node.js API
bash benches/comparison/scripts/run_batch_benchmark.sh    # CLI batch mode
```

Test environment: macOS 14, Apple M3 Pro (12 cores), fast-yaml 0.4.1, PyYAML 6.0.3, js-yaml 4.1.1, Node.js 25.2.1, yamlfmt 0.21.0
**Differences from PyYAML (YAML 1.1)**
| Feature | PyYAML (YAML 1.1) | fast-yaml (YAML 1.2.2) |
|---|---|---|
| `yes`/`no` | `True`/`False` | `"yes"`/`"no"` (strings) |
| `on`/`off` | `True`/`False` | `"on"`/`"off"` (strings) |
| `014` (octal) | `12` | `14` (decimal) |
| `0o14` (octal) | `"0o14"` (string) | `12` |
fast_yaml.safe_load("yes") # "yes" (string, not True!)
fast_yaml.safe_load("0o14") # 12 (octal)
fast_yaml.safe_load("014") # 14 (decimal, NOT octal!)Loading YAML
```python
# Single document
data = fast_yaml.safe_load(yaml_string)

# Multiple documents
for doc in fast_yaml.safe_load_all(yaml_string):
    print(doc)

# PyYAML-compatible
data = fast_yaml.load(yaml_string, Loader=fast_yaml.SafeLoader)
```

**Dumping YAML**
```python
yaml_str = fast_yaml.safe_dump(data)

# With options
yaml_str = fast_yaml.dump(
    data,
    indent=2,
    width=80,
    explicit_start=True,
    sort_keys=False,
)

# Multiple documents
yaml_str = fast_yaml.safe_dump_all([doc1, doc2, doc3])
```

**Type mappings**
| YAML Type | Python Type |
|---|---|
| `null`, `~` | `None` |
| `true`, `false` | `bool` |
| `123`, `0x1F`, `0o17` | `int` |
| `1.23`, `.inf`, `.nan` | `float` |
| `"string"`, `'string'` | `str` |
| `[a, b, c]` | `list` |
| `{a: 1, b: 2}` | `dict` |
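A quick check of these mappings, using only `safe_load` as shown above:

```python
import fast_yaml

data = fast_yaml.safe_load("""
empty: ~
flag: true
hex: 0x1F
octal: 0o17
pi: 1.23
items: [a, b, c]
""")

assert data["empty"] is None
assert data["flag"] is True
assert data["hex"] == 31     # 0x1F
assert data["octal"] == 15   # 0o17
assert isinstance(data["pi"], float)
assert data["items"] == ["a", "b", "c"]
```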
Input validation prevents denial-of-service attacks.
**Security limits**
| Limit | Default | Configurable |
|---|---|---|
| Max input size | 100 MB | Yes (up to 1 GB) |
| Max documents | 100,000 | Yes (up to 10M) |
| Max threads | 128 | Yes |
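The input-size and thread limits can be raised through `ParallelConfig`, whose `thread_count` and `max_input_size` parameters appear in the parallel example above (a minimal sketch; the knob for the document limit is not shown in this README):

```python
from fast_yaml._core.parallel import ParallelConfig, parse_parallel

stream = "---\na: 1\n---\nb: 2"

# Raise the input-size limit to the documented 1 GB maximum
config = ParallelConfig(thread_count=8, max_input_size=1024 * 1024 * 1024)
docs = parse_parallel(stream, config)
```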
**Project structure**
```text
fast-yaml/
├── crates/
│   ├── fast-yaml-core/      # Core YAML parser/emitter
│   ├── fast-yaml-linter/    # Linting engine
│   └── fast-yaml-parallel/  # Multi-threaded processing
├── python/                  # PyO3 Python bindings
├── nodejs/                  # NAPI-RS Node.js bindings
└── Cargo.toml               # Workspace manifest
```
**Technology stack**
| Component | Library |
|---|---|
| YAML Parser | saphyr |
| Python Bindings | PyO3 |
| Node.js Bindings | NAPI-RS |
| Parallelism | Rayon |
Rust 2024 Edition • Python 3.10+ • Node.js 20+
Contributions welcome! All PRs must pass CI checks:

```bash
cargo +nightly fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace
```

**Why not just use PyYAML?**
PyYAML is excellent. Use fast-yaml when you need performance (2-4x faster than pure-Python PyYAML, up to 15x in batch mode), YAML 1.2.2 compliance, built-in linting, or parallel processing.
**Is this a drop-in replacement?**

For the `safe_*` functions, yes. Just change `import yaml` to `import fast_yaml as yaml`. Note that YAML 1.2.2 has different boolean/octal handling.

**When should I use parallel processing?**
**Document-level parallelism** (`parse_parallel()` in Python/Node.js):

- Use for single large files with multiple `---`-separated documents
- File size > 1 MB with dozens/hundreds of documents
- Example: log files, data dumps

**File-level parallelism** (CLI batch mode):

- Use for processing multiple separate files
- Example: `fy format -i -j 8 src/` for entire directories

For single-document files, use `safe_load()`.
Licensed under MIT or Apache-2.0 at your option.