
Parse YAML at Rust speed. Full 1.2.2 spec, built-in linter, parallel processing. Native bindings for Python & Node.js.

License

Apache-2.0, MIT licenses found

Licenses found

Apache-2.0
LICENSE-APACHE
MIT
LICENSE-MIT

bug-ops/fast-yaml

fast-yaml


High-performance YAML 1.2.2 parser for Python and Node.js, powered by Rust.

Drop-in replacement for PyYAML and js-yaml. Matches or beats PyYAML C on small/medium files, 2-4x faster than pure Python, 1.2-1.4x faster than js-yaml. Full YAML 1.2.2 Core Schema compliance, comprehensive linting, and multi-threaded parallel processing.

Important

YAML 1.2.2 Compliance — Unlike PyYAML (YAML 1.1), fast-yaml follows the modern YAML 1.2.2 specification. This means yes/no/on/off are strings, not booleans.

Installation

# Python
pip install fastyaml-rs

# Node.js
npm install fastyaml-rs

# CLI
cargo install fast-yaml-cli

Warning

Requires Rust 1.88+, Python 3.10+ or Node.js 20+

Build from source
git clone https://github.com/bug-ops/fast-yaml.git
cd fast-yaml

# Python
uv sync && uv run maturin develop

# Node.js
cd nodejs && npm install && npm run build

Architecture

fast-yaml is organized as a modular Rust workspace with clear separation of concerns:

fast-yaml/
├── crates/fast-yaml-core/     # Parser + Emitter + Streaming Formatter
├── crates/fast-yaml-linter/   # Linter + Diagnostic Formatters
├── crates/fast-yaml-parallel/ # Multi-threaded processing
├── python/                    # PyO3 bindings
└── nodejs/                    # NAPI-RS bindings

Core Components

| Component | Location | Input | Output | Use Case |
|---|---|---|---|---|
| Parser | fast-yaml-core | YAML text | Value (DOM) | Deserialize YAML to data structures |
| Emitter | fast-yaml-core | Value (DOM) | YAML text | Serialize data structures to YAML |
| Streaming Formatter | fast-yaml-core | Parser events | YAML text | Format YAML without building a DOM |
| Linter | fast-yaml-linter | YAML text | Vec<Diagnostic> | Validate YAML against rules |
| Parallel Processor | fast-yaml-parallel | YAML files/streams | BatchResult | Parallel processing at document and file level |

Tip

Parser vs Streaming Formatter: Parser builds a full DOM (use for data manipulation), Streaming Formatter processes events directly (use for formatting/conversion).
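To make the Parser-versus-Streaming-Formatter distinction concrete, here is a minimal Python sketch (not fast-yaml's Rust implementation) of writing YAML straight from a stream of parser events. The event kinds map_start, map_end and scalar are hypothetical stand-ins for real parser events, and only nested block mappings of plain scalars are handled. The formatter keeps a stack whose size grows with nesting depth, not with document size, which is the benefit of skipping the DOM.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical event shape: (kind, value) with kind in {"map_start", "map_end", "scalar"}.
Event = Tuple[str, str]

def format_events(events: Iterable[Event], indent: int = 2) -> str:
    """Emit block-style YAML directly from events, keeping only a small stack (no DOM)."""
    lines: list[str] = []
    expecting_key: list[bool] = []  # one entry per currently open mapping
    pending_key: Optional[str] = None
    for kind, value in events:
        depth = max(len(expecting_key) - 1, 0)
        pad = " " * (indent * depth)
        if kind == "map_start":
            if pending_key is not None:
                # A nested mapping is the value of the pending key: write "key:" now.
                lines.append(f"{pad}{pending_key}:")
                pending_key = None
            expecting_key.append(True)
        elif kind == "map_end":
            expecting_key.pop()
            if expecting_key:
                expecting_key[-1] = True  # the parent mapping now expects its next key
        elif kind == "scalar":
            if expecting_key[-1]:
                pending_key = value
                expecting_key[-1] = False
            else:
                lines.append(f"{pad}{pending_key}: {value}")
                pending_key = None
                expecting_key[-1] = True
        else:
            raise ValueError(f"unsupported event kind: {kind}")
    return "\n".join(lines) + "\n"

# Events for: name: fast-yaml / build: {jobs: 8}
EXAMPLE_EVENTS = [
    ("map_start", ""), ("scalar", "name"), ("scalar", "fast-yaml"),
    ("scalar", "build"), ("map_start", ""), ("scalar", "jobs"), ("scalar", "8"),
    ("map_end", ""), ("map_end", ""),
]

if __name__ == "__main__":
    print(format_events(EXAMPLE_EVENTS), end="")
```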

Tip

Linter vs Diagnostic Formatter: Linter validates YAML and produces diagnostics, Diagnostic Formatter renders them for display (rustc-style text, JSON, SARIF).
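As an illustration of keeping diagnostics separate from their presentation, the sketch below takes one already-computed diagnostic and renders it two ways: as a rustc-like text snippet and as JSON. The Diagnostic fields and the rule name duplicate-key are assumptions for this example, not fast-yaml's actual types, and SARIF output is omitted.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Diagnostic:
    # Illustrative fields only; fast-yaml's real Diagnostic type may differ.
    severity: str
    rule: str
    message: str
    line: int    # 1-based
    column: int  # 1-based

def render_text(diag: Diagnostic, path: str, source: str) -> str:
    """rustc-like rendering: header, location arrow, offending line, caret under the column."""
    src_line = source.splitlines()[diag.line - 1]
    gutter = " " * len(str(diag.line))
    return "\n".join([
        f"{diag.severity}[{diag.rule}]: {diag.message}",
        f"{gutter}--> {path}:{diag.line}:{diag.column}",
        f"{gutter} |",
        f"{diag.line} | {src_line}",
        f"{gutter} | {' ' * (diag.column - 1)}^",
    ])

def render_json(diags: list[Diagnostic]) -> str:
    """Machine-readable rendering: a JSON array with one object per diagnostic."""
    return json.dumps([asdict(d) for d in diags])

if __name__ == "__main__":
    source = "key: value\nkey: duplicate"
    diag = Diagnostic("error", "duplicate-key", "duplicate key 'key'", line=2, column=1)
    print(render_text(diag, "config.yaml", source))
    print(render_json([diag]))
```

The same Diagnostic value feeds both renderers, so adding another output format does not touch the code that finds problems.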

Parallelism Types

| Type | API | Use Case |
|---|---|---|
| Document-level | parse_parallel() | Parse multi-document YAML streams |
| File-level | process_files(), FileProcessor | Process multiple files in parallel |
| CLI batch mode | fy format -j 8 dir/ | Format directories with parallel workers |

Note

All parallelism is now unified in the fast-yaml-parallel crate. CLI batch mode and FFI bindings use this single implementation.

Quick Start

Python

import fast_yaml

data = fast_yaml.safe_load("""
name: fast-yaml
features: [fast, safe, yaml-1.2.2]
""")

yaml_str = fast_yaml.safe_dump(data)

Tip

Migrating from PyYAML? Just change your import: import fast_yaml as yaml

Node.js

import { safeLoad, safeDump } from 'fastyaml-rs';

const data = safeLoad(`name: fast-yaml`);
const yamlStr = safeDump(data);

CLI

# Single file operations
fy parse config.yaml           # Validate syntax
fy format -i config.yaml       # Format in-place
fy convert json config.yaml    # YAML → JSON
fy lint config.yaml            # Lint with diagnostics

# Batch mode (directories, globs, multiple files)
fy format -i src/              # Format entire directory
fy format -i "**/*.yaml"       # Format with glob pattern
fy format -i -j 8 project/     # Parallel processing (8 workers)
fy lint --exclude "tests/**" . # Lint all except tests

Tip

Batch mode activates automatically for directories, globs, or multiple files. Supports parallel processing, include/exclude patterns, and respects .gitignore.
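For readers who want similar batch behavior in their own Python tooling, here is a rough standard-library analogue of the steps the tip describes: expand an include glob, drop paths matching exclude patterns, and fan the remaining files out to a pool of workers. It is not how fy works internally (the CLI is written in Rust, and the technology stack lists Rayon for parallelism), .gitignore handling is omitted, and the function names are hypothetical. Threads keep the sketch short; for CPU-bound pure-Python work, a ProcessPoolExecutor is the usual choice because of the GIL.

```python
import fnmatch
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")

def collect_files(root: Path, include: str = "**/*.yaml", exclude: tuple[str, ...] = ()) -> list[Path]:
    """Expand the include glob under root and drop paths matching any exclude pattern."""
    selected = []
    for path in sorted(root.glob(include)):
        rel = path.relative_to(root).as_posix()  # exclude patterns match root-relative paths
        if path.is_file() and not any(fnmatch.fnmatch(rel, pattern) for pattern in exclude):
            selected.append(path)
    return selected

def process_files_in_parallel(files: list[Path], worker: Callable[[Path], T], jobs: int = 8) -> dict[Path, T]:
    """Apply worker to every file using up to `jobs` threads (analogous to a -j worker count)."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return dict(zip(files, pool.map(worker, files)))
```

For example, collect_files(Path("."), exclude=("tests/**",)) roughly mirrors fy lint --exclude "tests/**" . in which files it selects.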

Features

  • High Performance — Matches PyYAML C on small/medium files, 2-4x faster than pure Python
  • YAML 1.2.2 — Full Core Schema compliance
  • Drop-in API — Compatible with PyYAML/js-yaml
  • Batch Processing — Multi-file operations with parallel workers, glob patterns, .gitignore support
  • Linting — Rich diagnostics with line/column tracking
  • Parallel — Multi-threaded processing for large files
  • Safe — Memory-safe Rust with minimal unsafe (FFI boundaries only, explicitly documented)

Tip

Parallel processing provides 3-6x speedup on 4-8 core systems for multi-document files.
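One way to read the 3-6x figure is through Amdahl's law, an idealized model that splits the work into a fixed serial share (for example, splitting the stream and collecting results) and a share that divides evenly across cores. This is arithmetic under that assumption, not a measurement of fast-yaml: under the model, 3x on 4 cores and 6x on 8 cores would mean roughly 89% and 95% of the work runs in parallel.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only parallel_fraction of the run time scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)

def implied_parallel_fraction(speedup: float, workers: int) -> float:
    """Invert Amdahl's law: 1/S = (1 - p) + p/n gives p = (1 - 1/S) / (1 - 1/n)."""
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)

if __name__ == "__main__":
    for speedup, workers in [(3, 4), (6, 8)]:
        p = implied_parallel_fraction(speedup, workers)
        print(f"{speedup}x on {workers} cores implies ~{p:.0%} parallel work")
```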

Feature details

Linting

from fast_yaml._core.lint import lint

diagnostics = lint("key: value\nkey: duplicate")
for diag in diagnostics:
    print(f"{diag.severity}: {diag.message} at line {diag.span.start.line}")
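The lint() call above uses fast-yaml's own linter. To show what a duplicate-key rule checks (YAML requires the keys within one mapping to be unique), here is a deliberately narrow pure-Python sketch that only inspects top-level block-mapping keys and starts fresh at each --- or ... document marker. Nested mappings, flow style, quoted keys and sequences are out of scope, and the Diagnostic shape and function name are assumptions, not fast-yaml's API.

```python
import re
from dataclasses import dataclass

@dataclass
class Diagnostic:
    # Illustrative shape only; fast-yaml's diagnostics carry a span object instead.
    severity: str
    message: str
    line: int    # 1-based
    column: int  # 1-based

# A top-level key: starts in column 1, is not a comment, sequence item or quoted
# scalar, and is followed by ':' plus whitespace or end of line.
TOP_LEVEL_KEY = re.compile(r"""^([^\s#'"-][^:#]*?):(?:\s|$)""")

def find_duplicate_top_level_keys(text: str) -> list[Diagnostic]:
    diags: list[Diagnostic] = []
    first_seen: dict[str, int] = {}  # key -> line where it first appeared
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line.rstrip() in ("---", "...") or line.startswith("--- "):
            first_seen.clear()  # each document has its own top-level mapping
            continue
        match = TOP_LEVEL_KEY.match(line)
        if match is None:
            continue  # indented, blank, comment, or otherwise out of scope
        key = match.group(1)
        if key in first_seen:
            diags.append(Diagnostic(
                "error", f"duplicate key '{key}' (first defined on line {first_seen[key]})", lineno, 1))
        else:
            first_seen[key] = lineno
    return diags
```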

Parallel Processing (Document-Level)

from fast_yaml._core.parallel import parse_parallel, ParallelConfig

# Parse ONE file with MULTIPLE documents in parallel
multi_doc_yaml = "---\nfoo: 1\n---\nbar: 2\n---\nbaz: 3"
config = ParallelConfig(thread_count=4, max_input_size=100*1024*1024)
docs = parse_parallel(multi_doc_yaml, config)  # 3 documents parsed in parallel

Note

This is document-level parallelism (parsing the documents inside one file). For file-level parallelism (processing multiple files), use CLI batch mode: fy format -i -j 8 directory/
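The idea behind document-level parallelism can be sketched in plain Python: split the stream at document markers, parse each document independently, and collect the results in their original order. This is an illustration only. The splitter is naive (it recognizes only a bare --- line and ignores ... end markers and directives), parse_flat_mapping is a hypothetical stand-in for a real YAML parser, and because of Python's GIL, threads will not speed up pure-Python parsing; fast-yaml performs this work in Rust.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# A bare "---" line (optionally followed by spaces or tabs) starts a new document.
DOC_MARKER = re.compile(r"^---[ \t]*$", re.MULTILINE)

def split_documents(stream: str) -> list[str]:
    """Naively split a multi-document stream at bare '---' lines, dropping empty parts."""
    return [part.strip("\n") for part in DOC_MARKER.split(stream) if part.strip()]

def parse_flat_mapping(doc: str) -> dict[str, str]:
    """Hypothetical stand-in parser: handles only flat 'key: value' lines."""
    result = {}
    for line in doc.splitlines():
        if line.strip() and not line.lstrip().startswith("#"):
            key, _, value = line.partition(":")
            result[key.strip()] = value.strip()
    return result

def parse_documents_in_parallel(stream: str, threads: int = 4) -> list[dict[str, str]]:
    """Parse each document on a worker thread; pool.map keeps results in document order."""
    documents = split_documents(stream)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(parse_flat_mapping, documents))

if __name__ == "__main__":
    print(parse_documents_in_parallel("---\nfoo: 1\n---\nbar: 2\n---\nbaz: 3"))
```

Because the documents share no state, each one can be parsed without coordination; only the final ordered collection step is serial.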

Performance

Note

Three separate benchmark suites: Python API (vs PyYAML), Node.js API (vs js-yaml), and CLI Batch Mode (vs yamlfmt).

Note

Process startup overhead (~15 ms for Python, ~20-25 ms for Node.js) is part of every measured run and dominates the small-file numbers. In long-running processes such as servers, where startup is paid only once, the speedups are expected to be 2-4x higher.
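To separate library cost from startup cost, time the call inside one already-running process. A minimal standard-library harness follows; the function name is hypothetical, and json.loads stands in for a YAML loader so the sketch runs without extra packages. Swap in fast_yaml.safe_load or yaml.safe_load with a YAML string to compare them. Python's timeit module offers similar functionality.

```python
import json
import statistics
import time
from typing import Any, Callable

def time_in_process(fn: Callable[[Any], Any], arg: Any, repeat: int = 20, warmup: int = 3) -> float:
    """Median wall-clock seconds per call, measured inside a single running process."""
    for _ in range(warmup):
        fn(arg)  # warm caches and lazy imports so they are not counted
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        fn(arg)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

if __name__ == "__main__":
    payload = '{"name": "fast-yaml", "features": ["fast", "safe"]}'
    # json.loads is only a stand-in for a YAML loader here.
    seconds = time_in_process(json.loads, payload)
    print(f"median: {seconds * 1e6:.1f} microseconds per call")
```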

Tip

Batch mode is where fast-yaml gains the most, thanks to parallel processing. Use -j to set the number of workers.

Benchmark results

Python API vs PyYAML

Parse (loading):

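| File size | fast-yaml | PyYAML (C) | PyYAML (pure) | Speedup vs C | Speedup vs pure |
|---|---|---|---|---|---|
| Small (502 B) | 15.5 ms | 20.2 ms | 20.8 ms | 1.30x | 1.34x |
| Medium (44 KB) | 26.3 ms | 26.4 ms | 61.2 ms | 1.00x | 2.33x |
| Large (449 KB) | 130.3 ms | 79.3 ms | 429.6 ms | 0.61x | 3.30x |

Speedup = PyYAML time / fast-yaml time; values below 1.00x mean fast-yaml is slower.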

Dump (serialization):

| File size | fast-yaml | PyYAML (C) | PyYAML (pure) | Speedup vs C | Speedup vs pure |
|---|---|---|---|---|---|
| Small (502 B) | 15.7 ms | 20.8 ms | 21.2 ms | 1.33x | 1.35x |
| Medium (44 KB) | 31.6 ms | 31.7 ms | 82.7 ms | 1.00x | 2.62x |
| Large (449 KB) | 177.6 ms | 131.1 ms | 653.8 ms | 0.74x | 3.68x |

Key findings:

  • Small/Medium files: fast-yaml matches or beats PyYAML C (1.0-1.3x speedup)
  • Pure Python: fast-yaml consistently 1.3-3.7x faster across all sizes
  • Large files: PyYAML C is faster on single large files (fast-yaml reaches 0.61x on parse and 0.74x on dump); for large multi-document streams, use fast-yaml's parallel mode

Full benchmarks: benches/comparison

Node.js API vs js-yaml (Apple M3 Pro, 12 cores)

Parse (loading):

| File size | fast-yaml | js-yaml | Speedup |
|---|---|---|---|
| Small (502 B) | 24.4 ms | 28.1 ms | 1.15x |
| Medium (44 KB) | 26.2 ms | 31.9 ms | 1.22x |
| Large (449 KB) | 40.4 ms | 48.3 ms | 1.20x |

Dump (serialization):

| File size | fast-yaml | js-yaml | Speedup |
|---|---|---|---|
| Small (502 B) | 24.1 ms | 29.3 ms | 1.22x |
| Medium (44 KB) | 27.1 ms | 34.9 ms | 1.29x |
| Large (449 KB) | 50.7 ms | 72.1 ms | 1.42x |

Key findings:

  • Consistent advantage: fast-yaml 1.15-1.42x faster across all scenarios
  • Best performance: Large file dump operations (1.42x speedup)
  • V8 JIT competitive: js-yaml is optimized by V8's TurboFan JIT, so the gap is narrower than against pure-Python PyYAML
  • Real-world servers: in persistent processes without startup overhead, a 2-4x speedup is expected (an estimate; the numbers above include startup)

CLI Single-File vs yamlfmt (Apple M3 Pro, 12 cores)

| File size | fast-yaml | yamlfmt | Result |
|---|---|---|---|
| Small (502 bytes) | 1.7 ms | 3.1 ms | fast-yaml 1.80x faster |
| Medium (45 KB) | 2.5 ms | 2.9 ms | fast-yaml 1.19x faster |
| Large (460 KB) | 8.4 ms | 2.9 ms | yamlfmt 2.88x faster |

CLI Batch Mode vs yamlfmt

| Workload | fast-yaml (parallel) | yamlfmt (sequential) | Speedup |
|---|---|---|---|
| 50 files (26 KB) | 4.3 ms | 10.3 ms | 2.40x faster |
| 200 files (204 KB) | 8.0 ms | 52.7 ms | 6.63x faster |
| 500 files (1 MB) | 15.5 ms | 244.7 ms | 15.77x faster |
| 1000 files (1 MB) | 23.4 ms | 323.4 ms | 13.80x faster |

Key takeaway: batch mode with parallel workers is 2.4x faster on 50 files and 6.6-15.8x faster on 200-1000 files, making it well suited to formatting entire codebases.

# Run benchmarks
bash benches/comparison/scripts/run_python_benchmark.sh  # Python API
bash benches/comparison/scripts/run_nodejs_benchmark.sh  # Node.js API
bash benches/comparison/scripts/run_batch_benchmark.sh   # CLI batch mode

Test environment: macOS 14, Apple M3 Pro (12 cores), fast-yaml 0.4.1, PyYAML 6.0.3, js-yaml 4.1.1, Node.js 25.2.1, yamlfmt 0.21.0

YAML 1.2.2 Differences

Differences from PyYAML (YAML 1.1)

| Input | PyYAML (YAML 1.1) | fast-yaml (YAML 1.2.2) |
|---|---|---|
| yes/no | True/False | "yes"/"no" (strings) |
| on/off | True/False | "on"/"off" (strings) |
| 014 | 12 (octal) | 14 (decimal) |
| 0o14 | "0o14" (string) | 12 (octal) |

import fast_yaml

fast_yaml.safe_load("yes")    # "yes" (string, not True!)
fast_yaml.safe_load("0o14")   # 12 (octal)
fast_yaml.safe_load("014")    # 14 (decimal, NOT octal!)
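These differences come from the scalar-resolution rules. Below is a minimal pure-Python sketch of how the YAML 1.2.2 Core Schema (spec section 10.3) resolves plain, unquoted scalars; it is not fast-yaml's code (its parsing happens in Rust), it ignores quoted scalars and explicit tags, and resolve_plain_scalar is a hypothetical name. Note that there is no rule for yes/on, and that a leading zero alone does not make an integer octal.

```python
import math
import re

# Core Schema regular expressions for plain scalars, checked in order; the first full match wins.
CORE_RULES = [
    (re.compile(r"null|Null|NULL|~|"), lambda s: None),
    (re.compile(r"true|True|TRUE"), lambda s: True),
    (re.compile(r"false|False|FALSE"), lambda s: False),
    (re.compile(r"[-+]?[0-9]+"), lambda s: int(s, 10)),        # "014" is decimal 14
    (re.compile(r"0o[0-7]+"), lambda s: int(s[2:], 8)),         # octal needs the 0o prefix
    (re.compile(r"0x[0-9a-fA-F]+"), lambda s: int(s[2:], 16)),
    (re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?"), float),
    (re.compile(r"[-+]?(\.inf|\.Inf|\.INF)"), lambda s: -math.inf if s.startswith("-") else math.inf),
    (re.compile(r"\.nan|\.NaN|\.NAN"), lambda s: math.nan),
]

def resolve_plain_scalar(text: str):
    """Resolve an unquoted scalar per the Core Schema; anything unmatched stays a string."""
    for pattern, convert in CORE_RULES:
        if pattern.fullmatch(text):
            return convert(text)
    return text
```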

API Reference

Loading YAML
import fast_yaml

# Single document
data = fast_yaml.safe_load(yaml_string)

# Multiple documents
for doc in fast_yaml.safe_load_all(yaml_string):
    print(doc)

# PyYAML-compatible
data = fast_yaml.load(yaml_string, Loader=fast_yaml.SafeLoader)

Dumping YAML
yaml_str = fast_yaml.safe_dump(data)

# With options
yaml_str = fast_yaml.dump(
    data,
    indent=2,
    width=80,
    explicit_start=True,
    sort_keys=False,
)

# Multiple documents
yaml_str = fast_yaml.safe_dump_all([doc1, doc2, doc3])

Type mappings

| YAML type | Python type |
|---|---|
| null, ~ | None |
| true, false | bool |
| 123, 0x1F, 0o17 | int |
| 1.23, .inf, .nan | float |
| "string", 'string' | str |
| [a, b, c] | list |
| {a: 1, b: 2} | dict |

Security

Input validation and resource limits guard against denial-of-service attacks.

Security limits

| Limit | Default | Configurable |
|---|---|---|
| Max input size | 100 MB | Yes (up to 1 GB) |
| Max documents | 100,000 | Yes (up to 10M) |
| Max threads | 128 | Yes |

Project

Project structure
fast-yaml/
├── crates/
│   ├── fast-yaml-core/     # Core YAML parser/emitter
│   ├── fast-yaml-linter/   # Linting engine
│   └── fast-yaml-parallel/ # Multi-threaded processing
├── python/                 # PyO3 Python bindings
├── nodejs/                 # NAPI-RS Node.js bindings
└── Cargo.toml             # Workspace manifest

Technology stack

| Component | Library |
|---|---|
| YAML Parser | saphyr |
| Python Bindings | PyO3 |
| Node.js Bindings | NAPI-RS |
| Parallelism | Rayon |

Rust 2024 Edition, Python 3.10+, Node.js 20+

Contributing

Contributions welcome! All PRs must pass CI checks:

cargo +nightly fmt --all
cargo clippy --workspace --all-targets -- -D warnings
cargo nextest run --workspace

FAQ

Why not just use PyYAML?

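PyYAML is excellent. Use fast-yaml when you need performance (on par with or faster than PyYAML's C extension on small/medium files, and 1.3-3.7x faster than pure-Python PyYAML in the benchmarks above), YAML 1.2.2 compliance, built-in linting, or parallel processing.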

Is this a drop-in replacement?

For safe_* functions, yes. Just change import yaml to import fast_yaml as yaml. Note that YAML 1.2.2 has different boolean/octal handling.

When should I use parallel processing?

Document-level parallelism (parse_parallel() in Python/Node.js):

  • Use for single large files with multiple --- separated documents
  • File size > 1MB with dozens/hundreds of documents
  • Example: Log files, data dumps

File-level parallelism (CLI batch mode):

  • Use for processing multiple separate files
  • Example: fy format -i -j 8 src/ for entire directories

For single-document files, use safe_load().

License

Licensed under MIT or Apache-2.0 at your option.

Contributors 4
