# mojo-json

A high-performance JSON library for Mojo with GPU acceleration. Pure Mojo, no external dependencies.

## Performance Highlights (Apple M2 Max)

| Configuration | Throughput | Use Case |
|---|---|---|
| GPU Classification | 2.1 GB/s | Large JSON (>64 KB), Stage 1a only |
| GPU Full Stage 1 | 905 MB/s | Full structural extraction |
| Adaptive Parser | 500-800 MB/s | Auto-selects best parser |
| NDJSON Zero-Copy | 1,900+ MB/s | Line extraction |
| On-Demand | 600+ MB/s | Sparse field access |

### GPU Acceleration (M2 Max)

GPU compute via Metal FFI - no Mojo Metal compiler needed:

| JSON Size | GPU | CPU SIMD | Winner |
|---|---|---|---|
| 16 KB | 51 MB/s | 500 MB/s | CPU |
| 64 KB | 281 MB/s | 500 MB/s | Crossover |
| 256 KB | 1,007 MB/s | 500 MB/s | GPU 2x |
| 1 MB | 2,094 MB/s | 500 MB/s | GPU 4x |

The GPU provides two modes:

- Simple classification (2.1 GB/s): raw byte classification, no string filtering
- Full Stage 1 (905 MB/s): quote bitmap → string mask → structural extraction

```mojo
from src.gpu_parser import parse_adaptive_gpu

# Auto-selects GPU (>64 KB) or CPU (<64 KB)
var tape = parse_adaptive_gpu(large_json)
```

Build GPU support: `cd metal && ./build_all.sh`

### Standard Benchmark Results (M2 Max)

| File | Size | V1 | V4 | Adaptive Picks | Use Case |
|---|---|---|---|---|---|
| twitter.json | 617 KB | 523 MB/s | 469 MB/s | V1 | Web API payloads |
| canada.json | 2.2 MB | 336 MB/s | 446 MB/s | V4 | GeoJSON (coordinates) |
| citm_catalog.json | 1.7 MB | 814 MB/s | 748 MB/s | V1 | Nested structures |

### NDJSON Processing (M2 Max)

| Operation | Throughput | Speedup |
|---|---|---|
| Zero-copy extraction | 1,900 MB/s | 13.6x vs copying |
| Traditional extraction | 140 MB/s | Baseline |
| Full parsing | 60-65 MB/s | Parsing dominates |

## Features

- GPU Acceleration - Metal compute shaders via FFI (2+ GB/s for 1 MB+ inputs)
- Adaptive Parser - Auto-selects V1/V2/V4 based on content analysis (83-100% optimal)
- Zero-Copy NDJSON - StringSlice extraction at 1.9 GB/s (13x faster)
- SIMD Optimized - Vectorized structural scanning (500+ MB/s)
- On-Demand Parsing - Parse-on-access for sparse field access (2x speedup)
- Full JSON Spec - RFC 8259 compliant parser
- JsonValue Variant - Type-safe representation of all JSON types
- Unicode Support - Full escape handling, including surrogate pairs
- Detailed Errors - Parse errors with line/column information

## Installation

Add to your pixi.toml:

```toml
[dependencies]
mojo-json = { path = "../mojo-json" }
```

## Quick Start

### Parsing JSON

```mojo
from mojo_json import parse, JsonValue

# Parse a JSON string
var value = parse('{"name": "Alice", "age": 30, "active": true}')

# Access values
print(value["name"].as_string())  # Alice
print(value["age"].as_int())      # 30
print(value["active"].as_bool())  # True

# Check types before accessing
if value["name"].is_string():
    print("Name is a string!")
```

### Creating JSON Values

```mojo
from mojo_json import JsonValue, JsonObject, JsonArray

# Primitives
var null_val = JsonValue.null()
var bool_val = JsonValue.from_bool(True)
var int_val = JsonValue.from_int(42)
var float_val = JsonValue.from_float(3.14159)
var str_val = JsonValue.from_string("hello")

# Arrays
var arr = JsonArray()
arr.append(JsonValue.from_int(1))
arr.append(JsonValue.from_int(2))
arr.append(JsonValue.from_int(3))
var arr_val = JsonValue.from_array(arr)

# Objects
var obj = JsonObject()
obj["name"] = JsonValue.from_string("Bob")
obj["age"] = JsonValue.from_int(25)
obj["scores"] = arr_val
var obj_val = JsonValue.from_object(obj)
```

### Serializing JSON

```mojo
from mojo_json import serialize, serialize_pretty, JsonValue, JsonObject

var obj = JsonObject()
obj["message"] = JsonValue.from_string("Hello, World!")
obj["count"] = JsonValue.from_int(42)
var value = JsonValue.from_object(obj)

# Compact output (default)
print(serialize(value))
# {"message":"Hello, World!","count":42}

# Pretty-printed output
print(serialize_pretty(value))
# {
#   "message": "Hello, World!",
#   "count": 42
# }

# Custom indentation
print(serialize_pretty(value, "    "))  # 4 spaces
```

### Error Handling

```mojo
from mojo_json import parse, parse_safe, JsonParseError, JsonValue

# Safe parsing (no exceptions)
var result = parse_safe('{"invalid": }')
if result.get[2, Bool]():
    var value = result.get[0, JsonValue]()
    # Use value
else:
    var error = result.get[1, JsonParseError]()
    print(error.format())
    # JSON parse error at line 1, column 13: Expected value

# With try/except
try:
    var value = parse('{"invalid": }')
except e:
    print("Parse error:", e)
```

### Adaptive Parser (Auto-Select Best Parser)

```mojo
from mojo_json import parse_adaptive, analyze_json_content

# Automatically selects V1, V2, or V4 based on content
var tape = parse_adaptive(large_json)

# Analyze content to see which parser will be selected
var profile = analyze_json_content(json_string)
print(profile)
# JsonContentProfile(quotes=8%, digits=0%, structural=5%, recommended=V1)

# Decision logic:
# - >20% digits → V4 (number-heavy, like GeoJSON coordinates)
# - >3% quotes  → V1 (string-heavy, like API responses)
# - Otherwise   → V2 (balanced default)
```
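The decision logic above amounts to a character-frequency scan. Here is a minimal sketch of that heuristic in plain Mojo - a hypothetical `recommend_parser` helper for illustration, not the library's actual `analyze_json_content` implementation:

```mojo
# Sketch of the documented selection heuristic; the real
# analyze_json_content may differ in sampling and thresholds.
fn recommend_parser(source: String) -> String:
    var quotes = 0
    var digits = 0
    var n = len(source)
    for i in range(n):
        var c = ord(source[i])
        if c == ord('"'):
            quotes += 1
        elif c >= ord("0") and c <= ord("9"):
            digits += 1
    if n == 0:
        return "V2"
    if digits * 100 > n * 20:  # >20% digits: number-heavy
        return "V4"
    if quotes * 100 > n * 3:   # >3% quotes: string-heavy
        return "V1"
    return "V2"                # balanced default
```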

### NDJSON Zero-Copy Processing

```mojo
from mojo_json import (
    extract_line_slices,  # Zero-copy (recommended)
    extract_lines,        # With copying
    parse_to_tape,
    StringSlice,
    SliceList,
)

var ndjson = """
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
{"name": "Charlie", "age": 35}
"""

# Zero-copy extraction (13x faster) - recommended
var slices = extract_line_slices(ndjson)
for i in range(len(slices)):
    var line = slices[i]  # StringSlice - no allocation
    # Only allocate a string when actually parsing
    var tape = parse_to_tape(line.to_string())
    # Process tape...

# Traditional extraction (slower, copies each line)
var lines = extract_lines(ndjson)
for i in range(len(lines)):
    var tape = parse_to_tape(lines[i])
```

### On-Demand Parsing (Sparse Access)

```mojo
from mojo_json import parse_on_demand

# Only builds structural index - values parsed on access
var doc = parse_on_demand(huge_json)

# Access specific fields (only these get parsed)
var name = doc.root()["user"]["name"].get_string()
var age = doc.root()["user"]["age"].get_int()

# 50+ other fields in the JSON are never parsed
```

## API Reference

### JsonValue

The core type representing any JSON value.

#### Constructors

| Method | Description |
|---|---|
| `JsonValue.null()` | Create null value |
| `JsonValue.from_bool(Bool)` | Create boolean value |
| `JsonValue.from_int(Int64)` | Create integer value |
| `JsonValue.from_int(Int)` | Create integer value |
| `JsonValue.from_float(Float64)` | Create float value |
| `JsonValue.from_string(String)` | Create string value |
| `JsonValue.from_array(JsonArray)` | Create array value |
| `JsonValue.from_object(JsonObject)` | Create object value |

#### Type Checking

| Method | Returns |
|---|---|
| `is_null()` | Bool |
| `is_bool()` | Bool |
| `is_int()` | Bool |
| `is_float()` | Bool |
| `is_number()` | Bool (int or float) |
| `is_string()` | Bool |
| `is_array()` | Bool |
| `is_object()` | Bool |
| `type_name()` | String |

#### Value Access

| Method | Returns |
|---|---|
| `as_bool()` | Bool |
| `as_int()` | Int64 |
| `as_float()` | Float64 |
| `as_string()` | String |
| `as_array()` | JsonArray |
| `as_object()` | JsonObject |

#### Array/Object Operations

| Method | Description |
|---|---|
| `len()` | Get length of array or object |
| `[index]` | Get array element by index |
| `[key]` | Get object value by key |
| `contains(key)` | Check if object contains key |
| `keys()` | Get all object keys |
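Combining these operations (the output comments are illustrative):

```mojo
from mojo_json import parse

fn main() raises:
    var value = parse('{"name": "Alice", "scores": [1, 2, 3]}')

    # Object and array access
    if value.contains("scores"):
        var scores = value["scores"]
        print(scores.len())        # 3
        print(scores[0].as_int())  # 1

    # Iterate all object keys
    var keys = value.keys()
    for i in range(len(keys)):
        print(keys[i])             # name, scores
```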

### Parser

#### Functions

```mojo
# Parse JSON string (raises on error)
fn parse(source: String) raises -> JsonValue

# Parse without exceptions
fn parse_safe(source: String) -> Tuple[JsonValue, JsonParseError, Bool]

# Parse with custom configuration
fn parse_with_config(source: String, config: ParserConfig) raises -> JsonValue
```

#### ParserConfig

```mojo
var config = ParserConfig(
    max_depth=1000,              # Maximum nesting depth
    allow_trailing_comma=False,  # Allow trailing commas (non-standard)
    allow_comments=False,        # Allow // and /* */ comments (non-standard)
)
```
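A usage sketch passing the config to `parse_with_config` as declared above; the relaxed input shown only parses because both non-standard options are enabled:

```mojo
from mojo_json import parse_with_config, ParserConfig

fn main() raises:
    var config = ParserConfig(
        max_depth=64,
        allow_trailing_comma=True,
        allow_comments=True,
    )
    # The comment and the trailing comma are non-standard JSON;
    # this parses only because the config allows them.
    var value = parse_with_config('{"a": 1, /* note */ "b": 2,}', config)
    print(value["b"].as_int())  # 2
```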

### Serializer

#### Functions

```mojo
# Compact serialization
fn serialize(value: JsonValue) -> String

# Pretty-printed serialization
fn serialize_pretty(value: JsonValue, indent: String = "  ") -> String

# Serialize with custom configuration
fn serialize_with_config(value: JsonValue, config: SerializerConfig) -> String

# Alias for serialize
fn to_json(value: JsonValue) -> String
```

#### SerializerConfig

```mojo
var config = SerializerConfig(
    indent="  ",                 # Indentation string (empty for compact)
    sort_keys=False,             # Sort object keys alphabetically
    escape_unicode=False,        # Escape non-ASCII as \uXXXX
    escape_forward_slash=False,  # Escape / for HTML embedding
)
```
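A usage sketch with `serialize_with_config`; the exact output shown is illustrative:

```mojo
from mojo_json import parse, serialize_with_config, SerializerConfig

fn main() raises:
    var value = parse('{"b": 1, "a": "café"}')
    var config = SerializerConfig(
        indent="",            # compact output
        sort_keys=True,       # deterministic key order
        escape_unicode=True,  # emit non-ASCII as \uXXXX
        escape_forward_slash=False,
    )
    print(serialize_with_config(value, config))
    # {"a":"caf\u00e9","b":1}
```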

### JsonParseError

```mojo
struct JsonParseError:
    var message: String  # Error description
    var position: Int    # Byte offset in source
    var line: Int        # Line number (1-indexed)
    var column: Int      # Column number (1-indexed)

    fn format(self) -> String  # Human-readable error message
    fn format_with_context(self, source: String, context_chars: Int = 20) -> String
```
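A sketch of reporting a failure with surrounding source text via `format_with_context` (the output shape is illustrative):

```mojo
from mojo_json import parse_safe, JsonParseError, JsonValue

fn main():
    var source = '{"invalid": }'
    var result = parse_safe(source)
    if not result.get[2, Bool]():
        var error = result.get[1, JsonParseError]()
        # Print the message plus up to 10 characters of source
        # on either side of the failure position
        print(error.format_with_context(source, 10))
```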

### Type Aliases

```mojo
alias JsonArray = List[JsonValue]
alias JsonObject = Dict[String, JsonValue]
```

## Escape Sequences

The parser handles all standard JSON escape sequences:

| Escape | Character |
|---|---|
| `\"` | Double quote |
| `\\` | Backslash |
| `\/` | Forward slash |
| `\b` | Backspace |
| `\f` | Form feed |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\uXXXX` | Unicode code point |

Surrogate pairs (escapes in the range `\uD800`-`\uDFFF`) are properly combined into a single code point.
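For example, a surrogate pair in the input decodes to one code point (assuming `as_string()` returns the decoded UTF-8 text):

```mojo
from mojo_json import parse

fn main() raises:
    # \uD83D\uDE00 is the surrogate pair for U+1F600 (😀).
    # The doubled backslashes keep the escapes in the JSON source text.
    var value = parse('{"emoji": "\\uD83D\\uDE00"}')
    print(value["emoji"].as_string())  # 😀
```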

## Performance Notes

- Single-pass parser with no backtracking
- O(n) time complexity for parsing
- Minimal memory allocations
- No regex or external dependencies
## GPU Acceleration

For large JSON files (>64 KB), GPU acceleration provides massive throughput improvements.

### Build Metal Library

```sh
cd metal
./build_all.sh
```

This compiles:

- `json_classify.metallib` - GPU compute kernels
- `libmetal_bridge.dylib` - C bridge for Mojo FFI

### Usage

```mojo
from src.metal_ffi import MetalJsonClassifier, is_metal_available

fn main() raises:
    if not is_metal_available():
        print("Metal GPU not available")
        return

    var classifier = MetalJsonClassifier()
    print("GPU:", classifier.device_name())

    # Classify JSON characters on GPU
    var json = '{"name": "test", "values": [1, 2, 3]}'
    var classifications = classifier.classify(json)

    # Classification codes: 0=whitespace, 1={, 2=}, 3=[, 4=], 5=", 6=:, 7=,
```
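The classification buffer can then drive CPU-side logic. A sketch that tallies the structural bytes, assuming `classify()` returns an indexable sequence of the codes listed above (an assumption; check `src/metal_ffi` for the actual return type):

```mojo
from src.metal_ffi import MetalJsonClassifier, is_metal_available

fn main() raises:
    if not is_metal_available():
        return
    var classifier = MetalJsonClassifier()
    var json = '{"name": "test", "values": [1, 2, 3]}'
    var classifications = classifier.classify(json)

    # Codes 1-7 cover braces, brackets, quotes, colons, and commas
    var structural = 0
    for i in range(len(classifications)):
        if classifications[i] >= 1 and classifications[i] <= 7:
            structural += 1
    print("Structural bytes:", structural)
```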

### Benchmark Results (Apple M3 Ultra)

| JSON Size | GPU Throughput | Notes |
|---|---|---|
| 16 KB | 46 MB/s | GPU overhead dominates |
| 64 KB | 358 MB/s | Crossover point |
| 256 KB | 1,357 MB/s | GPU significantly faster |
| 1 MB | 4,352 MB/s | Full GPU utilization |

Why the variation? GPU kernel launches carry roughly 15 μs of fixed overhead. For small JSON this overhead dominates; at 64 KB (the "crossover point") the GPU starts winning. At 1 MB the pipeline sustains 4.3 GB/s including read, write, and FFI overhead, which is strong throughput for a classification workload, though still well below the M3 Ultra's 800 GB/s unified memory bandwidth - launch and transfer costs, not raw bandwidth, are the bound here.

Practical guidance:

- <64 KB: use CPU SIMD (faster due to no GPU launch overhead)
- 64-256 KB: GPU provides a moderate speedup (2-3x)
- >256 KB: GPU provides a significant speedup (4-8x vs CPU SIMD)
- >1 MB: GPU essential for real-time processing

### Kernel Variants

Four GPU kernel implementations, optimized for different tradeoffs:

| Kernel | Throughput | Strategy |
|---|---|---|
| `lookup_vec8` | 1,401 MB/s | Lookup table + 8 bytes/thread (default, fastest) |
| `lookup` | 1,365 MB/s | Lookup table + 1 byte/thread |
| `contiguous` | 1,294 MB/s | If-else branches per character |
| `vec4` | 1,262 MB/s | 4 bytes/thread with branches |

Why `lookup_vec8` wins:

1. Lookup table - a 256-byte table in GPU constant memory eliminates all branches
2. 8 bytes/thread - reduces thread dispatch overhead, better memory coalescing
3. Unrolled loop - 8 loads per thread amortizes instruction overhead

The lookup-table approach converts character classification from branching code (unpredictable on GPU) into simple array indexing (a single memory read).
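The same idea expressed on the CPU, as an illustrative Mojo sketch (these helpers are hypothetical, not part of the library, and assume ASCII input for brevity):

```mojo
# Build a 256-entry table once; classification then becomes a single
# indexed load per byte instead of a chain of comparisons.
# Here 0 simply means "not structural" (the GPU kernel additionally
# distinguishes whitespace).
fn build_class_table() -> List[UInt8]:
    var table = List[UInt8](capacity=256)
    for _ in range(256):
        table.append(0)
    table[ord("{")] = 1
    table[ord("}")] = 2
    table[ord("[")] = 3
    table[ord("]")] = 4
    table[ord('"')] = 5
    table[ord(":")] = 6
    table[ord(",")] = 7
    return table

fn classify_cpu(source: String, table: List[UInt8]) -> List[UInt8]:
    var out = List[UInt8](capacity=len(source))
    for i in range(len(source)):
        out.append(table[ord(source[i])])  # one load, no branches
    return out
```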

### Architecture

```
JSON String (>64KB)
    │
    ▼
┌─────────────────────────────────────────┐
│  GPU Stage 1a: Character Classification │
│  - Parallel: 1000s of threads           │
│  - Lookup table in constant memory      │
│  - 8 bytes per thread (vec8 kernel)     │
└─────────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────────┐
│  CPU Stage 1b: Structural Index         │
│  - Sequential string state tracking     │
│  - Handle escape sequences              │
└─────────────────────────────────────────┘
    │
    ▼
StructuralIndex → Stage 2 Parsing
```

### Why FFI Instead of Mojo GPU?

Mojo 0.25.7's Metal compiler has a bug that crashes during metallib generation. This implementation bypasses it by:

1. Pre-compiling Metal shaders with Apple's `xcrun metal` tools
2. Using a C/Objective-C bridge for the Metal API calls
3. Calling from Mojo via `OwnedDLHandle` FFI

See `docs/GPU_ACCELERATION.md` for technical details.

## Examples

### Working with Nested Data

```mojo
from mojo_json import parse

var json = """
{
    "users": [
        {"name": "Alice", "email": "alice@example.com"},
        {"name": "Bob", "email": "bob@example.com"}
    ],
    "total": 2
}
"""

var data = parse(json)

# Access nested values
for i in range(data["users"].len()):
    var user = data["users"][i]
    print(user["name"].as_string() + ": " + user["email"].as_string())
```

### Building Complex JSON

```mojo
# Build a response object
# (get_current_timestamp() is an application-provided helper)
fn build_response(success: Bool, message: String, data: JsonValue) -> JsonValue:
    var obj = JsonObject()
    obj["success"] = JsonValue.from_bool(success)
    obj["message"] = JsonValue.from_string(message)
    obj["data"] = data
    obj["timestamp"] = JsonValue.from_int(get_current_timestamp())
    return JsonValue.from_object(obj)
```

### Validation Pattern

```mojo
fn validate_user(json: String) -> Bool:
    var result = parse_safe(json)
    if not result.get[2, Bool]():
        print("Invalid JSON:", result.get[1, JsonParseError]().format())
        return False

    var user = result.get[0, JsonValue]()

    if not user.is_object():
        print("Expected object")
        return False

    if not user.contains("name") or not user["name"].is_string():
        print("Missing or invalid 'name' field")
        return False

    if not user.contains("age") or not user["age"].is_int():
        print("Missing or invalid 'age' field")
        return False

    return True
```

## License

Apache 2.0 License

## Part of mojo-contrib

This library is part of mojo-contrib, a collection of pure Mojo libraries.
