go-openzl


Go bindings for Meta's OpenZL format-aware compression framework

OpenZL is Meta's high-performance, format-aware compression library that delivers compression ratios comparable to specialized compressors while maintaining high speed. This project provides idiomatic Go bindings to make OpenZL accessible to the Go ecosystem.

What is OpenZL?

OpenZL is a novel data compression framework that:

  • Optimizes for your data format - Takes a description of your data and builds a specialized compressor
  • Maintains high speed - Performance comparable to dedicated tools without sacrificing compression ratios
  • Uses a universal decoder - All specialized compressors work with a single decoder
  • Self-describing format - Compressed data includes metadata about its structure
  • Production-proven - Used extensively in production at Meta

Perfect for:

  • AI/ML workloads with specialized datasets
  • High-throughput data processing pipelines
  • Structured data (logs, telemetry, database exports)
  • Network protocol optimization
  • Type-aware storage systems

Status

✅ v0.4.0 Complete - Dictionary Support with Public Training API!

This project is in active development:

  • Phase 1: MVP with simple Compress/Decompress API
  • Phase 2: Context API with 20-50% better performance
  • Phase 3: Typed compression for structured data (2-50x better ratios!)
  • Phase 4: Streaming API with io.Reader/Writer (2287 MB/s throughput!)
  • Phase 5: Production hardening (benchmarks, edge cases, CI/CD)
  • Phase 6: Pure Go Implementation (Zero-CGO compression AND decompression!)
  • Phase 7: Dictionary Support (49× compression on CSV data!)

Current Status:

  • ✅ One-shot compression/decompression API
  • ✅ Reusable Compressor and Decompressor types
  • ✅ Thread-safe concurrent operations
  • ✅ Typed compression with Go generics (up to 50x better ratios on numeric data!)
  • ✅ Streaming API with io.Reader/Writer interfaces
  • ✅ Support for all numeric types (int8-64, uint8-64, float32/64)
  • ✅ Automatic buffering and frame management
  • ✅ File compression/decompression support
  • ✅ Options pattern for configuration
  • ✅ Dictionary-based LZ77 compression (49× on CSV!)
  • ✅ Public dictionary training API (dicttrainer package)
  • ✅ Pre-trained dictionaries for CSV, JSON, source code
  • ✅ Comprehensive test coverage (100% passing - 300+ tests)
  • ✅ Fuzz testing (8.2M+ executions, zero crashes)
  • ✅ Edge case coverage (100MB files, 10K concurrent ops)
  • ✅ Performance benchmarks vs gzip/zstd
  • ✅ Complete godoc documentation (100% coverage)
  • ✅ CI/CD with GitHub Actions
  • ✅ Pure Go compression AND decompression (complete end-to-end Pure Go support!)

We're looking for contributors! See Contributing below.

Features

Phase 1: MVP ✅ Complete

  • ✅ Simple Compress() and Decompress() functions
  • ✅ Basic compression and decompression
  • ✅ Error handling and reporting
  • ✅ Frame introspection (size queries)
  • ✅ Comprehensive test coverage
  • ✅ Example programs

Phase 2: Context API ✅ Complete

  • ✅ Reusable Compressor and Decompressor types
  • ✅ Thread-safe concurrent operations (verified with race detector)
  • ✅ Options pattern framework for configuration
  • ✅ 20-50% performance improvement over one-shot API
  • ✅ Extensive benchmarks and performance testing
  • ✅ Context example program

Phase 3: Typed API ✅ Complete

  • ✅ TypedRef creation and management
  • ✅ Typed numeric compression/decompression
  • ✅ Type-safe API using Go generics
  • ✅ Support for all numeric types (int8-64, uint8-64, float32/64)
  • ✅ Context API integration for typed compression
  • ✅ 2-50x better compression ratios on numeric data

Phase 4: Streaming API ✅ Complete

  • ✅ io.Reader/io.Writer interfaces
  • ✅ Streaming compression/decompression
  • ✅ Automatic buffer management
  • ✅ Large file support (tested with 100MB files)
  • ✅ Configurable frame sizes
  • ✅ Reset and reuse support
  • ✅ 2.3 GB/s throughput

Phase 5: Production Hardening ✅ Complete

  • ✅ Fuzz testing (2M+ executions, zero crashes)
  • ✅ Edge case coverage (truncated frames, large files, 10K concurrent ops)
  • ✅ Benchmark comparisons vs gzip/zstd
  • ✅ Migration guide from other compressors
  • ✅ Complete godoc documentation (100% coverage)
  • ✅ CI/CD for multiple platforms (Linux, macOS)
  • ✅ golangci-lint with 30+ linters
  • ✅ v0.1.0 release

Phase 6: Pure Go Implementation ✅ COMPLETE! (v0.3.3)

Goal: Eliminate CGO dependency for compression AND decompression, enabling faster builds and cross-compilation.

Status: ✅ COMPLETE - Frame v22 with native multi-stage pipelines!

Latest (v0.3.3): 🔥 Frame Format v22 with native LZ77→Huffman pipelines

  • 27-35× compression ratios on JSON and text data!
  • Single frame instead of double-wrapping (~30-60 bytes overhead saved)
  • Stores intermediate node sizes for proper multi-stage decompression
  • Fully backward compatible with v21 frames
  • CompressSmart() automatically uses best pipeline
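
The "stores intermediate node sizes" point is the key to Frame v22: a multi-stage frame can only be decoded by running its stages in reverse, and each reverse step needs to know how large its output should be. The toy pipeline below illustrates the idea with two made-up stages; it is a conceptual sketch, not the OpenZL frame layout.

```go
package main

import (
	"bytes"
	"fmt"
)

// stage is a toy invertible transform; real OpenZL codecs are more complex.
type stage interface {
	encode(src []byte) []byte
	decode(src []byte) []byte
}

// runLength is a naive RLE: (count, byte) pairs.
type runLength struct{}

func (runLength) encode(src []byte) []byte {
	var out []byte
	for i := 0; i < len(src); {
		j := i
		for j < len(src) && src[j] == src[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), src[i])
		i = j
	}
	return out
}

func (runLength) decode(src []byte) []byte {
	var out []byte
	for i := 0; i+1 < len(src); i += 2 {
		out = append(out, bytes.Repeat([]byte{src[i+1]}, int(src[i]))...)
	}
	return out
}

// xorMask stands in for a second stage (e.g. an entropy coder).
type xorMask struct{}

func (xorMask) encode(src []byte) []byte { return xorBytes(src) }
func (xorMask) decode(src []byte) []byte { return xorBytes(src) }

func xorBytes(src []byte) []byte {
	out := make([]byte, len(src))
	for i, b := range src {
		out[i] = b ^ 0x5A
	}
	return out
}

// compress runs the stages forward, recording each intermediate size --
// the metadata that a v22-style frame must carry.
func compress(data []byte, stages []stage) ([]byte, []int) {
	sizes := []int{len(data)}
	for _, s := range stages {
		data = s.encode(data)
		sizes = append(sizes, len(data))
	}
	return data, sizes
}

// decompress replays the pipeline in reverse, using the recorded sizes
// to sanity-check each stage's output.
func decompress(data []byte, stages []stage, sizes []int) ([]byte, error) {
	for i := len(stages) - 1; i >= 0; i-- {
		data = stages[i].decode(data)
		if len(data) != sizes[i] {
			return nil, fmt.Errorf("stage %d: got %d bytes, want %d", i, len(data), sizes[i])
		}
	}
	return data, nil
}

func main() {
	input := append(bytes.Repeat([]byte("a"), 20), bytes.Repeat([]byte("b"), 12)...)
	pipeline := []stage{runLength{}, xorMask{}}
	packed, sizes := compress(input, pipeline)
	restored, err := decompress(packed, pipeline, sizes)
	fmt.Println(err == nil, bytes.Equal(restored, input), sizes) // true true [32 4 4]
}
```

Without the recorded sizes, the decoder could not tell where one stage's output ends, which is exactly why v21 frames had to double-wrap instead.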

What's Implemented:

  • Pure Go Compression with Multi-Stage Pipelines (v0.3.3)

    • CompressSmart() - Intelligent codec selection with automatic pipelines
    • 27.64× compression on JSON (12KB → 460 bytes) 🔥
    • 35.25× compression on repeated text (5KB → 139 bytes) 🔥
    • 20× compression on sparse data (1KB → 50 bytes) 🔥
    • Native LZ77→Huffman pipelines in single Frame v22
    • Smart fallback: only uses multi-stage if it helps
    • Compress() with Huffman-only (2.59x on text, legacy)
    • CompressInt64/Float64/String() with Delta encoding (2.74x)
    • 2.8 GB/s compression speed
    • All tests passing (100% pass rate)
  • Pure Go Decompression (Complete decoder)

    • Frame v22 support - Reads intermediate node sizes
    • Reverse execution - Properly decodes multi-stage pipelines
    • Frame parser (79 tests, 1.6 GB/s)
    • Graph executor (42 tests, 16.2 GB/s)
    • 10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose
    • Multi-stage pipelines (v0.3.3):
      • LZ77→Huffman: 27.64× on JSON (Frame v22) 🔥
      • LZ77→Huffman: 35.25× on repeated text (Frame v22) 🔥
      • RLE→Huffman: 20× on sparse data
      • Delta→Huffman: 2.78x on timestamps
    • Typed API: DecompressInt64/Float64/etc. (17 tests, 490 MB/s)
    • Streaming API: purgo.Reader with io.Reader interface (12 tests, 2.3 GB/s)
    • 280+ tests (100% passing)
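
The Delta results above come from a simple property of timestamps and sorted IDs: neighboring values differ by small amounts, and zig-zag mapping turns signed deltas into small unsigned integers that downstream stages pack tightly. A minimal sketch of the two transforms (illustrative, not the purgo implementation):

```go
package main

import "fmt"

// deltaZigZag maps each value to the zig-zag-encoded difference from its
// predecessor: deltas 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...
// so small deltas of either sign stay small.
func deltaZigZag(vals []int64) []uint64 {
	out := make([]uint64, len(vals))
	prev := int64(0)
	for i, v := range vals {
		d := v - prev
		out[i] = uint64((d << 1) ^ (d >> 63)) // zig-zag
		prev = v
	}
	return out
}

// inverse reverses both steps.
func inverse(enc []uint64) []int64 {
	out := make([]int64, len(enc))
	prev := int64(0)
	for i, z := range enc {
		d := int64(z>>1) ^ -int64(z&1) // un-zig-zag
		prev += d
		out[i] = prev
	}
	return out
}

func main() {
	timestamps := []int64{1700000000, 1700000001, 1700000003, 1700000003, 1700000010}
	enc := deltaZigZag(timestamps)
	fmt.Println(enc)          // [3400000000 2 4 0 14] -- only the first delta is large
	fmt.Println(inverse(enc)) // original slice restored
}
```

After this transform, nearly every value fits in a byte or two, which is what lets a bit-packing or entropy stage deliver the ratios quoted above.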

Phase 7: Dictionary Support ✅ COMPLETE! (v0.4.0)

Goal: Add specialized dictionary support to LZ77 with external dictionary API for batch compression.

Status: ✅ COMPLETE - External Dictionary API Working!

What's Implemented:

  • Dictionary-Based LZ77 Compression

    • 47.76× compression on 100MB repetitive data (best case) 🔥
    • Dictionary LZ77 → Huffman pipeline (Frame v22 multi-stage)
    • Type 0/1/2 tokens (Literal, Window Match, Dictionary Match)
    • Efficient linear search with 3-byte prefix optimization
    • NewLZ77WithDict() constructor
    • Full roundtrip encode/decode support
    • Params-based dictionary passing for graph execution
  • Public Dictionary Training API (dicttrainer package)

    • Train custom dictionaries on your data
    • Smart sampling (1M samples for fast training)
    • Compression value scoring: score = frequency × (length - 5)
    • Greedy non-overlapping pattern selection
    • Custom pattern injection
    • Configurable pattern lengths (3-32 bytes default)
    • Statistics API for corpus analysis
    • ~50 MB/s training speed
  • External Dictionary API (purgo package) - NEW!

    • CompressWithDict() - Compresses WITHOUT embedding dictionary
    • DecompressWithDict() - Requires external dictionary file
    • 46.76× compression on batch workloads (10 files) 🔥
    • 28% better than no dictionary on batch compression
    • Dictionary stored once, reused for all files (like a "library" file)
    • Perfect for compressing many similar files
  • Test Coverage

    • 40+ new tests (100% passing)
    • 10 dictionary LZ77 tests
    • 11 dictionary trainer tests
    • 7 external dictionary tests (batch compression)
    • 4 error handling tests
    • Comprehensive documentation

Batch Compression Results:

10 × 11KB CSV files (117KB total):

  • CompressSmart (no dict): 36.45× compression (baseline)
  • CompressWithDict (external): 46.76× compression (28% better!)
  • Storage: 500-byte dictionary + 10 compressed files = 2.5KB total

Single File (20KB CSV):

  • CompressSmart (no dict): 9.72× compression
  • CompressWithDict (external): 10.04× compression (3% better!)

Usage Example (External Dictionary):

import (
    "os"

    "github.com/boris-chu/go-openzl/dicttrainer"
    "github.com/boris-chu/go-openzl/purgo"
)

// Step 1: Train dictionary on representative data
trainer := dicttrainer.New()
trainer.AddFile("sample1.csv")
trainer.AddFile("sample2.csv")
dict := trainer.Train(500) // 500-byte dictionary
os.WriteFile("csv-dict.bin", dict, 0644)

// Step 2: Compress many files with the same dictionary
dict, _ = os.ReadFile("csv-dict.bin")
for _, file := range filesToCompress {
    data, _ := os.ReadFile(file)
    compressed, _ := purgo.CompressWithDict(data, dict)
    os.WriteFile(file+".openzl", compressed, 0644)
}
// Dictionary overhead: 500 bytes total (stored ONCE!)

// Step 3: Decompress (dictionary required)
dict, _ = os.ReadFile("csv-dict.bin")
for _, file := range compressedFiles {
    compressed, _ := os.ReadFile(file)
    data, _ := purgo.DecompressWithDict(compressed, dict)
    process(data) // use the decompressed bytes
}

Key Features:

  • ✅ Dictionary NOT embedded in compressed files (smaller output!)
  • ✅ Dictionary stored separately (like a .dll or codec pack)
  • ✅ 28% better compression on batch workloads
  • ✅ All roundtrip tests passing
  • ✅ Proper error handling (wrong dict, missing dict, etc.)

When to use:

  • ✅ Compressing 10+ similar files (CSV, JSON, logs)
  • ✅ Batch compression scenarios
  • ✅ When dictionary can be shared/distributed once

When NOT to use:

  • ❌ Single-file compression (use CompressSmart instead)
  • ❌ Files too small (<1KB each)
  • ❌ Cannot distribute dictionary file

Usage Examples:

// Pure Go compression AND decompression (no CGO!)
import (
    "io"
    "log"
    "os"

    "github.com/boris-chu/go-openzl/purgo"
)

// NEW v0.3.3: CompressSmart with automatic pipeline selection
compressed, _ := purgo.CompressSmart([]byte(`{"users":[...]}`))
// → 27.64× compression on JSON (automatic LZ77→Huffman pipeline!)

// Compress text with intelligent codec selection
compressed, _ := purgo.CompressSmart([]byte("repeated text pattern..."))
// → 35.25× compression (automatic multi-stage pipeline!)

// Legacy: Simple Huffman compression
compressed, _ := purgo.Compress([]byte("your CSV data here"))
// → 2.59× compression (Huffman-only)

// Compress numeric data (timestamps, IDs, sorted values)
compressed, _ := purgo.CompressInt64([]int64{1, 2, 3, 100, 101, 102})
// → 2.74× compression (Delta encoding)

// Decompress - simple one-liner!
data, _ := purgo.Decompress(compressed)
numbers, _ := purgo.DecompressInt64(compressed)
floats, _ := purgo.DecompressFloat64(compressed)

// Supports all numeric types:
// - int8, int16, int32, int64
// - uint8, uint16, uint32, uint64
// - float32, float64

// Streaming decompression (io.Reader interface)
file, _ := os.Open("data.zl")
reader, _ := purgo.NewReader(file)
defer reader.Close()

io.Copy(os.Stdout, reader) // Stream decompressed data!

// Or read incrementally:
buffer := make([]byte, 4096)
for {
    n, err := reader.Read(buffer)
    if n > 0 {
        process(buffer[:n]) // consume the chunk before inspecting err
    }
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
}

Benefits of Pure Go Implementation (v0.3.3):

  • 🚀 Faster builds: No C compilation (10x faster go build)
  • 🌍 Easy cross-compilation: GOOS=windows go build just works
  • 📦 Smaller binaries: No CGO overhead
  • 🐛 Better debugging: Pure Go stack traces
  • Excellent performance: 2.8 GB/s compression, 2.3 GB/s decompression
  • 🔥 Amazing compression: 27-35× on JSON/text (Frame v22 pipelines!)
  • 💪 Production-ready: 280+ tests (100% passing), fuzz tested

JSON/Text Compression (v0.3.3 - ready for production!):

// Compress JSON with 27× compression ratio!
jsonData := []byte(`{"users":[...]}`)
compressed, _ := purgo.CompressSmart(jsonData)  // 27× compression!

// Compress CSV/text with 35× compression ratio!
textData := []byte("repeated text pattern...")
compressed, _ := purgo.CompressSmart(textData)  // 35× compression!

// Decompress later
original, _ := purgo.Decompress(compressed)

Test Coverage (Pure Go v0.3.3):

  • ✅ 280+ total Pure Go tests (100% passing)
    • Compression tests (encoder + multi-stage pipelines)
    • Decompression tests (decoder + Frame v22)
    • Frame writer tests (v21/v22 compatibility)
  • ✅ Frame parser: 79 tests
  • ✅ Codec system: 181 tests (10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose)
  • ✅ Graph executor: 42 tests
  • ✅ Integration tests: 10 end-to-end pipeline tests
  • ✅ Public API: 3 tests (compression + decompression)
  • ✅ Fuzz testing: 8.2M+ executions (zero crashes)
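
Of the ten codecs listed above, Transpose is the least self-explanatory: for fixed-width values it groups the nth byte of every element together, so the mostly-constant high bytes form long runs that later stages compress trivially. An illustrative sketch for 32-bit values (not the internal/codec implementation):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// transpose4 rewrites little-endian uint32s as four byte planes:
// all byte-0s first, then all byte-1s, and so on. For similar values,
// the high-byte planes become constant runs.
func transpose4(vals []uint32) []byte {
	n := len(vals)
	out := make([]byte, 4*n)
	for i, v := range vals {
		var b [4]byte
		binary.LittleEndian.PutUint32(b[:], v)
		for plane := 0; plane < 4; plane++ {
			out[plane*n+i] = b[plane]
		}
	}
	return out
}

// untranspose4 inverts the layout.
func untranspose4(data []byte) []uint32 {
	n := len(data) / 4
	vals := make([]uint32, n)
	for i := 0; i < n; i++ {
		var b [4]byte
		for plane := 0; plane < 4; plane++ {
			b[plane] = data[plane*n+i]
		}
		vals[i] = binary.LittleEndian.Uint32(b[:])
	}
	return vals
}

func main() {
	ids := []uint32{1000001, 1000002, 1000003, 1000004}
	planes := transpose4(ids)
	// Planes 2 and 3 are constant runs (0x0F and 0x00), ideal for RLE/Huffman.
	fmt.Println(planes[8:16]) // [15 15 15 15 0 0 0 0]
	fmt.Println(untranspose4(planes)[0] == ids[0])
}
```

Paired with RLE or Huffman downstream, this is the kind of restructuring that makes typed numeric compression so much more effective than byte-oriented codecs.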

Phase 8: Advanced Features (Planned - v1.1+)

See Advanced Features Roadmap below for Python/C++ feature parity plans.

Installation

go get github.com/boris-chu/go-openzl@v0.4.0

Or add to your go.mod:

require github.com/boris-chu/go-openzl v0.4.0

Requirements

  • Go 1.21 or later
  • For the CGO-backed openzl package: CGO enabled, a C11 compiler, and a C++17 compiler (for the OpenZL library)
  • None of the above for the pure Go purgo package

The OpenZL C library will be automatically built during installation.

Quick Start

Simple One-Shot API

package main

import (
    "fmt"
    "log"

    "github.com/boris-chu/go-openzl"
)

func main() {
    // Compress data (one-shot)
    input := []byte("Hello, OpenZL!")
    compressed, err := openzl.Compress(input)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Original size: %d bytes\n", len(input))
    fmt.Printf("Compressed size: %d bytes\n", len(compressed))

    // Decompress data (one-shot)
    decompressed, err := openzl.Decompress(compressed)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Decompressed: %s\n", decompressed)
}

Context API (Better Performance)

For repeated operations, use the Context API for 20-50% better performance:

package main

import (
    "fmt"
    "log"

    "github.com/boris-chu/go-openzl"
)

func main() {
    // Create reusable compressor
    compressor, err := openzl.NewCompressor()
    if err != nil {
        log.Fatal(err)
    }
    defer compressor.Close()

    // Create reusable decompressor
    decompressor, err := openzl.NewDecompressor()
    if err != nil {
        log.Fatal(err)
    }
    defer decompressor.Close()

    // Compress multiple messages (context reuse = faster!)
    messages := []string{"First message", "Second message", "Third message"}

    for _, msg := range messages {
        // Compress using reusable context
        compressed, err := compressor.Compress([]byte(msg))
        if err != nil {
            log.Fatal(err)
        }

        // Decompress using reusable context
        decompressed, err := decompressor.Decompress(compressed)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Printf("Original: %s, Compressed: %d bytes\n", msg, len(compressed))
    }
}

Zero-Allocation API (Klaus Post Pattern)

For maximum performance with zero allocations, use the CompressTo API:

// Pre-allocate buffer once
dst := make([]byte, openzl.CompressBound(maxMessageSize))

// Process many messages with ZERO allocations!
for _, msg := range messages {
    n, err := compressor.CompressTo(dst, msg)
    if err != nil {
        log.Fatal(err)
    }
    // Use dst[:n] - no allocation!
    sendOverNetwork(dst[:n])
}

Performance: 0 B/op, 0 allocs/op (175k ops/sec, 159 MB/s)
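
The pattern works because both pieces are allocation-free: the worst-case bound is computed once and the destination buffer is reused for every message. The sketch below shows the shape of the pattern with stand-in bound and compressTo functions (the formula and names are hypothetical, not OpenZL's API), and uses testing.AllocsPerRun to confirm the hot loop allocates nothing.

```go
package main

import (
	"fmt"
	"testing"
)

// bound is a stand-in for a CompressBound-style function: worst-case output
// size for n input bytes. The exact formula is format-specific; the shape
// (input + small fraction + constant header) is typical of LZ formats.
func bound(n int) int { return n + n/255 + 16 }

// compressTo is a stand-in for a CompressTo-style method: it writes into the
// caller's buffer and returns the byte count, allocating nothing itself.
func compressTo(dst, src []byte) (int, error) {
	if len(dst) < bound(len(src)) {
		return 0, fmt.Errorf("dst too small: %d < %d", len(dst), bound(len(src)))
	}
	n := copy(dst, src) // a real codec would encode here
	return n, nil
}

func main() {
	msgs := [][]byte{[]byte("first"), []byte("second"), []byte("third")}
	dst := make([]byte, bound(64)) // allocated once, outside the loop

	allocs := testing.AllocsPerRun(100, func() {
		for _, m := range msgs {
			n, err := compressTo(dst, m)
			if err != nil {
				panic(err)
			}
			_ = dst[:n] // hand dst[:n] to the network/disk here
		}
	})
	fmt.Println("allocs per run:", allocs)
}
```

Reusing dst is safe only as long as each dst[:n] is consumed (or copied) before the next call overwrites it, which is the one discipline the zero-allocation pattern demands.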

Typed Compression (Phase 3)

OpenZL excels at compressing typed data - achieving 2-50x better compression ratios:

// Compress an array of integers (achieves much better compression!)
numbers := []int64{1, 2, 3, 4, 5, 100, 101, 102}
compressed, err := openzl.CompressNumeric(numbers)
if err != nil {
    log.Fatal(err)
}

// Decompress back to typed slice
decompressed, err := openzl.DecompressNumeric[int64](compressed)
if err != nil {
    log.Fatal(err)
}

// Use with context API for best performance
compressor, _ := openzl.NewCompressor()
defer compressor.Close()

compressed, err = openzl.CompressorCompressNumeric(compressor, numbers)

// Supports all numeric types
int32Data := []int32{1, 2, 3, 4, 5}
uint64Data := []uint64{100, 200, 300}
float64Data := []float64{1.1, 2.2, 3.3}

compressed1, _ := openzl.CompressNumeric(int32Data)
compressed2, _ := openzl.CompressNumeric(uint64Data)
compressed3, _ := openzl.CompressNumeric(float64Data)

Streaming API (Phase 4)

Stream large files without loading them entirely into memory:

// Compress a file
input, _ := os.Open("large-file.txt")
output, _ := os.Create("large-file.txt.zl")

writer, _ := openzl.NewWriter(output)
io.Copy(writer, input)  // Stream and compress
writer.Close()

// Decompress a file
compressedFile, _ := os.Open("large-file.txt.zl")
decompressed, _ := os.Create("large-file.txt.decompressed")

reader, _ := openzl.NewReader(compressedFile)
io.Copy(decompressed, reader)  // Stream and decompress
reader.Close()

// Custom frame size for different use cases
writer, _ := openzl.NewWriter(output, openzl.WithFrameSize(256*1024)) // 256KB frames

Performance: 2287 MB/s streaming compression throughput!

Performance

Benchmarked on Apple M4 Pro:

Phase 2 Context API (Reusable Contexts)

  • Compression: 327k ops/sec (3.6 μs/op)
  • Decompression: 2.2M ops/sec (545 ns/op)
  • Memory: 576 B/op compress, 16 B/op decompress

Phase 1 One-Shot API

  • Compression: 264k ops/sec (4.6 μs/op)
  • Decompression: 1.0M ops/sec (1.1 μs/op)
  • Memory: 584 B/op compress, 24 B/op decompress

Performance Improvement (Phase 2 vs Phase 1)

  • Compression: 21% faster with context reuse
  • Decompression: 49% faster with context reuse
  • Memory: Reduced allocations per operation

Compression Ratios (Observed)

  • Small text (11 bytes): 0.26× (output larger than input; frame header dominates)
  • Repeated data (400 bytes): 9.52× compression ratio
  • Large repeated data (45KB): 500× compression ratio
  • Unicode text: 0.37× (again dominated by small-data overhead)

Note: Compression ratios improve significantly with larger and more structured data.

Run benchmarks yourself:

go test -bench=. -benchmem

Architecture

Current (CGO-based)

┌─────────────────────────────────────────────────┐
│                Go API Layer                     │
│  - Idiomatic Go interfaces                      │
│  - io.Reader/Writer support                     │
│  - Type-safe generics                           │
│  - Concurrent processing                        │
└─────────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────┐
│                CGO Bindings                     │
│  - Thin wrapper over C API                      │
│  - Memory management                            │
│  - Error translation                            │
└─────────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────┐
│             OpenZL C Library                    │
│  - C11 core implementation                      │
│  - Format-aware compression                     │
│  - Universal decompressor                       │
└─────────────────────────────────────────────────┘

Pure Go Decoder (shipped in v0.3.3) 🚀

┌─────────────────────────────────────────────────┐
│                Go API Layer                     │
│  - Unified API for both compression paths      │
│  - Automatic fallback/selection                │
└─────────────────────────────────────────────────┘
           ↓                          ↓
  [Compression]              [Decompression]
           ↓                          ↓
┌──────────────────┐      ┌──────────────────────┐
│  CGO → C Library │      │   Pure Go Decoder    │
│  (Fast encoding) │      │   - Frame Parser ✅   │
│                  │      │   - Graph Executor ✅ │
│                  │      │   - Codecs (10) ✅    │
│                  │      │   - No CGO needed!   │
└──────────────────┘      └──────────────────────┘

Benefits of Pure Go Decoder:
✅ Faster builds (no CGO)
✅ Easy cross-compilation
✅ Smaller binaries
✅ Better debugging
✅ Maintained compression performance via C library

Documentation

Upstream Documentation

Project Structure

go-openzl/
├── README.md           # This file
├── LICENSE             # BSD 3-Clause License
├── go.mod              # Go module definition
├── *.go                # Public API (CGO-based)
│   ├── compress.go     # One-shot compression
│   ├── compressor.go   # Reusable compressor
│   ├── decompressor.go # Reusable decompressor
│   ├── typed.go        # Typed compression
│   ├── reader.go       # Streaming reader
│   └── writer.go       # Streaming writer
├── internal/           # Pure Go codecs and frame support (complete)
│   ├── frame/          # Frame parser (Phase 1 ✅)
│   ├── codec/          # 10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose ✅
│   └── graph/          # Graph executor (Phase 2 ✅)
├── examples/           # Usage examples
│   ├── simple/         # Basic compression example
│   ├── context/        # Context API example
│   ├── typed/          # Typed compression example
│   └── streaming/      # Streaming API example
├── documentation/      # Additional documentation
└── vendor/             # Vendored OpenZL C library

Contributing

We welcome contributions! This project is in its early stages and there's plenty to do.

Areas Where We Need Help

  • Core Implementation: CGO bindings for OpenZL C API
  • Testing: Comprehensive test coverage and fuzzing
  • Documentation: Examples, guides, and API docs
  • Performance: Benchmarking and optimization
  • CI/CD: GitHub Actions workflows for multiple platforms
  • Packaging: Cross-platform build and distribution

Getting Started

  1. Fork the repository
  2. Read the OpenZL documentation to understand the library
  3. Check the issues for tasks
  4. Join the discussion in issues or discussions
  5. Submit a PR with your contribution

Development Setup

# Clone the repository
git clone https://github.com/yourusername/go-openzl.git
cd go-openzl

# Initialize submodules (for OpenZL C library)
git submodule update --init --recursive

# Build the OpenZL C library
make build-openzl

# Run tests
go test ./...

# Run benchmarks
go test -bench=. ./benchmarks/

Code of Conduct

This project follows the Go Community Code of Conduct. Please be respectful and constructive in all interactions.

Why Go Bindings?

Go is widely used for:

  • Cloud-native applications and microservices
  • Data processing pipelines
  • Network services and proxies
  • CLI tools and utilities

OpenZL's format-aware compression is perfect for these use cases, but there are currently no Go bindings. This project aims to bring OpenZL's power to the Go ecosystem with idiomatic, high-performance bindings.

Comparison with Other Go Compression Libraries

Library     Compression Ratio   Speed       Format-Aware   Type-Aware
gzip        Baseline            Slow        No             No
zstd        Good                Fast        No             No
snappy      Low                 Very Fast   No             No
go-openzl   Excellent           Fast        Yes            Yes

OpenZL excels when you have:

  • Structured or typed data
  • Repeated data patterns
  • High compression requirements with speed constraints
  • Need for format introspection

Roadmap

✅ v0.1.0 (October 2025) - Initial Release

  • ✅ Core compression/decompression
  • ✅ Context API (20-50% faster)
  • ✅ Typed numeric compression (2-50x better ratios)
  • ✅ Streaming API (io.Reader/Writer)
  • ✅ 45 tests, 100% passing
  • ✅ Full CI/CD pipeline
  • ✅ Complete documentation

🎯 v1.0.0 (Q1 2026) - Stable Release

  • Community feedback from v0.1.0
  • Windows platform support
  • Additional parameter controls
  • Performance optimizations
  • API stability guarantee
  • Production case studies

🚀 v1.1.0 (Q2 2026) - Enhanced Parameters

  • Compression level control (fast/default/best)
  • Window size configuration
  • Custom buffer management
  • Advanced error reporting
  • Memory usage controls
  • Performance profiling tools

🔬 v2.0.0 (Q3 2026) - Advanced Features

Python/C++ feature parity - see Advanced Features Roadmap below.

Advanced Features Roadmap

The following advanced features from OpenZL's C++ and Python implementations are planned for future releases:

Custom Compression Graphs (v2.0)

What it is: Build custom compression pipelines by combining encoding nodes.

C++ Example:

CustomGraph graph;
graph.addNode("delta");      // Delta encoding
graph.addNode("bitpack");    // Bit packing
graph.addNode("entropy");    // Entropy coding
graph.connect(0, 1);
graph.connect(1, 2);

Planned Go API:

graph := openzl.NewGraph()
graph.AddNode(openzl.NodeDelta)
graph.AddNode(openzl.NodeBitpack)
graph.AddNode(openzl.NodeEntropy)
graph.Connect(0, 1, 2)

compressor, _ := openzl.NewCompressor(
    openzl.WithCustomGraph(graph),
)

Status: 📋 Planned for v2.0
Complexity: High - requires deep OpenZL internals integration
Use Case: <5% of users need this level of customization


Custom Selectors (v2.0)

What it is: Dynamically choose compression strategy per data block.

Python Example:

selector = AdaptiveSelector(
    strategies=["fast", "balanced", "best"],
    threshold=0.8  # Switch strategy based on compression ratio
)
compressor = openzl.Compressor(selector=selector)

Planned Go API:

selector := openzl.NewAdaptiveSelector(
    openzl.StrategyFast,
    openzl.StrategyBalanced,
    openzl.StrategyBest,
)

compressor, _ := openzl.NewCompressor(
    openzl.WithSelector(selector),
)

Status: 📋 Planned for v2.0
Complexity: High - requires profiling and decision logic
Use Case: Performance-critical applications with mixed data


Multi-Input Compression (v2.0+)

What it is: Compress multiple input streams together for better correlation.

Python Example:

streams = [timestamps, values, metadata]
compressed = openzl.compress_multi(streams)

Planned Go API:

streams := [][]byte{
    timestamps,
    values,
    metadata,
}

compressed, _ := openzl.CompressMulti(streams)

Status: 📋 Planned for v2.0 or later
Complexity: Medium - requires stream coordination
Use Case: Time-series data, columnar storage


Training & Dictionary Support (v2.0+)

What it is: Train compressor on representative data samples for better compression.

C++ Example:

Trainer trainer;
trainer.addSample(sample1);
trainer.addSample(sample2);
Dictionary dict = trainer.train();

Compressor compressor(dict);

Planned Go API:

trainer := openzl.NewTrainer()
trainer.AddSample(sample1)
trainer.AddSample(sample2)

dict, _ := trainer.Train()

compressor, _ := openzl.NewCompressor(
    openzl.WithDictionary(dict),
)

Status: 📋 Research phase
Complexity: Very High - requires training algorithm implementation
Use Case: Domain-specific data with known patterns


Transform Composition (v2.0)

What it is: Chain multiple transforms for specialized compression.

Python Example:

from openzl import transforms

pipeline = transforms.Pipeline([
    transforms.Delta(),
    transforms.Quantize(bits=8),
    transforms.Entropy(),
])

compressed = pipeline.compress(data)

Planned Go API:

pipeline := openzl.NewPipeline(
    openzl.TransformDelta(),
    openzl.TransformQuantize(8),
    openzl.TransformEntropy(),
)

compressed, _ := pipeline.Compress(data)

Status: 📋 Planned for v2.0
Complexity: Medium - requires transform chaining infrastructure
Use Case: Specialized numeric/scientific data


Feature Priority

Based on user feedback and demand, we'll prioritize:

High Priority (v1.1):

  1. Basic parameter controls (compression level, buffer size)
  2. Additional platform support (Windows)
  3. Performance monitoring and profiling

Medium Priority (v2.0):

  1. Custom compression graphs
  2. Adaptive selectors
  3. Transform composition
  4. Multi-input compression

Lower Priority (v2.0+):

  1. Training and dictionary support
  2. Advanced introspection APIs
  3. Custom codec development

Why Not in v1.0?

We deliberately excluded advanced features from v1.0 because:

  1. Complexity: Each feature adds significant API surface area
  2. Usage: Less than 5% of users need these features
  3. Stability: v1.0 focuses on rock-solid core functionality
  4. Testing: Advanced features require extensive testing
  5. Documentation: Each feature needs comprehensive docs and examples

Our v1.0 release covers 95% of use cases with:

  • ✅ General-purpose compression
  • ✅ High-performance context reuse
  • ✅ Typed numeric compression
  • ✅ Streaming for large files
  • ✅ Thread-safe concurrent operations

Advanced features can be added in v2.0 without breaking v1.0 APIs.


Contributing to Advanced Features

Interested in helping implement advanced features? We welcome contributors!

Good first advanced features:

  1. Basic parameter controls (v1.1)
  2. Performance monitoring (v1.1)
  3. Transform composition (v2.0)

Complex features needing experts:

  1. Custom compression graphs
  2. Training and dictionaries
  3. Custom selectors

See CONTRIBUTING.md for guidelines.


Feedback Welcome!

Which advanced features would be most valuable to you?

  • Open an issue to discuss
  • Join discussions
  • Vote on feature requests with 👍 reactions

Your input helps us prioritize development!

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

OpenZL itself is also BSD licensed - see the OpenZL LICENSE.

Acknowledgments

  • Meta Open Source for creating and open-sourcing OpenZL
  • The Go Community for excellent CGO documentation and examples
  • Contributors who help make this project possible

Contact & Support

Related Projects

  • OpenZL - The upstream C/C++ library
  • zstd-go - High-performance zstd in Go
  • compress - Optimized Go compression packages

Star this project if you find it interesting! It helps us gauge interest and attract contributors.
