go-openzl


Go bindings for Meta's OpenZL format-aware compression framework

OpenZL is Meta's high-performance, format-aware compression library that delivers compression ratios comparable to specialized compressors while maintaining high speed. This project provides idiomatic Go bindings to make OpenZL accessible to the Go ecosystem.

What is OpenZL?

OpenZL is a novel data compression framework that:

  • Optimizes for your data format - Takes a description of your data and builds a specialized compressor
  • Maintains high speed - Performance comparable to dedicated tools without sacrificing compression ratios
  • Uses a universal decoder - All specialized compressors work with a single decoder
  • Self-describing format - Compressed data includes metadata about its structure
  • Production-proven - Used extensively in production at Meta

Perfect for:

  • AI/ML workloads with specialized datasets
  • High-throughput data processing pipelines
  • Structured data (logs, telemetry, database exports)
  • Network protocol optimization
  • Type-aware storage systems

Status

✅ v0.4.0 Complete - Dictionary Support with Public Training API!

This project is in active development:

  • Phase 1: MVP with simple Compress/Decompress API
  • Phase 2: Context API with 20-50% better performance
  • Phase 3: Typed compression for structured data (2-50x better ratios!)
  • Phase 4: Streaming API with io.Reader/Writer (2287 MB/s throughput!)
  • Phase 5: Production hardening (benchmarks, edge cases, CI/CD)
  • Phase 6: Pure Go Implementation (Zero-CGO compression AND decompression!)
  • Phase 7: Dictionary Support (49× compression on CSV data!)

Current Status:

  • ✅ One-shot compression/decompression API
  • ✅ Reusable Compressor and Decompressor types
  • ✅ Thread-safe concurrent operations
  • ✅ Typed compression with Go generics (up to 50x better ratios on numeric data!)
  • ✅ Streaming API with io.Reader/Writer interfaces
  • ✅ Support for all numeric types (int8-64, uint8-64, float32/64)
  • ✅ Automatic buffering and frame management
  • ✅ File compression/decompression support
  • ✅ Options pattern for configuration
  • ✅ Dictionary-based LZ77 compression (49× on CSV!)
  • ✅ Public dictionary training API (dicttrainer package)
  • ✅ Pre-trained dictionaries for CSV, JSON, source code
  • ✅ Comprehensive test coverage (100% passing - 300+ tests)
  • ✅ Fuzz testing (8.2M+ executions, zero crashes)
  • ✅ Edge case coverage (100MB files, 10K concurrent ops)
  • ✅ Performance benchmarks vs gzip/zstd
  • ✅ Complete godoc documentation (100% coverage)
  • ✅ CI/CD with GitHub Actions
  • ✅ Pure Go compression AND decompression (complete end-to-end Pure Go support!)

We're looking for contributors! See Contributing below.

Features

Phase 1: MVP ✅ Complete

  • ✅ Simple Compress() and Decompress() functions
  • ✅ Basic compression and decompression
  • ✅ Error handling and reporting
  • ✅ Frame introspection (size queries)
  • ✅ Comprehensive test coverage
  • ✅ Example programs

Phase 2: Context API ✅ Complete

  • ✅ Reusable Compressor and Decompressor types
  • ✅ Thread-safe concurrent operations (verified with race detector)
  • ✅ Options pattern framework for configuration
  • ✅ 20-50% performance improvement over one-shot API
  • ✅ Extensive benchmarks and performance testing
  • ✅ Context example program

Phase 3: Typed API ✅ Complete

  • ✅ TypedRef creation and management
  • ✅ Typed numeric compression/decompression
  • ✅ Type-safe API using Go generics
  • ✅ Support for all numeric types (int8-64, uint8-64, float32/64)
  • ✅ Context API integration for typed compression
  • ✅ 2-50x better compression ratios on numeric data

Phase 4: Streaming API ✅ Complete

  • ✅ io.Reader/io.Writer interfaces
  • ✅ Streaming compression/decompression
  • ✅ Automatic buffer management
  • ✅ Large file support (tested with 100MB files)
  • ✅ Configurable frame sizes
  • ✅ Reset and reuse support
  • ✅ 2.3 GB/s throughput

Phase 5: Production Hardening ✅ Complete

  • ✅ Fuzz testing (2M+ executions, zero crashes)
  • ✅ Edge case coverage (truncated frames, large files, 10K concurrent ops)
  • ✅ Benchmark comparisons vs gzip/zstd
  • ✅ Migration guide from other compressors
  • ✅ Complete godoc documentation (100% coverage)
  • ✅ CI/CD for multiple platforms (Linux, macOS)
  • ✅ golangci-lint with 30+ linters
  • ✅ v0.1.0 release

Phase 6: Pure Go Implementation ✅ COMPLETE! (v0.3.3)

Goal: Eliminate CGO dependency for compression AND decompression, enabling faster builds and cross-compilation.

Status: ✅ COMPLETE - Frame v22 with native multi-stage pipelines!

Latest (v0.3.3): 🔥 Frame Format v22 with native LZ77→Huffman pipelines

  • 27-35× compression ratios on JSON and text data!
  • Single frame instead of double-wrapping (~30-60 bytes overhead saved)
  • Stores intermediate node sizes for proper multi-stage decompression
  • Fully backward compatible with v21 frames
  • CompressSmart() automatically uses best pipeline
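
The "stores intermediate node sizes" point is the key to Frame v22: a multi-stage frame can only be decoded by running its stages in reverse, and each reverse step needs to know how large its output should be. The toy pipeline below illustrates the idea with two made-up stages; it is a conceptual sketch, not the OpenZL frame layout.

```go
package main

import (
	"bytes"
	"fmt"
)

// stage is a toy invertible transform; real OpenZL codecs are more complex.
type stage interface {
	encode(src []byte) []byte
	decode(src []byte) []byte
}

// runLength is a naive RLE: (count, byte) pairs.
type runLength struct{}

func (runLength) encode(src []byte) []byte {
	var out []byte
	for i := 0; i < len(src); {
		j := i
		for j < len(src) && src[j] == src[i] && j-i < 255 {
			j++
		}
		out = append(out, byte(j-i), src[i])
		i = j
	}
	return out
}

func (runLength) decode(src []byte) []byte {
	var out []byte
	for i := 0; i+1 < len(src); i += 2 {
		out = append(out, bytes.Repeat([]byte{src[i+1]}, int(src[i]))...)
	}
	return out
}

// xorMask stands in for a second stage (e.g. an entropy coder).
type xorMask struct{}

func (xorMask) encode(src []byte) []byte { return xorBytes(src) }
func (xorMask) decode(src []byte) []byte { return xorBytes(src) }

func xorBytes(src []byte) []byte {
	out := make([]byte, len(src))
	for i, b := range src {
		out[i] = b ^ 0x5A
	}
	return out
}

// compress runs the stages forward, recording each intermediate size --
// the metadata that a v22-style frame must carry.
func compress(data []byte, stages []stage) ([]byte, []int) {
	sizes := []int{len(data)}
	for _, s := range stages {
		data = s.encode(data)
		sizes = append(sizes, len(data))
	}
	return data, sizes
}

// decompress replays the pipeline in reverse, using the recorded sizes
// to sanity-check each stage's output.
func decompress(data []byte, stages []stage, sizes []int) ([]byte, error) {
	for i := len(stages) - 1; i >= 0; i-- {
		data = stages[i].decode(data)
		if len(data) != sizes[i] {
			return nil, fmt.Errorf("stage %d: got %d bytes, want %d", i, len(data), sizes[i])
		}
	}
	return data, nil
}

func main() {
	input := append(bytes.Repeat([]byte("a"), 20), bytes.Repeat([]byte("b"), 12)...)
	pipeline := []stage{runLength{}, xorMask{}}
	packed, sizes := compress(input, pipeline)
	restored, err := decompress(packed, pipeline, sizes)
	fmt.Println(err == nil, bytes.Equal(restored, input), sizes) // true true [32 4 4]
}
```

Without the recorded sizes, the decoder could not tell where one stage's output ends, which is exactly why v21 frames had to double-wrap instead.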

What's Implemented:

  • Pure Go Compression with Multi-Stage Pipelines (v0.3.3)

    • CompressSmart() - Intelligent codec selection with automatic pipelines
    • 27.64× compression on JSON (12KB → 460 bytes) 🔥
    • 35.25× compression on repeated text (5KB → 139 bytes) 🔥
    • 20× compression on sparse data (1KB → 50 bytes) 🔥
    • Native LZ77→Huffman pipelines in single Frame v22
    • Smart fallback: only uses multi-stage if it helps
    • Compress() with Huffman-only (2.59x on text, legacy)
    • CompressInt64/Float64/String() with Delta encoding (2.74x)
    • 2.8 GB/s compression speed
    • All tests passing (100% pass rate)
  • Pure Go Decompression (Complete decoder)

    • Frame v22 support - Reads intermediate node sizes
    • Reverse execution - Properly decodes multi-stage pipelines
    • Frame parser (79 tests, 1.6 GB/s)
    • Graph executor (42 tests, 16.2 GB/s)
    • 10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose
    • Multi-stage pipelines (v0.3.3):
      • LZ77→Huffman: 27.64× on JSON (Frame v22) 🔥
      • LZ77→Huffman: 35.25× on repeated text (Frame v22) 🔥
      • RLE→Huffman: 20× on sparse data
      • Delta→Huffman: 2.78x on timestamps
    • Typed API: DecompressInt64/Float64/etc. (17 tests, 490 MB/s)
    • Streaming API: purgo.Reader with io.Reader interface (12 tests, 2.3 GB/s)
    • 280+ tests (100% passing)
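
The Delta results above come from a simple property of timestamps and sorted IDs: neighboring values differ by small amounts, and zig-zag mapping turns signed deltas into small unsigned integers that downstream stages pack tightly. A minimal sketch of the two transforms (illustrative, not the purgo implementation):

```go
package main

import "fmt"

// deltaZigZag maps each value to the zig-zag-encoded difference from its
// predecessor: deltas 0, -1, 1, -2, 2, ... become 0, 1, 2, 3, 4, ...
// so small deltas of either sign stay small.
func deltaZigZag(vals []int64) []uint64 {
	out := make([]uint64, len(vals))
	prev := int64(0)
	for i, v := range vals {
		d := v - prev
		out[i] = uint64((d << 1) ^ (d >> 63)) // zig-zag
		prev = v
	}
	return out
}

// inverse reverses both steps.
func inverse(enc []uint64) []int64 {
	out := make([]int64, len(enc))
	prev := int64(0)
	for i, z := range enc {
		d := int64(z>>1) ^ -int64(z&1) // un-zig-zag
		prev += d
		out[i] = prev
	}
	return out
}

func main() {
	timestamps := []int64{1700000000, 1700000001, 1700000003, 1700000003, 1700000010}
	enc := deltaZigZag(timestamps)
	fmt.Println(enc)          // [3400000000 2 4 0 14] -- only the first delta is large
	fmt.Println(inverse(enc)) // original slice restored
}
```

After this transform, nearly every value fits in a byte or two, which is what lets a bit-packing or entropy stage deliver the ratios quoted above.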

Phase 7: Dictionary Support ✅ COMPLETE! (v0.4.0)

Goal: Add specialized dictionary support to LZ77 with external dictionary API for batch compression.

Status: ✅ COMPLETE - External Dictionary API Working!

What's Implemented:

  • Dictionary-Based LZ77 Compression

    • 47.76× compression on 100MB repetitive data (best case) 🔥
    • Dictionary LZ77 → Huffman pipeline (Frame v22 multi-stage)
    • Type 0/1/2 tokens (Literal, Window Match, Dictionary Match)
    • Efficient linear search with 3-byte prefix optimization
    • NewLZ77WithDict() constructor
    • Full roundtrip encode/decode support
    • Params-based dictionary passing for graph execution
  • Public Dictionary Training API (dicttrainer package)

    • Train custom dictionaries on your data
    • Smart sampling (1M samples for fast training)
    • Compression value scoring: score = frequency × (length - 5)
    • Greedy non-overlapping pattern selection
    • Custom pattern injection
    • Configurable pattern lengths (3-32 bytes default)
    • Statistics API for corpus analysis
    • ~50 MB/s training speed
  • External Dictionary API (purgo package) - NEW!

    • CompressWithDict() - Compresses WITHOUT embedding dictionary
    • DecompressWithDict() - Requires external dictionary file
    • 46.76× compression on batch workloads (10 files) 🔥
    • 28% better than no dictionary on batch compression
    • Dictionary stored once, reused for all files (like a "library" file)
    • Perfect for compressing many similar files
  • Test Coverage

    • 40+ new tests (100% passing)
    • 10 dictionary LZ77 tests
    • 11 dictionary trainer tests
    • 7 external dictionary tests (batch compression)
    • 4 error handling tests
    • Comprehensive documentation

Batch Compression Results:

10 × 11KB CSV files (117KB total):

  • CompressSmart (no dict): 36.45× compression (baseline)
  • CompressWithDict (external): 46.76× compression (28% better!)
  • Storage: 500-byte dictionary + 10 compressed files = 2.5KB total

Single File (20KB CSV):

  • CompressSmart (no dict): 9.72× compression
  • CompressWithDict (external): 10.04× compression (3% better!)

Usage Example (External Dictionary):

import (
    "os"

    "github.com/boris-chu/go-openzl/dicttrainer"
    "github.com/boris-chu/go-openzl/purgo"
)

// Step 1: Train dictionary on representative data
trainer := dicttrainer.New()
trainer.AddFile("sample1.csv")
trainer.AddFile("sample2.csv")
dict := trainer.Train(500) // 500-byte dictionary
os.WriteFile("csv-dict.bin", dict, 0644)

// Step 2: Compress many files with the same dictionary
dict, _ = os.ReadFile("csv-dict.bin")
for _, file := range filesToCompress {
    data, _ := os.ReadFile(file)
    compressed, _ := purgo.CompressWithDict(data, dict)
    os.WriteFile(file+".openzl", compressed, 0644)
}
// Dictionary overhead: 500 bytes total (stored ONCE!)

// Step 3: Decompress (dictionary required)
dict, _ = os.ReadFile("csv-dict.bin")
for _, file := range compressedFiles {
    compressed, _ := os.ReadFile(file)
    data, _ := purgo.DecompressWithDict(compressed, dict)
    process(data) // use the decompressed bytes
}

Key Features:

  • ✅ Dictionary NOT embedded in compressed files (smaller output!)
  • ✅ Dictionary stored separately (like a .dll or codec pack)
  • ✅ 28% better compression on batch workloads
  • ✅ All roundtrip tests passing
  • ✅ Proper error handling (wrong dict, missing dict, etc.)

When to use:

  • ✅ Compressing 10+ similar files (CSV, JSON, logs)
  • ✅ Batch compression scenarios
  • ✅ When dictionary can be shared/distributed once

When NOT to use:

  • ❌ Single-file compression (use CompressSmart instead)
  • ❌ Files too small (<1KB each)
  • ❌ Cannot distribute dictionary file

Usage Examples:

// Pure Go compression AND decompression (no CGO!)
import (
    "io"
    "log"
    "os"

    "github.com/boris-chu/go-openzl/purgo"
)

// NEW v0.3.3: CompressSmart with automatic pipeline selection
compressed, _ := purgo.CompressSmart([]byte(`{"users":[...]}`))
// → 27.64× compression on JSON (automatic LZ77→Huffman pipeline!)

// Compress text with intelligent codec selection
compressed, _ := purgo.CompressSmart([]byte("repeated text pattern..."))
// → 35.25× compression (automatic multi-stage pipeline!)

// Legacy: Simple Huffman compression
compressed, _ := purgo.Compress([]byte("your CSV data here"))
// → 2.59× compression (Huffman-only)

// Compress numeric data (timestamps, IDs, sorted values)
compressed, _ := purgo.CompressInt64([]int64{1, 2, 3, 100, 101, 102})
// → 2.74× compression (Delta encoding)

// Decompress - simple one-liner!
data, _ := purgo.Decompress(compressed)
numbers, _ := purgo.DecompressInt64(compressed)
floats, _ := purgo.DecompressFloat64(compressed)

// Supports all numeric types:
// - int8, int16, int32, int64
// - uint8, uint16, uint32, uint64
// - float32, float64

// Streaming decompression (io.Reader interface)
file, _ := os.Open("data.zl")
reader, _ := purgo.NewReader(file)
defer reader.Close()

io.Copy(os.Stdout, reader) // Stream decompressed data!

// Or read incrementally:
buffer := make([]byte, 4096)
for {
    n, err := reader.Read(buffer)
    if n > 0 {
        process(buffer[:n]) // consume the chunk before inspecting err
    }
    if err == io.EOF {
        break
    }
    if err != nil {
        log.Fatal(err)
    }
}

Benefits of Pure Go Implementation (v0.3.3):

  • 🚀 Faster builds: No C compilation (10x faster go build)
  • 🌍 Easy cross-compilation: GOOS=windows go build just works
  • 📦 Smaller binaries: No CGO overhead
  • 🐛 Better debugging: Pure Go stack traces
  • Excellent performance: 2.8 GB/s compression, 2.3 GB/s decompression
  • 🔥 Amazing compression: 27-35× on JSON/text (Frame v22 pipelines!)
  • 💪 Production-ready: 280+ tests (100% passing), fuzz tested

JSON/Text Compression (v0.3.3 - ready for production!):

// Compress JSON with 27× compression ratio!
jsonData := []byte(`{"users":[...]}`)
compressed, _ := purgo.CompressSmart(jsonData)  // 27× compression!

// Compress CSV/text with 35× compression ratio!
textData := []byte("repeated text pattern...")
compressed, _ := purgo.CompressSmart(textData)  // 35× compression!

// Decompress later
original, _ := purgo.Decompress(compressed)

Test Coverage (Pure Go v0.3.3):

  • ✅ 280+ total Pure Go tests (100% passing)
    • Compression tests (encoder + multi-stage pipelines)
    • Decompression tests (decoder + Frame v22)
    • Frame writer tests (v21/v22 compatibility)
  • ✅ Frame parser: 79 tests
  • ✅ Codec system: 181 tests (10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose)
  • ✅ Graph executor: 42 tests
  • ✅ Integration tests: 10 end-to-end pipeline tests
  • ✅ Public API: 3 tests (compression + decompression)
  • ✅ Fuzz testing: 8.2M+ executions (zero crashes)
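
Of the ten codecs listed above, Transpose is the least self-explanatory: for fixed-width values it groups the nth byte of every element together, so the mostly-constant high bytes form long runs that later stages compress trivially. An illustrative sketch for 32-bit values (not the internal/codec implementation):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// transpose4 rewrites little-endian uint32s as four byte planes:
// all byte-0s first, then all byte-1s, and so on. For similar values,
// the high-byte planes become constant runs.
func transpose4(vals []uint32) []byte {
	n := len(vals)
	out := make([]byte, 4*n)
	for i, v := range vals {
		var b [4]byte
		binary.LittleEndian.PutUint32(b[:], v)
		for plane := 0; plane < 4; plane++ {
			out[plane*n+i] = b[plane]
		}
	}
	return out
}

// untranspose4 inverts the layout.
func untranspose4(data []byte) []uint32 {
	n := len(data) / 4
	vals := make([]uint32, n)
	for i := 0; i < n; i++ {
		var b [4]byte
		for plane := 0; plane < 4; plane++ {
			b[plane] = data[plane*n+i]
		}
		vals[i] = binary.LittleEndian.Uint32(b[:])
	}
	return vals
}

func main() {
	ids := []uint32{1000001, 1000002, 1000003, 1000004}
	planes := transpose4(ids)
	// Planes 2 and 3 are constant runs (0x0F and 0x00), ideal for RLE/Huffman.
	fmt.Println(planes[8:16]) // [15 15 15 15 0 0 0 0]
	fmt.Println(untranspose4(planes)[0] == ids[0])
}
```

Paired with RLE or Huffman downstream, this is the kind of restructuring that makes typed numeric compression so much more effective than byte-oriented codecs.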

Phase 8: Advanced Features (Planned - v1.1+)

See Advanced Features Roadmap below for Python/C++ feature parity plans.

Installation

go get github.com/boris-chu/go-openzl@v0.4.0

Or add to your go.mod:

require github.com/boris-chu/go-openzl v0.4.0

Requirements

  • Go 1.21 or later
  • For the CGO-backed openzl package: CGO enabled, a C11 compiler, and a C++17 compiler (for the OpenZL library)
  • None of the above for the pure Go purgo package

The OpenZL C library will be automatically built during installation.

Quick Start

Simple One-Shot API

package main

import (
    "fmt"
    "log"

    "github.com/boris-chu/go-openzl"
)

func main() {
    // Compress data (one-shot)
    input := []byte("Hello, OpenZL!")
    compressed, err := openzl.Compress(input)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Original size: %d bytes\n", len(input))
    fmt.Printf("Compressed size: %d bytes\n", len(compressed))

    // Decompress data (one-shot)
    decompressed, err := openzl.Decompress(compressed)
    if err != nil {
        log.Fatal(err)
    }

    fmt.Printf("Decompressed: %s\n", decompressed)
}

Context API (Better Performance)

For repeated operations, use the Context API for 20-50% better performance:

package main

import (
    "fmt"
    "log"

    "github.com/boris-chu/go-openzl"
)

func main() {
    // Create reusable compressor
    compressor, err := openzl.NewCompressor()
    if err != nil {
        log.Fatal(err)
    }
    defer compressor.Close()

    // Create reusable decompressor
    decompressor, err := openzl.NewDecompressor()
    if err != nil {
        log.Fatal(err)
    }
    defer decompressor.Close()

    // Compress multiple messages (context reuse = faster!)
    messages := []string{"First message", "Second message", "Third message"}

    for _, msg := range messages {
        // Compress using reusable context
        compressed, err := compressor.Compress([]byte(msg))
        if err != nil {
            log.Fatal(err)
        }

        // Decompress using reusable context
        decompressed, err := decompressor.Decompress(compressed)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Printf("Original: %s, Compressed: %d bytes\n", msg, len(compressed))
    }
}

Zero-Allocation API (Klaus Post Pattern)

For maximum performance with zero allocations, use the CompressTo API:

// Pre-allocate buffer once
dst := make([]byte, openzl.CompressBound(maxMessageSize))

// Process many messages with ZERO allocations!
for _, msg := range messages {
    n, err := compressor.CompressTo(dst, msg)
    if err != nil {
        log.Fatal(err)
    }
    // Use dst[:n] - no allocation!
    sendOverNetwork(dst[:n])
}

Performance: 0 B/op, 0 allocs/op (175k ops/sec, 159 MB/s)
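
The pattern works because both pieces are allocation-free: the worst-case bound is computed once and the destination buffer is reused for every message. The sketch below shows the shape of the pattern with stand-in bound and compressTo functions (the formula and names are hypothetical, not OpenZL's API), and uses testing.AllocsPerRun to confirm the hot loop allocates nothing.

```go
package main

import (
	"fmt"
	"testing"
)

// bound is a stand-in for a CompressBound-style function: worst-case output
// size for n input bytes. The exact formula is format-specific; the shape
// (input + small fraction + constant header) is typical of LZ formats.
func bound(n int) int { return n + n/255 + 16 }

// compressTo is a stand-in for a CompressTo-style method: it writes into the
// caller's buffer and returns the byte count, allocating nothing itself.
func compressTo(dst, src []byte) (int, error) {
	if len(dst) < bound(len(src)) {
		return 0, fmt.Errorf("dst too small: %d < %d", len(dst), bound(len(src)))
	}
	n := copy(dst, src) // a real codec would encode here
	return n, nil
}

func main() {
	msgs := [][]byte{[]byte("first"), []byte("second"), []byte("third")}
	dst := make([]byte, bound(64)) // allocated once, outside the loop

	allocs := testing.AllocsPerRun(100, func() {
		for _, m := range msgs {
			n, err := compressTo(dst, m)
			if err != nil {
				panic(err)
			}
			_ = dst[:n] // hand dst[:n] to the network/disk here
		}
	})
	fmt.Println("allocs per run:", allocs)
}
```

Reusing dst is safe only as long as each dst[:n] is consumed (or copied) before the next call overwrites it, which is the one discipline the zero-allocation pattern demands.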

Typed Compression (Phase 3)

OpenZL excels at compressing typed data - achieving 2-50x better compression ratios:

// Compress an array of integers (achieves much better compression!)
numbers := []int64{1, 2, 3, 4, 5, 100, 101, 102}
compressed, err := openzl.CompressNumeric(numbers)
if err != nil {
    log.Fatal(err)
}

// Decompress back to typed slice
decompressed, err := openzl.DecompressNumeric[int64](compressed)
if err != nil {
    log.Fatal(err)
}

// Use with context API for best performance
compressor, _ := openzl.NewCompressor()
defer compressor.Close()

compressed, err = openzl.CompressorCompressNumeric(compressor, numbers)

// Supports all numeric types
int32Data := []int32{1, 2, 3, 4, 5}
uint64Data := []uint64{100, 200, 300}
float64Data := []float64{1.1, 2.2, 3.3}

compressed1, _ := openzl.CompressNumeric(int32Data)
compressed2, _ := openzl.CompressNumeric(uint64Data)
compressed3, _ := openzl.CompressNumeric(float64Data)

Streaming API (Phase 4)

Stream large files without loading them entirely into memory:

// Compress a file
input, _ := os.Open("large-file.txt")
output, _ := os.Create("large-file.txt.zl")

writer, _ := openzl.NewWriter(output)
io.Copy(writer, input)  // Stream and compress
writer.Close()

// Decompress a file
compressedFile, _ := os.Open("large-file.txt.zl")
decompressed, _ := os.Create("large-file.txt.decompressed")

reader, _ := openzl.NewReader(compressedFile)
io.Copy(decompressed, reader)  // Stream and decompress
reader.Close()

// Custom frame size for different use cases
writer, _ := openzl.NewWriter(output, openzl.WithFrameSize(256*1024)) // 256KB frames

Performance: 2287 MB/s streaming compression throughput!

Performance

Benchmarked on Apple M4 Pro:

Phase 2 Context API (Reusable Contexts)

  • Compression: 327k ops/sec (3.6 μs/op)
  • Decompression: 2.2M ops/sec (545 ns/op)
  • Memory: 576 B/op compress, 16 B/op decompress

Phase 1 One-Shot API

  • Compression: 264k ops/sec (4.6 μs/op)
  • Decompression: 1.0M ops/sec (1.1 μs/op)
  • Memory: 584 B/op compress, 24 B/op decompress

Performance Improvement (Phase 2 vs Phase 1)

  • Compression: 21% faster with context reuse
  • Decompression: 49% faster with context reuse
  • Memory: Reduced allocations per operation

Compression Ratios (Observed)

  • Small text (11 bytes): 0.26× (output larger than input; frame header dominates)
  • Repeated data (400 bytes): 9.52× compression ratio
  • Large repeated data (45KB): 500× compression ratio
  • Unicode text: 0.37× (again dominated by small-data overhead)

Note: Compression ratios improve significantly with larger and more structured data.

Run benchmarks yourself:

go test -bench=. -benchmem

Architecture

Current (CGO-based)

┌─────────────────────────────────────────────────┐
│                Go API Layer                     │
│  - Idiomatic Go interfaces                      │
│  - io.Reader/Writer support                     │
│  - Type-safe generics                           │
│  - Concurrent processing                        │
└─────────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────┐
│                CGO Bindings                     │
│  - Thin wrapper over C API                      │
│  - Memory management                            │
│  - Error translation                            │
└─────────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────┐
│             OpenZL C Library                    │
│  - C11 core implementation                      │
│  - Format-aware compression                     │
│  - Universal decompressor                       │
└─────────────────────────────────────────────────┘

Pure Go Decoder (shipped in v0.3.3) 🚀

┌─────────────────────────────────────────────────┐
│                Go API Layer                     │
│  - Unified API for both compression paths      │
│  - Automatic fallback/selection                │
└─────────────────────────────────────────────────┘
           ↓                          ↓
  [Compression]              [Decompression]
           ↓                          ↓
┌──────────────────┐      ┌──────────────────────┐
│  CGO → C Library │      │   Pure Go Decoder    │
│  (Fast encoding) │      │   - Frame Parser ✅   │
│                  │      │   - Graph Executor ✅ │
│                  │      │   - Codecs (10) ✅    │
│                  │      │   - No CGO needed!   │
└──────────────────┘      └──────────────────────┘

Benefits of Pure Go Decoder:
✅ Faster builds (no CGO)
✅ Easy cross-compilation
✅ Smaller binaries
✅ Better debugging
✅ Maintained compression performance via C library

Documentation

Upstream Documentation

Project Structure

go-openzl/
├── README.md           # This file
├── LICENSE             # BSD 3-Clause License
├── go.mod              # Go module definition
├── *.go                # Public API (CGO-based)
│   ├── compress.go     # One-shot compression
│   ├── compressor.go   # Reusable compressor
│   ├── decompressor.go # Reusable decompressor
│   ├── typed.go        # Typed compression
│   ├── reader.go       # Streaming reader
│   └── writer.go       # Streaming writer
├── internal/           # Pure Go codecs and frame support (complete)
│   ├── frame/          # Frame parser (Phase 1 ✅)
│   ├── codec/          # 10 codecs: Identity, Constant, Delta, ZigZag, Bitpack, FSE, Huffman, LZ77, RLE, Transpose ✅
│   └── graph/          # Graph executor (Phase 2 ✅)
├── examples/           # Usage examples
│   ├── simple/         # Basic compression example
│   ├── context/        # Context API example
│   ├── typed/          # Typed compression example
│   └── streaming/      # Streaming API example
├── documentation/      # Additional documentation
└── vendor/             # Vendored OpenZL C library

Contributing

We welcome contributions! This project is in its early stages and there's plenty to do.

Areas Where We Need Help

  • Core Implementation: CGO bindings for OpenZL C API
  • Testing: Comprehensive test coverage and fuzzing
  • Documentation: Examples, guides, and API docs
  • Performance: Benchmarking and optimization
  • CI/CD: GitHub Actions workflows for multiple platforms
  • Packaging: Cross-platform build and distribution

Getting Started

  1. Fork the repository
  2. Read the OpenZL documentation to understand the library
  3. Check the issues for tasks
  4. Join the discussion in issues or discussions
  5. Submit a PR with your contribution

Development Setup

# Clone the repository
git clone https://github.com/yourusername/go-openzl.git
cd go-openzl

# Initialize submodules (for OpenZL C library)
git submodule update --init --recursive

# Build the OpenZL C library
make build-openzl

# Run tests
go test ./...

# Run benchmarks
go test -bench=. ./benchmarks/

Code of Conduct

This project follows the Go Community Code of Conduct. Please be respectful and constructive in all interactions.

Why Go Bindings?

Go is widely used for:

  • Cloud-native applications and microservices
  • Data processing pipelines
  • Network services and proxies
  • CLI tools and utilities

OpenZL's format-aware compression is perfect for these use cases, but there are currently no Go bindings. This project aims to bring OpenZL's power to the Go ecosystem with idiomatic, high-performance bindings.

Comparison with Other Go Compression Libraries

Library     Compression Ratio   Speed       Format-Aware   Type-Aware
gzip        Baseline            Slow        No             No
zstd        Good                Fast        No             No
snappy      Low                 Very Fast   No             No
go-openzl   Excellent           Fast        Yes            Yes

OpenZL excels when you have:

  • Structured or typed data
  • Repeated data patterns
  • High compression requirements with speed constraints
  • Need for format introspection

Roadmap

✅ v0.1.0 (October 2025) - Initial Release

  • ✅ Core compression/decompression
  • ✅ Context API (20-50% faster)
  • ✅ Typed numeric compression (2-50x better ratios)
  • ✅ Streaming API (io.Reader/Writer)
  • ✅ 45 tests, 100% passing
  • ✅ Full CI/CD pipeline
  • ✅ Complete documentation

🎯 v1.0.0 (Q1 2026) - Stable Release

  • Community feedback from v0.1.0
  • Windows platform support
  • Additional parameter controls
  • Performance optimizations
  • API stability guarantee
  • Production case studies

🚀 v1.1.0 (Q2 2026) - Enhanced Parameters

  • Compression level control (fast/default/best)
  • Window size configuration
  • Custom buffer management
  • Advanced error reporting
  • Memory usage controls
  • Performance profiling tools

🔬 v2.0.0 (Q3 2026) - Advanced Features

Python/C++ feature parity - see Advanced Features Roadmap below.

Advanced Features Roadmap

The following advanced features from OpenZL's C++ and Python implementations are planned for future releases:

Custom Compression Graphs (v2.0)

What it is: Build custom compression pipelines by combining encoding nodes.

C++ Example:

CustomGraph graph;
graph.addNode("delta");      // Delta encoding
graph.addNode("bitpack");    // Bit packing
graph.addNode("entropy");    // Entropy coding
graph.connect(0, 1);
graph.connect(1, 2);

Planned Go API:

graph := openzl.NewGraph()
graph.AddNode(openzl.NodeDelta)
graph.AddNode(openzl.NodeBitpack)
graph.AddNode(openzl.NodeEntropy)
graph.Connect(0, 1, 2)

compressor, _ := openzl.NewCompressor(
    openzl.WithCustomGraph(graph),
)

Status: 📋 Planned for v2.0
Complexity: High - requires deep OpenZL internals integration
Use Case: <5% of users need this level of customization


Custom Selectors (v2.0)

What it is: Dynamically choose compression strategy per data block.

Python Example:

selector = AdaptiveSelector(
    strategies=["fast", "balanced", "best"],
    threshold=0.8  # Switch strategy based on compression ratio
)
compressor = openzl.Compressor(selector=selector)

Planned Go API:

selector := openzl.NewAdaptiveSelector(
    openzl.StrategyFast,
    openzl.StrategyBalanced,
    openzl.StrategyBest,
)

compressor, _ := openzl.NewCompressor(
    openzl.WithSelector(selector),
)

Status: 📋 Planned for v2.0
Complexity: High - requires profiling and decision logic
Use Case: Performance-critical applications with mixed data


Multi-Input Compression (v2.0+)

What it is: Compress multiple input streams together for better correlation.

Python Example:

streams = [timestamps, values, metadata]
compressed = openzl.compress_multi(streams)

Planned Go API:

streams := [][]byte{
    timestamps,
    values,
    metadata,
}

compressed, _ := openzl.CompressMulti(streams)

Status: 📋 Planned for v2.0 or later
Complexity: Medium - requires stream coordination
Use Case: Time-series data, columnar storage


Training & Dictionary Support (v2.0+)

What it is: Train compressor on representative data samples for better compression.

C++ Example:

Trainer trainer;
trainer.addSample(sample1);
trainer.addSample(sample2);
Dictionary dict = trainer.train();

Compressor compressor(dict);

Planned Go API:

trainer := openzl.NewTrainer()
trainer.AddSample(sample1)
trainer.AddSample(sample2)

dict, _ := trainer.Train()

compressor, _ := openzl.NewCompressor(
    openzl.WithDictionary(dict),
)

Status: 📋 Research phase
Complexity: Very High - requires training algorithm implementation
Use Case: Domain-specific data with known patterns


Transform Composition (v2.0)

What it is: Chain multiple transforms for specialized compression.

Python Example:

from openzl import transforms

pipeline = transforms.Pipeline([
    transforms.Delta(),
    transforms.Quantize(bits=8),
    transforms.Entropy(),
])

compressed = pipeline.compress(data)

Planned Go API:

pipeline := openzl.NewPipeline(
    openzl.TransformDelta(),
    openzl.TransformQuantize(8),
    openzl.TransformEntropy(),
)

compressed, _ := pipeline.Compress(data)

Status: 📋 Planned for v2.0
Complexity: Medium - requires transform chaining infrastructure
Use Case: Specialized numeric/scientific data


Feature Priority

Based on user feedback and demand, we'll prioritize:

High Priority (v1.1):

  1. Basic parameter controls (compression level, buffer size)
  2. Additional platform support (Windows)
  3. Performance monitoring and profiling

Medium Priority (v2.0):

  1. Custom compression graphs
  2. Adaptive selectors
  3. Transform composition
  4. Multi-input compression

Lower Priority (v2.0+):

  1. Training and dictionary support
  2. Advanced introspection APIs
  3. Custom codec development

Why Not in v1.0?

We deliberately excluded advanced features from v1.0 because:

  1. Complexity: Each feature adds significant API surface area
  2. Usage: Less than 5% of users need these features
  3. Stability: v1.0 focuses on rock-solid core functionality
  4. Testing: Advanced features require extensive testing
  5. Documentation: Each feature needs comprehensive docs and examples

Our v1.0 release covers 95% of use cases with:

  • ✅ General-purpose compression
  • ✅ High-performance context reuse
  • ✅ Typed numeric compression
  • ✅ Streaming for large files
  • ✅ Thread-safe concurrent operations

Advanced features can be added in v2.0 without breaking v1.0 APIs.


Contributing to Advanced Features

Interested in helping implement advanced features? We welcome contributors!

Good first advanced features:

  1. Basic parameter controls (v1.1)
  2. Performance monitoring (v1.1)
  3. Transform composition (v2.0)

Complex features needing experts:

  1. Custom compression graphs
  2. Training and dictionaries
  3. Custom selectors

See CONTRIBUTING.md for guidelines.


Feedback Welcome!

Which advanced features would be most valuable to you?

  • Open an issue to discuss
  • Join discussions
  • Vote on feature requests with 👍 reactions

Your input helps us prioritize development!

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.

OpenZL itself is also BSD licensed - see the OpenZL LICENSE.

Acknowledgments

  • Meta Open Source for creating and open-sourcing OpenZL
  • The Go Community for excellent CGO documentation and examples
  • Contributors who help make this project possible

Contact & Support

Related Projects

  • OpenZL - The upstream C/C++ library
  • zstd-go - High-performance zstd in Go
  • compress - Optimized Go compression packages

Star this project if you find it interesting! It helps us gauge interest and attract contributors.
