251 changes: 246 additions & 5 deletions README.md
@@ -4,11 +4,11 @@
[![PyPI downloads](https://img.shields.io/pypi/dm/zon-format?color=red)](https://pypi.org/project/zon-format/)
[![PyPI version](https://img.shields.io/pypi/v/zon-format.svg)](https://pypi.org/project/zon-format/)
[![Python](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Tests](https://img.shields.io/badge/tests-220%2F220%20passing-brightgreen.svg)](#quality--testing)
[![Tests](https://img.shields.io/badge/tests-340%2F340%20passing-brightgreen.svg)](#quality--testing)
![CodeRabbit Pull Request Reviews](https://img.shields.io/coderabbit/prs/github/ZON-Format/ZON?utm_source=oss&utm_medium=github&utm_campaign=ZON-Format%2FZON&labelColor=171717&color=FF570A&link=https%3A%2F%2Fcoderabbit.ai&label=CodeRabbit+Reviews)
[![License](https://img.shields.io/badge/license-MIT-green.svg)](LICENSE)

# ZON → JSON is dead. TOON was cute. ZON just won. (Now in Python v1.1.0)
# ZON → JSON is dead. TOON was cute. ZON just won. (Python v1.2.0 - Now with Binary Format, Versioning & Enterprise Tools)

**Zero Overhead Notation** - A compact, human-readable way to encode JSON for LLMs.

@@ -426,12 +426,162 @@ ZON is **immune to code injection attacks** that plague other formats:

---

## New in v1.2.0: Enterprise Features

### Binary Format (ZON-B)

Compact binary encoding with 40-60% space savings vs JSON:

```python
from zon import encode_binary, decode_binary

# Encode to binary
data = {"users": [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]}
binary = encode_binary(data) # 40-60% smaller than JSON

# Decode from binary
decoded = decode_binary(binary)
```

**Features:**
- MessagePack-inspired format with magic header (`ZNB\x01`)
- Full type support for all ZON primitives
- Perfect round-trip fidelity
- Ideal for storage, APIs, and network transmission
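
As a rough illustration of how the magic header can be used, the sketch below checks a payload for the `ZNB\x01` prefix before any decoding is attempted. The helper name `looks_like_zon_b` is hypothetical and not part of the `zon` package; only the header bytes are taken from the feature list above.

```python
# Hypothetical helper for illustration; not part of the zon package.
ZON_B_MAGIC = b"ZNB\x01"  # magic header bytes as listed in the feature list


def looks_like_zon_b(payload: bytes) -> bool:
    """Return True if the payload starts with the ZON-B magic header."""
    return payload.startswith(ZON_B_MAGIC)
```

A caller could use a check like this to decide whether to hand a payload to `decode_binary()` or to the text decoder.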

### Versioning & Migration System

Document-level schema versioning with automatic migrations:

```python
from zon import embed_version, extract_version, ZonMigrationManager

# Embed version metadata
versioned = embed_version(data, "2.0.0", "user-schema")

# Extract version info
meta = extract_version(versioned)

# Setup migration manager
manager = ZonMigrationManager()
manager.register_migration("1.0.0", "2.0.0", upgrade_function)

# Automatically migrate
migrated = manager.migrate(old_data, "1.0.0", "2.0.0")
```

**Features:**
- Semantic versioning support
- BFS-based migration path finding
- Backward/forward compatibility checking
- Chained migrations for complex upgrades
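
The BFS-based path finding and chained migrations listed above can be sketched in plain Python. This is a minimal illustration under assumptions, not the internals of `ZonMigrationManager`: the names `find_path` and `apply_migrations` and the dictionary keyed by `(from_version, to_version)` are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, str]               # (from_version, to_version)
Migration = Callable[[dict], dict]   # upgrade function for one step


def find_path(migrations: Dict[Step, Migration], start: str, target: str) -> Optional[List[Step]]:
    """Breadth-first search for the shortest chain of registered steps."""
    if start == target:
        return []
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for src, dst in migrations:
            if src != current or dst in parent:
                continue
            parent[dst] = current
            if dst == target:
                # Walk back through the parents to rebuild the chain.
                path: List[Step] = []
                node = dst
                while parent[node] is not None:
                    path.append((parent[node], node))
                    node = parent[node]
                return path[::-1]
            queue.append(dst)
    return None  # no chain of migrations connects start to target


def apply_migrations(data: dict, migrations: Dict[Step, Migration], start: str, target: str) -> dict:
    """Run every migration on the path from start to target, in order."""
    path = find_path(migrations, start, target)
    if path is None:
        raise ValueError(f"no migration path from {start} to {target}")
    for step in path:
        data = migrations[step](data)
    return data
```

Because BFS explores versions in order of distance from the starting version, the first path that reaches the target uses the fewest migration steps.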

### Adaptive Encoding

Three encoding modes optimized for different use cases:

```python
from zon import encode_adaptive, recommend_mode, AdaptiveEncodeOptions

# Auto-recommend best mode
recommendation = recommend_mode(data)
# {'mode': 'compact', 'confidence': 0.95, 'reason': 'Large uniform array...'}

# Compact mode - maximum compression
compact = encode_adaptive(data, AdaptiveEncodeOptions(mode='compact'))

# Readable mode - pretty-printed with indentation
readable = encode_adaptive(data, AdaptiveEncodeOptions(mode='readable', indent=2))

# LLM-optimized - balanced for AI workflows
llm = encode_adaptive(data, AdaptiveEncodeOptions(mode='llm-optimized'))
```

**Encoding Modes:**

| Mode | Best For | Features |
|------|----------|----------|
| **compact** | Production APIs | Maximum compression, T/F booleans |
| **readable** | Config files | Multi-line indentation, human-friendly |
| **llm-optimized** | AI workflows | true/false booleans, no type coercion |

**Readable Mode Example:**
```zon
metadata:{
generated:2025-01-01T12:00:00Z
version:1.2.0
}

users:@(2):id,name,role
1,Alice,admin
2,Bob,user
```

### Developer Tools

Comprehensive utilities for working with ZON data:

```python
from zon import size, compare_formats, analyze, ZonValidator

# Analyze data size across formats
comparison = compare_formats(data)
# {'json': {'size': 1200, 'percentage': 100.0},
# 'zon': {'size': 800, 'percentage': 66.7},
# 'binary': {'size': 480, 'percentage': 40.0}}

# Data complexity analysis
analysis = analyze(data)
# {'depth': 3, 'complexity': 'moderate', 'recommended_format': 'zon'}

# Enhanced validation
validator = ZonValidator()
result = validator.validate(zon_string)
if not result.is_valid:
    for error in result.errors:
        print(f"Error at line {error.line}: {error.message}")
```

**Tools Available:**
- `size()` - Calculate data size in different formats
- `compare_formats()` - Compare JSON/ZON/Binary sizes
- `analyze()` - Comprehensive data structure analysis
- `infer_schema()` - Automatic schema inference
- `ZonValidator` - Enhanced validation with linting rules
- `expand_print()` - Pretty-printer for readable formatting
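
The percentages in the `compare_formats()` example output are each format's size relative to the JSON size. A minimal sketch of that arithmetic follows; the function name `size_percentages` is hypothetical, and it takes precomputed sizes as input instead of reproducing how the library measures them.

```python
from typing import Dict


def size_percentages(sizes: Dict[str, int], baseline: str = "json") -> Dict[str, Dict[str, float]]:
    """Express each format's size as a percentage of the baseline format."""
    base = sizes[baseline]
    if base <= 0:
        raise ValueError("baseline size must be positive")
    return {
        name: {"size": size, "percentage": round(size / base * 100, 1)}
        for name, size in sizes.items()
    }


# Sizes taken from the example output above.
report = size_percentages({"json": 1200, "zon": 800, "binary": 480})
```

With those sizes, ZON text comes out at 66.7% and binary at 40.0% of the JSON size, matching the sample output.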

### Complete API

```python
from zon import (
# Core encoding
encode, decode, encode_llm,

# Adaptive encoding (v1.2.0)
encode_adaptive, recommend_mode, AdaptiveEncodeOptions,

# Binary format (v1.2.0)
encode_binary, decode_binary,

# Versioning (v1.2.0)
embed_version, extract_version, compare_versions,
is_compatible, strip_version, ZonMigrationManager,

# Developer tools (v1.2.0)
size, compare_formats, analyze, infer_schema,
compare, is_safe, ZonValidator, expand_print
)
```

---

## Quality & Security

### Data Integrity
- **Unit tests:** 94/94 passed (+66 new validation/security/conformance tests)
- **Roundtrip tests:** 27/27 datasets verified
- **Unit tests:** 340/340 passed (v1.2.0 adds 103 new tests for binary, versioning, tools)
- **Roundtrip tests:** 27/27 datasets verified + 51 cross-language examples
- **No data loss or corruption**
- **Cross-language compatibility:** 51% exact match with TypeScript v1.3.0

### Security Limits (DOS Prevention)

@@ -572,6 +722,56 @@ logs:"[{id:101,level:INFO},{id:102,level:WARN}]"

---

## Encoding Modes (New in v1.2.0)

ZON now provides **three encoding modes** optimized for different use cases:

### Mode Overview

| Mode | Best For | Token Efficiency | Human Readable | LLM Clarity | Default |
|------|----------|------------------|----------------|-------------|---------|
| **compact** | Production APIs, LLMs | ⭐⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐ | ✅ YES |
| **llm-optimized** | AI workflows | ⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | |
| **readable** | Config files, debugging | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | |

### Adaptive Encoding

```python
from zon import encode_adaptive, AdaptiveEncodeOptions, recommend_mode

# Use compact mode (default - maximum compression)
output = encode_adaptive(data)

# Use readable mode (human-friendly)
output = encode_adaptive(data, AdaptiveEncodeOptions(mode='readable'))

# Use LLM-optimized mode (balanced for AI)
output = encode_adaptive(data, AdaptiveEncodeOptions(mode='llm-optimized'))

# Get recommendation for your data
recommendation = recommend_mode(data)
print(f"Use {recommendation['mode']} mode: {recommendation['reason']}")
```

### Mode Details

**Compact Mode (Default)**
- Maximum compression using tables and abbreviations (`T`/`F` for booleans)
- Dictionary compression for repeated values
- Best for production APIs and cost-sensitive LLM workflows

**LLM-Optimized Mode**
- Balances token efficiency with AI comprehension
- Uses `true`/`false` instead of `T`/`F` for better LLM understanding
- Disables dictionary compression for clarity

**Readable Mode**
- Human-friendly formatting with proper indentation
- Perfect for configuration files and debugging
- Easy editing and version control
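
The boolean difference between compact and LLM-optimized modes can be illustrated with a small sketch. The function `render_bool` is hypothetical and not part of the `zon` package; readable mode is left out because the text above does not say which boolean spelling it uses.

```python
# Hypothetical illustration of the boolean spellings described above.
def render_bool(value: bool, mode: str = "compact") -> str:
    """Spell a boolean the way the given encoding mode is described to."""
    if mode == "compact":
        return "T" if value else "F"          # abbreviated form
    if mode == "llm-optimized":
        return "true" if value else "false"   # long form for LLM clarity
    raise ValueError(f"boolean spelling not documented for mode: {mode}")
```

The abbreviated form saves characters per value, while the long form trades a few characters for spellings a model is more likely to recognize.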

---

## API Reference

### `zon.encode(data: Any) -> str`
@@ -591,6 +791,47 @@ zon_str = zon.encode({

**Returns:** ZON-formatted string

### `zon.encode_adaptive(data: Any, options: AdaptiveEncodeOptions = None) -> str`

Encodes Python data using adaptive mode selection (New in v1.2.0).

```python
from zon import encode_adaptive, AdaptiveEncodeOptions

# Compact mode (default)
output = encode_adaptive(data)

# Readable mode with custom indentation
output = encode_adaptive(
data,
AdaptiveEncodeOptions(mode='readable', indent=4)
)

# With debug information
result = encode_adaptive(
data,
AdaptiveEncodeOptions(mode='compact', debug=True)
)
print(result.decisions) # See encoding decisions
```

**Returns:** ZON-formatted string, or an `AdaptiveEncodeResult` when `debug=True`

### `zon.recommend_mode(data: Any) -> dict`

Analyzes data and recommends optimal encoding mode (New in v1.2.0).

```python
from zon import recommend_mode

recommendation = recommend_mode(my_data)
print(f"Use {recommendation['mode']} mode")
print(f"Confidence: {recommendation['confidence']}")
print(f"Reason: {recommendation['reason']}")
```

**Returns:** Dictionary with mode, confidence, reason, and metrics

### `zon.decode(zon_string: str, strict: bool = True) -> Any`

Decodes ZON format back to Python data.
@@ -824,4 +1065,4 @@ MIT License - see [LICENSE](LICENSE) for details.

**Made with ❤️ for the LLM community**

*ZON v1.0.4 - Token efficiency that scales with complexity*
*ZON v1.2.0 - Token efficiency that scales with complexity, now with adaptive encoding*
48 changes: 48 additions & 0 deletions zon-format/CHANGELOG.md
@@ -1,5 +1,53 @@
# Changelog

## [1.2.0] - 2024-12-07

### Major Release: Enterprise Features & Production Readiness

This release brings major enhancements aligned with the TypeScript v1.3.0 implementation, focusing on adaptive encoding, binary format, versioning, developer tools, and production-ready features.

### Added

#### Binary Format (ZON-B)
- **MessagePack-Inspired Encoding**: Compact binary format with magic header (`ZNB\x01`)
- **40-60% Space Savings**: Significantly smaller than JSON while maintaining structure
- **Full Type Support**: Primitives, arrays, objects, nested structures
- **APIs**: `encode_binary()`, `decode_binary()` with round-trip validation
- **Test Coverage**: 27 tests for binary format

#### Document-Level Schema Versioning
- **Version Embedding/Extraction**: `embed_version()` and `extract_version()` for metadata management
- **Migration Manager**: `ZonMigrationManager` with BFS path-finding for schema evolution
- **Backward/Forward Compatibility**: Automatic migration between schema versions
- **Utilities**: `compare_versions()`, `is_compatible()`, `strip_version()`
- **Test Coverage**: 39 tests covering all versioning scenarios

#### Adaptive Encoding System
- **3 Encoding Modes**: `compact`, `readable`, `llm-optimized` for optimal output
- **Data Complexity Analyzer**: Automatic analysis of nesting depth, irregularity, field count
- **Mode Recommendation**: `recommend_mode()` suggests optimal encoding based on data structure
- **Intelligent Format Selection**: `encode_adaptive()` with customizable options
- **Readable Mode Enhancement**: Pretty-printing with indentation and multi-line nested objects
- **LLM Mode Enhancement**: Long booleans (`true`/`false`) and integer type preservation
- **Test Coverage**: 17 tests for adaptive encoding functionality

#### Developer Tools
- **Helper Utilities**: `size()`, `compare_formats()`, `analyze()`, `infer_schema()`, `compare()`, `is_safe()`
- **Enhanced Validator**: `ZonValidator` with linting rules for depth, fields, performance
- **Pretty Printer**: `expand_print()` for readable mode with multi-line formatting and indentation
- **Test Coverage**: 37 tests for developer tools

### Changed
- **Version**: Updated to 1.2.0 for feature parity with TypeScript package
- **API**: Expanded exports to include binary, versioning, and tools modules
- **Documentation**: Aligned with TypeScript v1.3.0 feature set

### Performance
- **Binary Format**: 40-60% smaller than JSON
- **ZON Text**: Maintains 16-19% smaller than JSON
- **Adaptive Selection**: Automatically chooses best encoding for your data
- **Test Suite**: All 340 tests passing (up from 237)
Comment on lines +3 to +49

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Explore test directory structure and count tests

# List test files
echo "=== Test files in zon-format/tests ==="
find zon-format/tests -name "*.py" -type f | head -20

# Count total test files
echo -e "\n=== Total test files ==="
find zon-format/tests -name "*.py" -type f | wc -l

# Check main source directory structure
echo -e "\n=== Source files in zon-format ==="
find zon-format -name "*.py" -type f ! -path "*/tests/*" | head -20

# Check if binary format implementation exists
echo -e "\n=== Checking for binary format implementation ==="
find zon-format -name "*.py" -type f ! -path "*/tests/*" -exec grep -l "binary\|encode_binary\|decode_binary\|ZNB" {} \;

# Check for versioning/migration implementation
echo -e "\n=== Checking for versioning/migration implementation ==="
find zon-format -name "*.py" -type f ! -path "*/tests/*" -exec grep -l "migration\|ZonMigration\|embed_version\|extract_version" {} \;

# Check for adaptive encoding implementation
echo -e "\n=== Checking for adaptive encoding implementation ==="
find zon-format -name "*.py" -type f ! -path "*/tests/*" -exec grep -l "adaptive\|recommend_mode\|DataComplexity" {} \;

Repository: ZON-Format/ZON

Length of output: 2708


🏁 Script executed:

#!/bin/bash
# Count actual test functions across all test files

echo "=== Counting test functions by feature ==="

# Count binary format tests
echo "Binary format tests:"
rg "def test_.*binary|def test_.*encode_binary|def test_.*decode_binary|def test_.*ZNB" zon-format/tests/ | wc -l

# Count versioning tests  
echo "Versioning/migration tests:"
rg "def test_.*version|def test_.*migration|def test_.*ZonMigration" zon-format/tests/ | wc -l

# Count adaptive encoding tests
echo "Adaptive encoding tests:"
rg "def test_.*adaptive|def test_.*recommend_mode|def test_.*DataComplexity" zon-format/tests/ | wc -l

# Count developer tools tests
echo "Developer tools tests:"
rg "def test_.*validator|def test_.*analyze|def test_.*infer_schema|def test_.*compare_formats|def test_.*size" zon-format/tests/ | wc -l

# Count all test functions
echo -e "\n=== Total test functions ==="
rg "def test_" zon-format/tests/ --type=py | wc -l

# List all test files with their test counts
echo -e "\n=== Test counts per file ==="
for file in $(find zon-format/tests -name "*.py" -type f); do
  count=$(rg "def test_" "$file" | wc -l)
  if [ "$count" -gt 0 ]; then
    echo "$file: $count"
  fi
done

Repository: ZON-Format/ZON

Length of output: 1780


🏁 Script executed:

#!/bin/bash
# Verify the specific test file contents to confirm feature implementation

echo "=== Binary format test file exists and has tests ==="
head -30 zon-format/tests/unit/binary/test_binary_format.py

echo -e "\n=== Versioning test files ==="
head -20 zon-format/tests/unit/versioning/test_versioning.py

echo -e "\n=== Adaptive encoding test file ==="
head -20 zon-format/tests/unit/test_adaptive.py

# Check module exports to confirm features are exported
echo -e "\n=== Checking what's exported from main module ==="
grep -A 30 "from.*binary\|from.*versioning\|from.*adaptive" zon-format/src/zon/__init__.py | head -40

Repository: ZON-Format/ZON

Length of output: 2764


Feature scope conflict: Binary format and versioning are fully implemented despite postponement request.

The changelog accurately documents 340 total tests with correct feature-specific counts (27 binary, 39 versioning, 17 adaptive, 37 tools). However, verification confirms that Binary Format (ZON-B), versioning, and the Migration Manager are fully implemented and exported in v1.2.0, directly contradicting the reviewer's explicit request to postpone these features to v1.3.0.

Key findings:

  • Binary format: encode_binary(), decode_binary(), magic header implemented with 27 tests ✓
  • Versioning: embed_version(), extract_version(), ZonMigrationManager implemented with 39 tests ✓
  • Adaptive encoding: encode_adaptive(), recommend_mode(), DataComplexityAnalyzer implemented with 17 tests ✓
  • All features exported from main module

Verify with the reviewer whether v1.2.0 should include these features or whether the implementation should be removed or deferred pending v1.3.0 alignment.

🤖 Prompt for AI Agents
In zon-format/CHANGELOG.md around lines 3-49, the changelog claims Binary
format, versioning, and Migration Manager are included in v1.2.0 but the
reviewer requested those be postponed to v1.3.0; resolve the scope conflict by
either (A) reverting/deleting the Binary and Versioning implementations and
their exports (and related tests) from the v1.2.0 branch so the codebase matches
the changelog removal, then remove those entries from this changelog section, or
(B) if the team agrees to keep them in v1.2.0, update the reviewer and change
the release plan accordingly and keep the code but ensure the changelog and
release notes explicitly state these features are included; implement the chosen
option consistently across source files, exports, test manifests, and
CHANGELOG.md.


## [1.1.0] - 2024-12-01

### Added