
Drava - Ultra-High-Performance Synthetic Log Generator

Drava is an ultra-high-performance synthetic log generator designed to simulate distributed microservices producing logs at extreme rates. It generates structured JSON logs with realistic fields and can either stream them to stdout or write them directly to Parquet files.

Features

  • 🚀 Extreme Performance: Generate logs from 100/sec to 2M+/sec with real-time precision
  • ⚡ Maximum Speed Generation: New --count option generates an exact number of logs at 700k+/sec
  • 🔥 Batch-Optimized: Dynamic batching system for maximum throughput at high rates
  • Multiple Output Modes:
    • stdout - JSON logs for piping with optimized I/O batching
    • parquet - Direct-to-Parquet with hourly partitioning and high-rate batching
  • Realistic Data: Configurable services, log levels, and message templates
  • Deterministic: Optional seed for reproducible generation
  • Lightweight: Single Rust binary with no external runtime dependencies
  • Smart Timing: Hybrid spin-wait/sleep mechanism for microsecond precision

Installation

From Source

git clone https://github.com/pondwatch/drava
cd drava
cargo build --release
./target/release/drava --help

Usage

Basic Examples

Generate exact number of logs at maximum speed:

# Generate exactly 10,000 logs at max speed (~700k/sec)
drava --count 10000

# Generate 1 million logs to Parquet files at max speed
drava --count 1000000 --output parquet

# Generate 50k logs with deterministic seed
drava --count 50000 --seed 12345

Generate logs at specific rates:

# Generate 1000 logs/sec for 10 seconds
drava --rate 1000 --duration 10

# Generate logs indefinitely at 5000/sec
drava --rate 5000

# Generate 10k logs/sec for 60 seconds, save to Parquet
drava --rate 10000 --duration 60 --output parquet

Customize services and log levels:

# Only generate logs for specific services
drava --services "api,database,cache" --levels "INFO,ERROR"

# Generate exact count with custom services
drava --count 25000 --services "payments,auth" --levels "ERROR,WARN"
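
Conceptually, --services, --levels, and --seed boil down to sampling from fixed value lists with an optionally seeded RNG, so the same seed yields the same sequence of picks. The sketch below illustrates that idea only; it assumes the rand crate (version 0.8, which is not among the dependencies listed later) and hypothetical helper names, not the project's actual code.

use rand::{rngs::StdRng, Rng, SeedableRng};

// Hypothetical helper: pick one (service, level) pair, reproducibly when seeded.
fn pick<'a>(rng: &mut StdRng, services: &'a [&str], levels: &'a [&str]) -> (&'a str, &'a str) {
    let service = services[rng.gen_range(0..services.len())];
    let level = levels[rng.gen_range(0..levels.len())];
    (service, level)
}

fn main() {
    let services = ["payments", "auth"];
    let levels = ["ERROR", "WARN"];
    // Same seed => same sequence of picks on every run.
    let mut rng = StdRng::seed_from_u64(12345);
    for _ in 0..3 {
        let (service, level) = pick(&mut rng, &services, &levels);
        println!("{service} {level}");
    }
}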

Command Line Options

Options:
  -r, --rate <RATE>          Log generation rate per second [default: 10000]
  -s, --services <SERVICES>  Service names (comma-separated) [default: api,auth,payments,users,frontend]
  -l, --levels <LEVELS>      Log levels (comma-separated) [default: INFO,WARN,ERROR,DEBUG]
  -d, --duration <DURATION>  Duration in seconds (0 for infinite) [default: 0]
  -c, --count <COUNT>        Number of logs to generate (overrides rate and duration for max speed generation)
      --seed <SEED>          Random seed for deterministic generation
  -o, --output <OUTPUT>      Output mode: stdout or parquet [default: stdout]
  -h, --help                 Print help
  -V, --version              Print version
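
For reference, the option set above maps naturally onto clap's derive API. The sketch below is a hypothetical reconstruction that mirrors the help text (assuming clap 4 with the derive feature), not the project's actual argument parser.

use clap::Parser;

// Hypothetical reconstruction of the CLI shown above.
#[derive(Parser, Debug)]
#[command(version, about = "Ultra-high-performance synthetic log generator")]
struct Args {
    /// Log generation rate per second
    #[arg(short, long, default_value_t = 10_000)]
    rate: u64,

    /// Service names (comma-separated)
    #[arg(short, long, default_value = "api,auth,payments,users,frontend")]
    services: String,

    /// Log levels (comma-separated)
    #[arg(short, long, default_value = "INFO,WARN,ERROR,DEBUG")]
    levels: String,

    /// Duration in seconds (0 for infinite)
    #[arg(short, long, default_value_t = 0)]
    duration: u64,

    /// Number of logs to generate (overrides rate and duration)
    #[arg(short, long)]
    count: Option<u64>,

    /// Random seed for deterministic generation
    #[arg(long)]
    seed: Option<u64>,

    /// Output mode: stdout or parquet
    #[arg(short, long, default_value = "stdout")]
    output: String,
}

fn main() {
    let args = Args::parse();
    println!("{args:?}");
}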

Maximum Speed Generation

The --count option is ideal for generating an exact number of logs at maximum speed, with no rate limiting:

Key Benefits

  • 🎯 Exact Count: Generate precisely the number of logs you need
  • ⚡ Maximum Speed: No rate limiting - generates as fast as your system allows
  • 📊 Performance Reporting: Shows actual generation rate achieved
  • 🔧 All Options Supported: Works with services, levels, seed, and output modes
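
As a rough illustration of what this mode amounts to (a minimal sketch, not the project's actual code), maximum-speed generation is essentially a tight loop that writes pre-formatted JSON lines through a large buffered writer and reports the achieved rate afterwards:

use std::io::{BufWriter, Write};
use std::time::Instant;

fn generate_max_speed(count: u64) -> std::io::Result<()> {
    let stdout = std::io::stdout();
    // One large buffer; flush once at the end instead of per line.
    let mut out = BufWriter::with_capacity(1 << 20, stdout.lock());
    let start = Instant::now();
    for i in 0..count {
        // In the real tool each line is a full log record; this placeholder stands in for it.
        writeln!(out, r#"{{"service":"api","level":"INFO","seq":{i}}}"#)?;
    }
    out.flush()?;
    let secs = start.elapsed().as_secs_f64();
    eprintln!("generated {count} logs in {secs:.3}s ({:.0}/sec)", count as f64 / secs);
    Ok(())
}

fn main() -> std::io::Result<()> {
    generate_max_speed(100_000)
}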

Performance Comparison

# Rate-limited: Generate 100k logs at 100k/sec (takes 1 second)
drava --rate 100000 --duration 1

# Maximum speed: Generate 100k logs as fast as possible (takes ~0.15 seconds!)
drava --count 100000

Use Cases for --count

  • Benchmarking: Generate exact dataset sizes for consistent testing
  • Quick Prototyping: Instantly create test datasets of any size
  • Performance Testing: Generate large datasets in minimal time
  • Controlled Experiments: Ensure reproducible dataset sizes

Log Schema

Each log entry includes:

  • timestamp - ISO8601 timestamp
  • service - Service name (e.g., "api", "auth", "payments")
  • level - Log level ("INFO", "WARN", "ERROR", "DEBUG")
  • message - Realistic message from predefined templates
  • trace_id - UUID for request tracing

Example Log Entry

{
  "timestamp": "2025-09-09T18:31:07.666301217+00:00",
  "service": "payments",
  "level": "INFO",
  "message": "User login successful",
  "trace_id": "9278ce8d-3960-4c7c-83a6-8b4673d77c4f"
}
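
In Rust terms, the schema could be modelled roughly as the struct below. This is a hedged sketch assuming serde and serde_json for serialization; the project's internal representation may differ (serde is not among the dependencies listed later).

use serde::Serialize;

#[derive(Serialize)]
struct LogRecord {
    /// ISO8601 timestamp, e.g. chrono's Utc::now().to_rfc3339()
    timestamp: String,
    /// Service name, e.g. "api", "auth", "payments"
    service: String,
    /// Log level: "INFO", "WARN", "ERROR", or "DEBUG"
    level: String,
    /// Message drawn from predefined templates
    message: String,
    /// UUID for request tracing
    trace_id: String,
}

fn main() {
    let record = LogRecord {
        timestamp: "2025-09-09T18:31:07.666301217+00:00".into(),
        service: "payments".into(),
        level: "INFO".into(),
        message: "User login successful".into(),
        trace_id: "9278ce8d-3960-4c7c-83a6-8b4673d77c4f".into(),
    };
    println!("{}", serde_json::to_string(&record).unwrap());
}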

Parquet Output

When using --output parquet, logs are written to files with hourly partitioning:

logs/
├── 2025/
│   └── 09/
│       └── 09/
│           └── 18/
│               ├── part-<uuid1>.parquet
│               └── part-<uuid2>.parquet

Files are compressed with ZSTD and can be queried directly with DuckDB:

-- Query all logs from a specific hour
SELECT COUNT(*), service, level 
FROM 'logs/2025/09/09/18/*.parquet' 
GROUP BY service, level;

-- Time-based analysis
SELECT DATE_TRUNC('minute', timestamp) as minute, COUNT(*) 
FROM 'logs/2025/09/09/18/*.parquet' 
GROUP BY minute 
ORDER BY minute;
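
To make the output path concrete, the sketch below builds an hourly partition directory and writes a ZSTD-compressed Parquet file with the arrow/parquet crates. It is an illustration under assumed crate versions (and an assumed uuid dependency for the part file name), not the tool's actual writer.

use std::{fs, path::PathBuf, sync::Arc};

use arrow::array::StringArray;
use arrow::datatypes::{DataType, Field, Schema};
use arrow::record_batch::RecordBatch;
use chrono::Utc;
use parquet::arrow::ArrowWriter;
use parquet::basic::{Compression, ZstdLevel};
use parquet::file::properties::WriterProperties;

fn write_hourly_partition(services: Vec<&str>) -> Result<(), Box<dyn std::error::Error>> {
    // logs/YYYY/MM/DD/HH/part-<uuid>.parquet, matching the layout shown above.
    let dir = PathBuf::from(Utc::now().format("logs/%Y/%m/%d/%H").to_string());
    fs::create_dir_all(&dir)?;
    let path = dir.join(format!("part-{}.parquet", uuid::Uuid::new_v4()));

    // A one-column batch stands in for the full log schema.
    let schema = Arc::new(Schema::new(vec![Field::new("service", DataType::Utf8, false)]));
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(StringArray::from(services))])?;

    // ZSTD compression, as described above.
    let props = WriterProperties::builder()
        .set_compression(Compression::ZSTD(ZstdLevel::default()))
        .build();
    let mut writer = ArrowWriter::try_new(fs::File::create(&path)?, schema, Some(props))?;
    writer.write(&batch)?;
    writer.close()?;
    Ok(())
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    write_hourly_partition(vec!["api", "auth", "payments"])
}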

Performance

  • 🔥 Extreme Rates: Successfully tested from 100/sec to 2,000,000+ logs/sec
  • ⚡ Real-time Generation: Generates logs in real-time with microsecond precision
  • 💪 Massive Throughput: Up to 172 billion logs per day at maximum rates
  • 🧠 Smart Batching: Dynamic batch sizes (1% of rate) for optimal performance
  • 💾 Memory Efficient: Low memory footprint with buffer reuse and efficient batching
  • 📁 File Size: ~78KB parquet file for ~2400 logs with ZSTD compression
  • 🦆 DuckDB Compatible: Generated parquet files work seamlessly with DuckDB

Benchmark Results

Rate-Limited Generation:

Rate      Duration  Runtime  Efficiency  Logs Generated
50k/sec   2s        2.01s    99.5%       ~100k
100k/sec  1s        1.00s    100%        ~100k
500k/sec  1s        1.00s    100%        ~500k
1M/sec    1s        1.00s    100%        ~1M
2M/sec    1s        1.00s    100%        ~2M

Maximum Speed Generation (--count):

Count      Output   Time    Speed     Notes
1,000      stdout   0.002s  543k/sec  Small batch
100,000    stdout   0.156s  642k/sec  Medium batch
1,000,000  stdout   1.589s  629k/sec  Large batch
250,000    parquet  0.346s  722k/sec  Compressed Parquet

Use Cases

  • 🏋️ Extreme Performance Testing: Stress-test log ingestion pipelines at enterprise scale
  • 🚀 High-Rate Data Generation: Generate massive datasets for big data testing
  • ⚡ Real-time Simulation: Simulate high-traffic microservices environments
  • 📊 Development: Generate test data without real microservices
  • 🎓 Education: Learn log analytics with realistic datasets
  • 📈 Benchmarking: Test DuckDB query performance on log data
  • 🔬 Load Testing: Generate billions of logs for system stress testing
  • 🎯 Exact Dataset Creation: Generate precise number of logs for controlled experiments
  • ⏱️ Quick Prototyping: Generate specific log counts instantly for testing

Technical Details

  • Language: Rust for maximum performance and zero-cost abstractions
  • Dependencies: Arrow, Parquet, Chrono, Clap, Tokio
  • Compression: ZSTD for optimal parquet file sizes
  • Smart Batching:
    • Rate-limited: Dynamic batch sizes (1% of rate, 100-10k logs per batch)
    • Maximum speed: Large batches (10k stdout, 50k Parquet) for optimal throughput
    • Batch-level I/O operations to minimize syscalls
    • Buffer reuse to reduce memory allocations
  • Precision Timing (sketched after this list):
    • Hybrid spin-wait for sub-100μs intervals
    • Thread sleep for longer intervals
    • Processing time compensation for accurate rates
  • Threading: Single-threaded with microsecond-precise timing control
  • Optimizations:
    • JSON buffer reuse
    • Batch stdout flushing
    • Optimized timestamp parsing
    • Minimal memory allocations
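
The batching and pacing ideas above can be condensed into a few lines. The following is a minimal sketch of the stated design (batches of roughly 1% of the rate clamped to 100-10k, sleep for coarse waits, spin for the final sub-100µs stretch), not the actual implementation:

use std::time::{Duration, Instant};

// Dynamic batch size: ~1% of the target rate, clamped to 100..=10_000.
fn batch_size(rate: u64) -> u64 {
    (rate / 100).clamp(100, 10_000)
}

// Hybrid pacing: sleep away most of the interval, spin for the last ~100µs.
fn pace_until(deadline: Instant) {
    const SPIN_THRESHOLD: Duration = Duration::from_micros(100);
    loop {
        let now = Instant::now();
        if now >= deadline {
            return;
        }
        let remaining = deadline - now;
        if remaining > SPIN_THRESHOLD {
            std::thread::sleep(remaining - SPIN_THRESHOLD);
        } else {
            std::hint::spin_loop();
        }
    }
}

fn main() {
    let rate = 1_000_000u64;
    let batch = batch_size(rate);
    let interval = Duration::from_secs_f64(batch as f64 / rate as f64);
    let mut next = Instant::now();
    for _ in 0..5 {
        // emit_batch(batch) would run here; advancing `next` by a fixed interval
        // compensates for the time the batch itself took to produce.
        next += interval;
        pace_until(next);
    }
    println!("batch size {batch}, interval {:?}", interval);
}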

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request
