Drava is an ultra-high-performance synthetic log generator designed to simulate distributed microservices producing logs at extreme rates. It generates structured JSON logs with realistic fields and can stream them to stdout or write them directly to Parquet files.

## Features
- 🚀 Extreme Performance: Generate logs from 100/sec to 2M+/sec with real-time precision
- ⚡ Maximum Speed Generation: New `--count` option generates an exact number of logs at 700k+/sec
- 🔥 Batch-Optimized: Dynamic batching system for maximum throughput at high rates
- Multiple Output Modes:
  - `stdout` - JSON logs for piping, with optimized I/O batching
  - `parquet` - Direct-to-Parquet with hourly partitioning and high-rate batching
- Realistic Data: Configurable services, log levels, and message templates
- Deterministic: Optional seed for reproducible generation
- Lightweight: Single Rust binary with no external dependencies
- Smart Timing: Hybrid spin-wait/sleep mechanism for microsecond precision
## Installation

```bash
git clone https://github.com/pondwatch/drava
cd drava
cargo build --release
./target/release/drava --help
```

## Usage

Generate an exact number of logs at maximum speed:
```bash
# Generate exactly 10,000 logs at max speed (~700k/sec)
drava --count 10000

# Generate 1 million logs to Parquet files at max speed
drava --count 1000000 --output parquet

# Generate 50k logs with deterministic seed
drava --count 50000 --seed 12345
```

Generate logs at specific rates:
```bash
# Generate 1000 logs/sec for 10 seconds
drava --rate 1000 --duration 10

# Generate logs indefinitely at 5000/sec
drava --rate 5000

# Generate 10k logs/sec for 60 seconds, save to Parquet
drava --rate 10000 --duration 60 --output parquet
```

Customize services and log levels:
```bash
# Only generate logs for specific services
drava --services "api,database,cache" --levels "INFO,ERROR"

# Generate exact count with custom services
drava --count 25000 --services "payments,auth" --levels "ERROR,WARN"
```

## Options
```text
-r, --rate <RATE>          Log generation rate per second [default: 10000]
-s, --services <SERVICES>  Service names (comma-separated) [default: api,auth,payments,users,frontend]
-l, --levels <LEVELS>      Log levels (comma-separated) [default: INFO,WARN,ERROR,DEBUG]
-d, --duration <DURATION>  Duration in seconds (0 for infinite) [default: 0]
-c, --count <COUNT>        Number of logs to generate (overrides rate and duration for max speed generation)
    --seed <SEED>          Random seed for deterministic generation
-o, --output <OUTPUT>      Output mode: stdout or parquet [default: stdout]
-h, --help                 Print help
-V, --version              Print version
```
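For readers browsing the source, these flags map onto a clap derive definition roughly like the following. This is a hypothetical sketch of the CLI surface, not the actual code; struct and field names are illustrative:

```rust
use clap::Parser;

/// Hypothetical sketch of Drava's CLI options using clap's derive API.
#[derive(Parser)]
#[command(version, about = "Ultra-high-performance synthetic log generator")]
struct Args {
    /// Log generation rate per second
    #[arg(short, long, default_value_t = 10_000)]
    rate: u64,
    /// Service names (comma-separated)
    #[arg(short, long, default_value = "api,auth,payments,users,frontend")]
    services: String,
    /// Log levels (comma-separated)
    #[arg(short, long, default_value = "INFO,WARN,ERROR,DEBUG")]
    levels: String,
    /// Duration in seconds (0 for infinite)
    #[arg(short, long, default_value_t = 0)]
    duration: u64,
    /// Number of logs to generate (overrides rate and duration)
    #[arg(short, long)]
    count: Option<u64>,
    /// Random seed for deterministic generation
    #[arg(long)]
    seed: Option<u64>,
    /// Output mode: stdout or parquet
    #[arg(short, long, default_value = "stdout")]
    output: String,
}
```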
## Maximum Speed Generation

The `--count` option is perfect for generating exact datasets at maximum speed, without rate limiting:
- 🎯 Exact Count: Generate precisely the number of logs you need
- ⚡ Maximum Speed: No rate limiting - generates as fast as your system allows
- 📊 Performance Reporting: Reports the actual generation rate achieved (see the sketch after this list)
- 🔧 All Options Supported: Works with services, levels, seed, and output modes
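The reported rate is simply the count divided by elapsed wall-clock time. A minimal sketch of such a report (hypothetical helper, not the actual source):

```rust
use std::time::Instant;

/// Hypothetical sketch of the post-run performance report:
/// divide the log count by elapsed wall-clock seconds.
fn report_rate(count: u64, started: Instant) {
    let secs = started.elapsed().as_secs_f64();
    let rate = count as f64 / secs;
    eprintln!("Generated {count} logs in {secs:.3}s ({rate:.0} logs/sec)");
}
```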
Comparison with rate-limited generation:

```bash
# Rate-limited: Generate 100k logs at 100k/sec (takes 1 second)
drava --rate 100000 --duration 1

# Maximum speed: Generate 100k logs as fast as possible (takes ~0.15 seconds!)
drava --count 100000
```

This makes `--count` ideal for:

- Benchmarking: Generate exact dataset sizes for consistent testing
- Quick Prototyping: Instantly create test datasets of any size
- Performance Testing: Generate large datasets in minimal time
- Controlled Experiments: Ensure reproducible dataset sizes
## Log Format

Each log entry includes:

- `timestamp` - ISO 8601 timestamp
- `service` - Service name (e.g., "api", "auth", "payments")
- `level` - Log level ("INFO", "WARN", "ERROR", "DEBUG")
- `message` - Realistic message from predefined templates
- `trace_id` - UUID for request tracing
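For illustration, a record with this shape could be modeled in Rust roughly as follows (a hypothetical sketch using serde, not the crate's actual internal type):

```rust
use serde::Serialize;

/// Hypothetical sketch of one generated log record;
/// field names mirror the JSON output shown below.
#[derive(Serialize)]
struct LogEntry {
    timestamp: String, // ISO 8601, e.g. "2025-09-09T18:31:07.666301217+00:00"
    service: String,   // e.g. "payments"
    level: String,     // "INFO", "WARN", "ERROR", or "DEBUG"
    message: String,   // drawn from predefined templates
    trace_id: String,  // UUID for request tracing
}
```

Example log entry: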
```json
{
  "timestamp": "2025-09-09T18:31:07.666301217+00:00",
  "service": "payments",
  "level": "INFO",
  "message": "User login successful",
  "trace_id": "9278ce8d-3960-4c7c-83a6-8b4673d77c4f"
}
```

## Parquet Output

When using `--output parquet`, logs are written to files with hourly partitioning:
```text
logs/
├── 2025/
│   └── 09/
│       └── 09/
│           └── 18/
│               ├── part-<uuid1>.parquet
│               └── part-<uuid2>.parquet
```
Files are compressed with ZSTD and can be queried directly with DuckDB:
```sql
-- Query all logs from a specific hour
SELECT COUNT(*), service, level
FROM 'logs/2025/09/09/18/*.parquet'
GROUP BY service, level;

-- Time-based analysis
SELECT DATE_TRUNC('minute', timestamp) AS minute, COUNT(*)
FROM 'logs/2025/09/09/18/*.parquet'
GROUP BY minute
ORDER BY minute;
```

## Performance

- 🔥 Extreme Rates: Successfully tested from 100/sec to 2,000,000+ logs/sec
- ⚡ Real-time Generation: Generates logs in real-time with microsecond precision
- 💪 Massive Throughput: Up to 172 billion logs per day at maximum rates
- 🧠 Smart Batching: Dynamic batch sizes (1% of rate) for optimal performance
- 💾 Memory Efficient: Low memory footprint with buffer reuse and efficient batching
- 📁 File Size: ~78KB Parquet file for ~2,400 logs with ZSTD compression
- 🦆 DuckDB Compatible: Generated Parquet files work seamlessly with DuckDB
Rate-Limited Generation:
| Rate | Duration | Runtime | Efficiency | Logs Generated |
|---|---|---|---|---|
| 50k/sec | 2s | 2.01s | 99.5% | ~100k |
| 100k/sec | 1s | 1.00s | 100% | ~100k |
| 500k/sec | 1s | 1.00s | 100% | ~500k |
| 1M/sec | 1s | 1.00s | 100% | ~1M |
| 2M/sec | 1s | 1.00s | 100% | ~2M |
Maximum Speed Generation (`--count`):
| Count | Output | Time | Speed | Notes |
|---|---|---|---|---|
| 1,000 | stdout | 0.002s | 543k/sec | Small batch |
| 100,000 | stdout | 0.156s | 642k/sec | Medium batch |
| 1,000,000 | stdout | 1.589s | 629k/sec | Large batch |
| 250,000 | parquet | 0.346s | 722k/sec | Compressed Parquet |
## Use Cases

- 🏋️ Extreme Performance Testing: Stress-test log ingestion pipelines at enterprise scale
- 🚀 High-Rate Data Generation: Generate massive datasets for big data testing
- ⚡ Real-time Simulation: Simulate high-traffic microservices environments
- 📊 Development: Generate test data without real microservices
- 🎓 Education: Learn log analytics with realistic datasets
- 📈 Benchmarking: Test DuckDB query performance on log data
- 🔬 Load Testing: Generate billions of logs for system stress testing
- 🎯 Exact Dataset Creation: Generate precise number of logs for controlled experiments
- ⏱️ Quick Prototyping: Generate specific log counts instantly for testing
## Technical Details

- Language: Rust for maximum performance and zero-cost abstractions
- Dependencies: Arrow, Parquet, Chrono, Clap, Tokio
- Compression: ZSTD for optimal Parquet file sizes
- Smart Batching (sketched below):
  - Rate-limited: Dynamic batch sizes (1% of rate, 100-10k logs per batch)
  - Maximum speed: Large batches (10k stdout, 50k Parquet) for optimal throughput
  - Batch-level I/O operations to minimize syscalls
  - Buffer reuse to reduce memory allocations
- Precision Timing (sketched below):
  - Hybrid spin-wait for sub-100μs intervals
  - Thread sleep for longer intervals
  - Processing time compensation for accurate rates
- Threading: Single-threaded with microsecond-precise timing control
- Optimizations:
  - JSON buffer reuse
  - Batch stdout flushing
  - Optimized timestamp parsing
  - Minimal memory allocations
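To make the batching rule concrete, here is a minimal sketch of the dynamic batch-size calculation (a hypothetical helper, not the actual source; the 1% rule and the 100-10k clamp come from the list above):

```rust
/// Hypothetical sketch of the dynamic batch-size rule described above:
/// 1% of the target rate, clamped to the 100..=10_000 range.
fn batch_size_for_rate(rate_per_sec: u64) -> u64 {
    (rate_per_sec / 100).clamp(100, 10_000)
}
```

At `--rate 10000` this yields batches of 100 logs; at 2M/sec it saturates at the 10k cap, so I/O cost stays amortized across large writes.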
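The precision-timing strategy can be sketched similarly: sleep through most of an interval, then spin-wait the final stretch. This is a hypothetical illustration of the hybrid mechanism described above, not the actual implementation:

```rust
use std::time::{Duration, Instant};

/// Hypothetical sketch of the hybrid wait: sleep for long intervals,
/// then busy-wait the last <100μs for microsecond precision.
fn precise_wait_until(deadline: Instant) {
    const SPIN_THRESHOLD: Duration = Duration::from_micros(100);
    loop {
        let now = Instant::now();
        if now >= deadline {
            return;
        }
        let remaining = deadline - now;
        if remaining > SPIN_THRESHOLD {
            // Sleep most of the interval; leave the tail for spinning.
            std::thread::sleep(remaining - SPIN_THRESHOLD);
        } else {
            // Spin off the final stretch for precision.
            std::hint::spin_loop();
        }
    }
}
```

Sleeping covers the bulk of the wait cheaply, while the short spin loop absorbs the OS scheduler's wake-up jitter, which is how sub-100μs intervals stay accurate.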
## Contributing

- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request