Quickwit Exporter


A parallel data exporter for Quickwit with adaptive rate limiting and automatic splitting for large datasets.

Because sometimes you need those logs on your own machine for analysis.

Features

  • ⚡ Parallel Processing - Export multiple days concurrently with configurable worker pools
  • 🚀 Adaptive Rate Limiting - Automatically adjusts request rate (1-20 req/s) based on server response times and errors
  • 📦 Automatic Subdivision - Handles large datasets by intelligently splitting time ranges based on record counts
  • 💾 Compressed Output - Automatic gzip compression of exported JSONL data
  • 🔄 Resume Support - Skip already-exported days and time ranges on restart
  • 🏗️ Work Queue Architecture - Worker-local subdivision queues prevent deadlock
  • 📋 Coverage-Based Aggregation - Detects completion and merges temp files when ranges are fully covered

Installation

Prerequisites

  • Go 1.21 or later

Build from source

git clone https://github.com/miguelbernadi/quickwit-exporter.git
cd quickwit-exporter
go build -o quickwit-exporter cmd/exporter/main.go

Or install directly:

go install github.com/miguelbernadi/quickwit-exporter/cmd/exporter@latest

Usage

Basic Usage

./quickwit-exporter --server https://quickwit.example.com

This will export the last 30 days (default) using query "*" (all records).

Full Options

./quickwit-exporter \
  --server https://quickwit.example.com \
  --index logs \
  --query "level:error" \
  --days 7 \
  --workers 5

Command-Line Flags

Flag         Default                    Description
--server     (required)                 Quickwit server URL
--index      myindex                    Index name to query
--query      *                          Quickwit search query
--days       30                         Number of days to export (counting backwards from today)
--output     quickwit_export_{date}     Output directory path
--temp-dir   {output}/.tmp              Temporary files directory
--workers    3                          Number of parallel workers
--debug      false                      Enable debug logging

Examples

Export with custom time range

./quickwit-exporter \
  --server https://quickwit.example.com \
  --days 7

Export with custom query

./quickwit-exporter \
  --server https://quickwit.example.com \
  --query "level:error AND service:api"

Export specific index

./quickwit-exporter \
  --server https://quickwit.example.com \
  --index logs

High-performance export

For faster exports with powerful servers:

./quickwit-exporter \
  --server https://quickwit.example.com \
  --workers 5

Output Format

Directory Structure

quickwit_export_20251112/
├── export_1731369600-1731456000.jsonl.gz
├── export_1731283200-1731369600.jsonl.gz
├── export_1731196800-1731283200.jsonl.gz
└── ...

Files are named with Unix timestamps: export_{startUnix}-{endUnix}.jsonl.gz

Each file contains all records for a single day in JSONL format (one JSON object per line), compressed with gzip.

File Format

Each line in the uncompressed file is a complete JSON object:

{"timestamp":"2025-11-12T10:23:45Z","message":"API key validated","level":"info",...}
{"timestamp":"2025-11-12T10:24:12Z","message":"User authentication successful","level":"debug",...}

Working with exported data

Decompress and view:

zcat quickwit_export_20251112/export_*.jsonl.gz | head -10

Combine multiple days:

zcat quickwit_export_20251112/export_*.jsonl.gz > combined_all.jsonl

Sort by timestamp:

zcat quickwit_export_20251112/export_*.jsonl.gz | \
  jq -s 'sort_by(.timestamp) | .[]' -c > combined_sorted.jsonl

Filter and analyze:

# Count records by level
zcat export_*.jsonl.gz | jq -r '.level' | sort | uniq -c

# Extract specific fields
zcat export_*.jsonl.gz | jq '{timestamp, level, message}' -c

# Find all errors
zcat export_*.jsonl.gz | jq 'select(.level == "error")'

How It Works

Architecture

Main Process
     ↓
Orchestrator (splits time range into ≤1 day chunks)
     ↓
Shared Work Queue → [Day1, Day2, Day3, ...]
     ↓
Workers (N parallel, default: 3)
   ├─ Each worker has local subdivision queue (capacity: 100)
   ├─ Check count for time range
   ├─ If count ≤ 10K: Fetch & write temp file
   └─ If count > 10K: Subdivide & enqueue to local queue
     ↓
Adaptive Rate Limiter (1-20 req/s, adjusts automatically)
     ↓
Quickwit API
     ↓
Compactor (coverage-based aggregation)
   ├─ Monitors completed work items
   ├─ Aggregates when range fully covered
   └─ Produces: export_{startUnix}-{endUnix}.jsonl.gz
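The worker-local subdivision queue can be sketched as a loop that drains its own buffered channel before pulling from the shared queue — this is what keeps oversized items from deadlocking on a full shared channel. All types and names below are illustrative, not the exporter's actual code:

```go
package main

import "fmt"

// work is an illustrative unit of export work (the time range is elided).
type work struct {
	id  string
	big bool // true when the count check says the range must be subdivided
}

// runWorker drains its local subdivision queue before pulling from the
// shared queue, then exits once both queues are empty.
func runWorker(shared <-chan work, done chan<- string) {
	local := make(chan work, 100) // worker-local subdivision queue
	for {
		select {
		case w := <-local:
			process(w, local, done)
		default:
			w, ok := <-shared
			if !ok {
				// shared queue closed: drain remaining local items, then exit
				for len(local) > 0 {
					process(<-local, local, done)
				}
				close(done)
				return
			}
			process(w, local, done)
		}
	}
}

func process(w work, local chan<- work, done chan<- string) {
	if w.big {
		// subdivide and requeue locally instead of blocking the shared queue
		local <- work{id: w.id + ".a"}
		local <- work{id: w.id + ".b"}
		return
	}
	done <- w.id
}

func main() {
	shared := make(chan work, 2)
	shared <- work{id: "day1", big: true}
	shared <- work{id: "day2"}
	close(shared)

	done := make(chan string, 10)
	runWorker(shared, done)
	for id := range done {
		fmt.Println("completed:", id)
	}
}
```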

Handling Large Datasets

The exporter intelligently handles datasets exceeding the 10,000 record fetch limit:

  1. Check count first: Use lightweight count API (MaxHits: 0) to check record count
  2. Smart decision:
    • If count ≤ 10K: Fetch all records directly
    • If count > 10K: Subdivide time range before fetching
  3. Work queue subdivision: Split range and enqueue to worker-local queue
  4. Parallel processing: Each worker processes its own subdivisions
  5. Coverage-based aggregation: Compactor merges temp files when range is fully covered

Example for a day with 35,000 records:

Day (35K records) → Check count
  ↓ Count > 10K, subdivide into 4 quarters
Q1 (8,750 records) → Fetch directly → write temp file
Q2 (8,750 records) → Fetch directly → write temp file
Q3 (8,750 records) → Fetch directly → write temp file
Q4 (8,750 records) → Fetch directly → write temp file
  ↓ All quarters complete, coverage check passes
Compactor → Merge Q1+Q2+Q3+Q4 → Final file (35K records)

Key advantage: the exporter never hits Quickwit's offset limit, because it checks the count first and subdivides proactively.

Adaptive Rate Limiting

The rate limiter automatically adjusts based on server health:

  • Initial rate: 5 requests/second (conservative start)
  • Speed up: If response times < 500ms and error rate < 1%, increase by 20%
  • Slow down: If response times > 2s or error rate > 10%, decrease by 30%
  • Limits: Min 1 req/s, Max 20 req/s

This ensures optimal performance without overwhelming the server.
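The adjustment policy can be sketched as a pure function over the thresholds listed above (`adjustRate` and its parameters are illustrative names, not the exporter's actual API):

```go
package main

import "fmt"

// adjustRate applies the thresholds described above: speed up 20% when
// the server is healthy, back off 30% on slow responses or errors, and
// clamp to the 1-20 req/s band.
func adjustRate(rate, avgLatencyMs, errorRate float64) float64 {
	switch {
	case avgLatencyMs > 2000 || errorRate > 0.10:
		rate *= 0.7 // slow down by 30%
	case avgLatencyMs < 500 && errorRate < 0.01:
		rate *= 1.2 // speed up by 20%
	}
	if rate < 1 {
		rate = 1
	}
	if rate > 20 {
		rate = 20
	}
	return rate
}

func main() {
	rate := 5.0                        // conservative starting rate
	rate = adjustRate(rate, 300, 0.0)  // fast, error-free responses: speed up
	fmt.Println(rate)
	rate = adjustRate(rate, 2500, 0.0) // slow responses: back off
	fmt.Println(rate)
}
```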

Performance

Optimization Tips

  1. Workers: Set to 3-5 for balanced performance (diminishing returns beyond 5)
  2. Adaptive rate limiting: Automatically adjusts between 1-20 req/s based on server health
  3. Network: Run on same cloud region as Quickwit for best performance
  4. Disk I/O: Use fast storage for output directory (SSD recommended)

Performance Characteristics

  • Automatically subdivides large time ranges to handle Quickwit's 10K record limit
  • Worker-local subdivision queues prevent deadlock
  • Coverage-based aggregation enables incremental completion
  • Resume capability allows restarts without re-downloading completed days
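The coverage check behind incremental completion can be sketched as an interval sweep over completed sub-ranges (illustrative names and types, not the exporter's code):

```go
package main

import (
	"fmt"
	"sort"
)

// span is a completed [start, end) sub-range, in Unix seconds.
type span struct{ start, end int64 }

// fullyCovered reports whether the completed spans cover [start, end)
// with no gaps -- the condition under which a compactor could merge
// temp files into the final export file.
func fullyCovered(start, end int64, done []span) bool {
	sort.Slice(done, func(i, j int) bool { return done[i].start < done[j].start })
	cursor := start
	for _, s := range done {
		if s.start > cursor {
			return false // gap before this span
		}
		if s.end > cursor {
			cursor = s.end
		}
	}
	return cursor >= end
}

func main() {
	quarters := []span{{0, 21600}, {21600, 43200}, {43200, 64800}}
	fmt.Println(fullyCovered(0, 86400, quarters)) // false: last quarter missing
	quarters = append(quarters, span{64800, 86400})
	fmt.Println(fullyCovered(0, 86400, quarters)) // true: day fully covered
}
```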

Troubleshooting

Rate limit errors (429)

The adaptive rate limiter should handle this automatically, but if you see persistent 429s:

  • Reduce --workers (try 1 or 2)
  • The rate limiter will automatically slow down on errors

Slow performance

  • Check network latency to Quickwit server
  • Monitor server resource usage
  • Increase --workers if server can handle it
  • Verify disk I/O isn't bottlenecked
  • Check adaptive rate limiter isn't being too conservative

Development

Project Structure

quickwit-exporter/
├── cmd/
│   └── exporter/
│       └── main.go                  # CLI entry point
├── internal/
│   ├── client/
│   │   ├── quickwit.go              # Quickwit API client
│   │   └── quickwit_test.go
│   ├── contextlog/
│   │   ├── contextlog.go            # Context-based logging helpers
│   │   └── contextlog_test.go
│   ├── exporter/
│   │   ├── orchestrator.go          # Main orchestration & coordination
│   │   ├── worker.go                # Work queue processing
│   │   ├── compactor.go             # Coverage-based aggregation
│   │   ├── coverage.go              # Coverage checking logic
│   │   ├── file_writer.go           # JSONL file writing
│   │   └── *_test.go                # Comprehensive test suite
│   └── ratelimit/
│       ├── adaptive.go              # Adaptive rate limiter
│       └── adaptive_test.go
├── .github/workflows/
│   └── pr-checks.yml                # CI/CD pipeline
├── Makefile                         # Build automation
├── CLAUDE.md                        # Development guide
└── README.md

Running Tests

make test          # Run tests with race detection (recommended)
go test ./...      # Basic test run
go test -race ./...  # With race detection

Building for Different Platforms

# Linux
GOOS=linux GOARCH=amd64 go build -o quickwit-exporter-linux cmd/exporter/main.go

# macOS (Intel)
GOOS=darwin GOARCH=amd64 go build -o quickwit-exporter-darwin-amd64 cmd/exporter/main.go

# macOS (Apple Silicon)
GOOS=darwin GOARCH=arm64 go build -o quickwit-exporter-darwin-arm64 cmd/exporter/main.go

# Windows
GOOS=windows GOARCH=amd64 go build -o quickwit-exporter.exe cmd/exporter/main.go

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Guidelines

  1. Follow Go best practices and idioms
  2. Maintain minimal external dependencies
  3. Add tests for new functionality
  4. Update README for user-facing changes
  5. Use meaningful commit messages

License

MIT License

Support

For issues, questions, or feature requests, please open an issue on GitHub.
