Extreme-density encrypted data vault for LLM training pipelines.
MsgPack + Zstd-L22 + corpus dictionary + AES-256-GCM + SHA3-256 Merkle trees + REST API
▶ Full 26s demo video · ⭐ Star on GitHub
pip install quantum-pulse → qp seal dataset.json --offline → 39× compression + AES-256-GCM
pip install quantum-pulseOr clone for the full server + Docker setup:
git clone https://github.com/Naveenub/quantum-pulse.git
cd quantum-pulse
cp .env.example .env
docker-compose up -dQUANTUM-PULSE is an open-source compress-then-encrypt vault built specifically for LLM training data. Every blob is compressed with a cross-corpus Zstd dictionary, encrypted with AES-256-GCM, integrity-verified with a SHA3-256 Merkle tree, and stored in MongoDB — all through a single API call or CLI command.
Those tools only compress. QUANTUM-PULSE also:
- Trains a shared dictionary across your corpus so every shard benefits from every other shard's patterns
- Encrypts each blob with a per-record key derived via PBKDF2 + HKDF
- Verifies integrity via SHA3-256 Merkle trees on unseal — silently corrupted data is impossible
- Groups related shards into MasterPulses with cross-shard deduplication
- Exposes a virtual mount so training scripts read vaulted data without ever decrypting to disk
Cryptographic tools earn trust through scrutiny, not marketing.
QUANTUM-PULSE is open source because:
- Crypto needs public review — before anyone puts real training data through this pipeline, the implementation should be auditable. Security through obscurity is not security.
- Community builds better benchmarks — ML engineers working with real datasets will find edge cases no synthetic corpus can simulate. Submit yours to
benchmarks/community/. - Adoption precedes monetization — if this solves a real problem at scale, a hosted version becomes a natural next step. Demand should be proven, not assumed.
Self-hosting a secure vault means managing keys, uptime, and backups yourself. A managed version of QUANTUM-PULSE is in planning:
- Zero-ops — no MongoDB to run, no passphrase rotation to script
- Metered billing — pay per GiB sealed, not per seat
- Compliance-ready — audit log export, key rotation SLA, SOC 2 roadmap
- Same open protocol — data sealed via the API can always be unsealed with the self-hosted version; no lock-in
Interested in early access? Open a GitHub Discussion or star the repo to signal demand.
────────────────────────────────────────────────────────────────────
Algorithm Ratio vs gzip Time Enc Int
────────────────────────────────────────────────────────────────────
snappy 12.03× −80.9% 0.7 ms ✗ ✗
lz4 33.80× −46.2% 0.4 ms ✗ ✗
gzip-9 62.86× baseline 9.1 ms ✗ ✗
zstd-L3 76.19× +21.2% 0.7 ms ✗ ✗
QUANTUM-PULSE ◀ 95.51× +51.9% 553.4 ms ✓ ✓ ← fastest secure
zstd-L22+MsgPack 96.60× +53.7% 1173.4 ms ✗ ✗
zstd-L22 99.58× +58.4% 1644.3 ms ✗ ✗
brotli-11 112.95× +79.7% 1354.2 ms ✗ ✗
────────────────────────────────────────────────────────────────────
Enc = AES-256-GCM encryption Int = SHA3-256 Merkle integrity
| Claim | Evidence |
|---|---|
| Fastest high-compression pipeline with security | 553 ms vs 1173 ms (zstd+mp), 1354 ms (brotli), 1644 ms (zstd-L22) |
| Only option with both encryption + integrity | Every other row shows ✗/✗ |
| 3× faster than zstd-L22 vanilla | Dictionary eliminates per-shard pattern re-discovery |
| Brotli-11 wins raw ratio | 112× vs 95× — but 2.4× slower, no security at all |
Reproduce: python scripts/benchmark_compare.py
Full details: BENCHMARKS.md
pip install quantum-pulse
# Generate a strong passphrase
qp keygen
# Seal a file
qp seal dataset.json --passphrase "yourpassphrase16+" --offline
# → dataset.qp (AES-256-GCM encrypted · SHA3-256 Merkle signed)
# Recover it — byte-perfect
qp unseal dataset.qp --passphrase "yourpassphrase16+" --offline --output recovered.json
# Benchmark
qp benchmark --passphrase "yourpassphrase16+"git clone https://github.com/Naveenub/quantum-pulse.git
cd quantum-pulse
cp .env.example .env # set QUANTUM_PASSPHRASE and QUANTUM_API_KEYS
docker-compose up -d
# Seal via API
curl -X POST http://localhost:8747/pulse/seal \
-H "X-API-Key: my-api-key" \
-H "Content-Type: application/json" \
-d '{"payload": {"text": "hello world", "tokens": [1,2,3]}}'
# Seal via CLI
qp seal dataset.json --tag version=v1
qp unseal <pulse-id>qp keygen # generate strong passphrase
qp seal dataset.json --tag v1 # seal (needs MongoDB)
qp seal dataset.json --passphrase "p16+" --offline # seal offline → dataset.qp
qp unseal dataset.qp --passphrase "p16+" --offline # recover offline ← byte-perfect
qp unseal <pulse-id> # decrypt from MongoDB to stdout
qp list # list stored pulses
qp info <pulse-id> # pulse metadata
qp rotate <pulse-id> # re-encrypt under new passphrase
qp scan ./data/ # seal entire directory tree
qp master <id1> <id2> ... # build cross-shard MasterPulse
qp benchmark --passphrase "p16+" # run seal benchmark
qp health # query server health
qp config # print redacted config┌──────────────────────────────────────────────────────────────┐
│ QUANTUM-PULSE │
│ │
│ Scanner ──▶ Engine (MsgPack→Zstd-dict→AES-GCM→Merkle) │
│ │ │
│ Vault (PBKDF2+HKDF key derivation) │
│ │ │
│ DB (MongoDB/GridFS or in-memory) │
│ │
│ FastAPI :8747 · auth · rate-limit · Prometheus · audit │
└──────────────────────────────────────────────────────────────┘
dict → MsgPack → Zstd-L22+corpus-dict → AES-256-GCM → SHA3-256 Merkle
[ MAGIC 4B ][ VER 1B ][ NONCE 12B ][ CIPHERTEXT + GCM-TAG ]
quantum-pulse/
├── assets/
│ ├── quantum_pulse_demo.gif # animated demo — seal & unseal live
│ └── quantum_pulse_promo.mp4 # full 26s promo video
├── benchmarks/
│ └── community/ # submit your real-world benchmark results here
│ └── README.md
├── core/
│ ├── adaptive.py # AdaptiveDictManager — self-improving Zstd dict, retrains every 24h
│ ├── audit.py # append-only audit log (JSONL + MongoDB)
│ ├── auth.py # API-key + JWT auth, scope-based (read/write/admin)
│ ├── compression.py # PulseCompressor — async Zstd-L22 wrapper + streaming
│ ├── config.py # Pydantic Settings V2 — all config, secrets, validation
│ ├── db.py # async MongoDB / GridFS persistence (motor)
│ ├── engine.py # QuantumEngine — MsgPack→Zstd→AES-GCM→Merkle pipeline
│ ├── health.py # Kubernetes liveness / readiness / startup probes
│ ├── interface.py # FUSE-like virtual mount — sealed files, no plaintext on disk
│ ├── metrics.py # Prometheus counters, histograms, gauges
│ ├── middleware.py # security headers, request-id, timing, rate-limit
│ ├── retry.py # circuit breaker, bulkhead, exponential backoff
│ ├── scanner.py # high-speed filesystem scanner (os.scandir + threading)
│ ├── scheduler.py # APScheduler — dict retrain, TTL cleanup, metrics flush
│ └── vault.py # QuantumVault — PBKDF2-SHA256 + HKDF key derivation
├── models/
│ └── pulse_models.py # Pydantic V2 models — PulseBlob, MasterPulse, CompressionStats
├── scripts/
│ ├── benchmark_compare.py # head-to-head vs snappy / lz4 / gzip / brotli / zstd
│ ├── benchmark_demo.py # reproduces README benchmark numbers
│ ├── gen_corpus.py # generate synthetic LLM training corpus for testing
│ └── verify_scheduler.py # verify APScheduler dict retrain fires correctly
├── tests/
│ ├── test_api.py # 27 integration tests — full HTTP layer
│ ├── test_engine.py # 27 unit tests — core seal/unseal/Merkle pipeline
│ └── test_units.py # 223 extended unit tests — 81%+ coverage enforced
├── .github/
│ ├── CODEOWNERS # code ownership assignments
│ ├── ISSUE_TEMPLATE/
│ │ ├── benchmark.md # community benchmark submission template
│ │ ├── bug_report.md # structured bug report
│ │ ├── feature_request.md # feature proposal
│ │ └── security_audit_issue_template.md # crypto audit / security review template
│ ├── PULL_REQUEST_TEMPLATE.md # PR checklist
│ └── workflows/
│ └── ci.yml # lint → unit-tests → api-tests → benchmark → docker-build
├── cli.py # qp CLI — 12 commands, full offline seal/unseal
├── main.py # FastAPI app entry point — all routes wired
├── pyproject.toml # build config, dependencies, ruff/mypy/pytest/coverage
├── requirements.txt # pinned deps for Docker / CI
├── Makefile # make test / bench / lint / docker-up / run
├── Dockerfile
├── docker-compose.yml # MongoDB + API, one command start
├── .env.example # all environment variables with defaults
├── .pre-commit-config.yaml # ruff + mypy pre-commit hooks
├── .gitignore
├── BENCHMARKS.md # full benchmark methodology and results
├── CHANGELOG.md # version history
├── CONTRIBUTING.md # contribution guide, design principles
├── CODE_OF_CONDUCT.md
├── LICENSE # MIT
└── SECURITY.md # vulnerability reporting + audit status
Compression
- Zstd level 22 with trained cross-corpus dictionary
- MsgPack binary encoding (22% smaller than JSON pre-compression)
- Dictionary auto-retrains every 24h as your corpus grows
Security
- AES-256-GCM with hardware AES-NI
- Per-blob HKDF-derived keys — one compromised blob reveals nothing else
- PBKDF2-SHA256, 600,000 iterations (Argon2id planned for v1.1)
- SHA3-256 Merkle tree — every unseal is cryptographically verified
- No formal third-party audit yet — see SECURITY.md
Operations
- FastAPI REST with OpenAPI docs at
/docs - API-key + JWT auth, scope-based access (
read/write/admin) - Prometheus metrics, Kubernetes health probes (
/healthz/live|ready|startup) - Append-only audit log (JSONL file + MongoDB)
qpCLI with 12 commands, full offline mode
Developer Experience
- 277 tests (27 engine unit · 27 API integration · 223 extended unit) — 81%+ coverage
make test,make bench,make lint,make docker-up- GitHub Actions CI — lint → unit-tests → api-tests → benchmark → docker-build
- Pre-commit hooks — ruff + mypy on every commit
| Method | Path | Scope | Description |
|---|---|---|---|
| POST | /pulse/seal |
write | Compress + encrypt a payload |
| POST | /pulse/unseal |
read | Decrypt by pulse ID |
| GET | /pulse/stream/{id} |
read | Stream unsealed bytes |
| GET | /pulse/list |
read | List all pulses |
| DELETE | /pulse/{id} |
admin | Delete a pulse |
| POST | /pulse/rotate/{id} |
admin | Re-encrypt under new passphrase |
| POST | /pulse/master |
write | Build a cross-shard MasterPulse |
| POST | /scan |
write | Seal an entire directory tree |
| GET | /metrics |
— | Prometheus metrics |
| GET | /healthz |
— | Full health report |
| GET | /audit/recent |
admin | Audit log tail |
| POST | /auth/token |
— | Exchange API key for JWT |
Full interactive docs at http://localhost:8747/docs
QUANTUM_PASSPHRASE=my-passphrase # required, min 16 chars
QUANTUM_API_KEYS=["key1","key2"] # JSON array
QUANTUM_ENVIRONMENT=development # development|staging|production
QUANTUM_MONGO_URI=mongodb://localhost:27017
QUANTUM_PORT=8747
QUANTUM_ZSTD_LEVEL=22
QUANTUM_KDF_ITERATIONS=600000Full reference: .env.example · core/config.py
make test # all 277 tests
make test-unit # core pipeline only
make test-api # HTTP layer only
make test-cov # with coverage report
make bench # full pipeline benchmark| Version | Change |
|---|---|
| v1.0.0 | Initial release |
| v1.0.1 | Fixed build backend, added qp seal --offline, fixed CI coverage |
| v1.0.2 | Wired qp unseal --offline — complete offline round-trip verified |
| v1.0.3 | Published to PyPI — pip install quantum-pulse |
All contributions welcome — see CONTRIBUTING.md.
Especially needed:
- Alternative storage backends (S3, Redis, SQLite)
- Real-world dataset benchmarks → submit to
benchmarks/community/ - Streaming seal/unseal for files > 2 GB
- Language bindings (Rust, Go, Node.js client)
- Security review — crypto implementation audit, fuzzing, side-channel analysis
Open Source
- PyPI package —
pip install quantum-pulse - Argon2id KDF (replacing PBKDF2-SHA256) — v1.1
- S3 / GCS storage backend — v1.2
- Streaming seal for files > 2 GB
- OpenTelemetry tracing
- Benchmark vs Apache Parquet + snappy
- Rust client SDK
- Key rotation without re-sealing (re-wrap mode)
- WASM build for browser-side sealing
Hosted (quantum-pulse.cloud)
- Managed API with metered billing (per GiB sealed)
- Web dashboard — browse, search, audit pulses
- Team workspaces + role-based access
- Webhook on seal/unseal events
- SOC 2 Type II audit
MIT — free for commercial and personal use.
Built on: python-zstandard · cryptography · FastAPI · msgpack-python · Pydantic
