Note: This is a toy project.
BacktraceDB is something I built to understand database internals better. It’s not production-ready, hasn’t been security-audited, and shouldn’t be used for anything serious. Think of it as a lab notebook you can run.
BacktraceDB is a small column-oriented storage engine written to explore how real databases work under the hood.
I wanted something concrete to experiment with:
- how write-ahead logs actually make in-memory state durable,
- how columnar layouts change query performance,
- how metadata (like min/max stats) lets engines skip huge chunks of data,
- and what happens when your dataset is bigger than RAM.
This project is intentionally simple, sometimes naive, and very opinionated. That’s the point.
Recent (“hot”) data lives in memory for fast writes and reads.
Once a block reaches a size threshold, it’s frozen and flushed to disk as an Apache Parquet file.
After that, the block is immutable.
Every row append is written to a WAL before it’s visible to the table.
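To make the ordering concrete, here is a minimal sketch of that append path. It is not BacktraceDB's actual code; the record layout, the `row`/`memBlock` types, and the flush step are all illustrative assumptions. The point is only that the WAL write and fsync happen before the row becomes visible in the in-memory block:

```go
package main

import (
	"encoding/binary"
	"math"
	"os"
)

// row and memBlock are illustrative stand-ins, not BacktraceDB's types.
type row struct {
	TS    int64
	Price float64
}

type memBlock struct {
	rows []row
	max  int // rotation threshold (think MaxBlockSize)
}

func appendRow(wal *os.File, blk *memBlock, r row) error {
	// 1. Encode the row as a fixed-size binary WAL record (illustrative layout).
	buf := make([]byte, 16)
	binary.LittleEndian.PutUint64(buf[0:8], uint64(r.TS))
	binary.LittleEndian.PutUint64(buf[8:16], math.Float64bits(r.Price))

	// 2. Durability first: the record is written and fsynced before anything else.
	if _, err := wal.Write(buf); err != nil {
		return err
	}
	if err := wal.Sync(); err != nil {
		return err
	}

	// 3. Only now does the row become visible to readers.
	blk.rows = append(blk.rows, r)

	// 4. Rotation: once the block is full it is frozen and flushed to Parquet.
	if len(blk.rows) >= blk.max {
		// flushToParquet(blk) would run here in the real engine (omitted in this sketch).
		blk.rows = blk.rows[:0]
	}
	return nil
}

func main() {
	wal, _ := os.OpenFile("orders.wal", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	defer wal.Close()
	blk := &memBlock{max: 100_000}
	_ = appendRow(wal, blk, row{TS: 1700000000, Price: 45012.5})
}
```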
On startup, the database:
- Scans existing Parquet blocks
- Loads their metadata (min/max stats)
- Replays the WAL to rebuild any in-memory state that wasn’t flushed yet
The filesystem is treated as the source of truth.
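A rough sketch of that recovery order (paths and helper names are assumptions, not the project's real layout): flushed Parquet blocks are discovered first, then the WAL is replayed on top of them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	// 1. Discover immutable Parquet blocks: the source of truth on disk.
	//    The directory layout here is a guess for illustration.
	blocks, _ := filepath.Glob("trading_db/orders/*.parquet")
	for _, b := range blocks {
		// 2. Load per-column min/max stats from each block's footer here
		//    (a Parquet reader library would do this; omitted in the sketch).
		fmt.Println("registered block:", b)
	}

	// 3. Replay the WAL last, so rows that never made it into a Parquet
	//    file are restored into the in-memory block.
	//    See the replay sketch further down.
}
```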
Data is stored column-wise, not row-wise. This makes:
- scans cheaper,
- aggregates faster,
- and compression (dictionary encoding for strings) straightforward.
Each Parquet block stores min/max statistics per column.
During a query, these stats are checked first so entire blocks can be skipped without being read from disk.
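As an illustration of the idea (a sketch with made-up types, not the engine's code): a predicate like `price > 45000` only needs a block's recorded max for that column to decide whether the block can be skipped entirely.

```go
package main

import "fmt"

// blockStats is an illustrative stand-in for the per-block metadata.
type blockStats struct {
	Path string
	Min  map[string]float64
	Max  map[string]float64
}

// canSkip reports whether a "col > threshold" predicate can never match
// any row in the block, based only on its metadata.
func canSkip(b blockStats, col string, threshold float64) bool {
	return b.Max[col] <= threshold
}

func main() {
	blocks := []blockStats{
		{Path: "block_001.parquet", Min: map[string]float64{"price": 100}, Max: map[string]float64{"price": 900}},
		{Path: "block_002.parquet", Min: map[string]float64{"price": 44000}, Max: map[string]float64{"price": 47000}},
	}
	for _, b := range blocks {
		if canSkip(b, "price", 45000.0) {
			fmt.Println("skip", b.Path) // never read from disk
			continue
		}
		fmt.Println("scan", b.Path) // only candidate blocks are opened
	}
}
```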
```go
package main

import (
	"backtraceDB/internal/db"
	"backtraceDB/internal/schema"
)

func main() {
	storage, _ := db.Open("trading_db")
	defer storage.Close()

	s := schema.Schema{
		Name:       "orders",
		TimeColumn: "ts",
		Columns: []schema.Column{
			{Name: "ts", Type: schema.Int64},
			{Name: "symbol", Type: schema.String},
			{Name: "price", Type: schema.Float64},
			{Name: "qty", Type: schema.Int64},
		},
	}

	tbl, _ := storage.CreateTable(s)
	tbl.MaxBlockSize = 100_000  // block size threshold; once reached, the block is frozen and flushed
	tbl.UseDiskStorage = true   // persist frozen blocks as Parquet files
}
```

If the table already exists on disk, CreateTable will recover it instead of creating a new one.
Filters are chained. The engine applies predicate pushdown so only relevant blocks are scanned.
```go
reader := tbl.Reader().
	Filter("symbol", "==", "BTC").
	Filter("qty", ">", int64(100)).
	Filter("price", ">", 45000.0)

for {
	row, ok := reader.Next()
	if !ok {
		break
	}
	// process row
}
```

On startup:
- All Parquet files are discovered
- Their metadata is loaded into memory
- The WAL is replayed to restore rows that were still in RAM
If the process crashes mid-ingestion, the worst case is re-reading some WAL entries.
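A sketch of what makes that safe (again using the illustrative 16-byte record layout from the append sketch above, not BacktraceDB's real WAL encoding): replay reads complete records until the file ends, and a truncated final record left by a crash is simply discarded, since that row was never acknowledged as visible.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"math"
	"os"
)

func replay(path string) (recovered int, err error) {
	f, err := os.Open(path)
	if err != nil {
		if os.IsNotExist(err) {
			return 0, nil // nothing to replay
		}
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, 16)
	for {
		_, err := io.ReadFull(f, buf)
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			// EOF: clean end of the log. ErrUnexpectedEOF: a torn record
			// from a crash mid-write; drop it, the row was never visible.
			return recovered, nil
		}
		if err != nil {
			return recovered, err
		}
		ts := int64(binary.LittleEndian.Uint64(buf[0:8]))
		price := math.Float64frombits(binary.LittleEndian.Uint64(buf[8:16]))
		_ = ts
		_ = price
		// Re-apply the decoded row to the in-memory block here.
		recovered++
	}
}

func main() {
	n, _ := replay("orders.wal")
	fmt.Println("recovered rows:", n)
}
```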
| Package | Purpose |
|---|---|
| internal/db | Database and table lifecycle management |
| internal/table | Core engine logic, block rotation, readers |
| internal/wal | Binary WAL encoding and replay |
| internal/schema | Type system, validation, column indexing |
```sh
# Correctness tests
./test.sh

# Basic performance benchmarks
./benchmark.sh
```

There's also a stress_test.go that pushes ingestion past available RAM to exercise block rotation.
This project exists so I can answer questions like:
- “What actually happens when a database crashes?”
- “Why does columnar storage help analytics?”
- “How much work can metadata save during a scan?”
If you’re curious about those things too, feel free to poke around.
If you want to experiment:
- Fork it
- Break it
- Fix it
- Open a PR
Ideas like basic aggregations, better indexing, or smarter pruning are all fair game.