Note: This is a toy project.
BacktraceDB is something I built to understand database internals better. It’s not production-ready, hasn’t been security-audited, and shouldn’t be used for anything serious. Think of it as a lab notebook you can run.
BacktraceDB is a small column-oriented storage engine written to explore how real databases work under the hood.
I wanted something concrete to experiment with:
- how write-ahead logs actually make in-memory state durable,
- how columnar layouts change query performance,
- how metadata (like min/max stats) lets engines skip huge chunks of data,
- and what happens when your dataset is bigger than RAM.
This project is intentionally simple, sometimes naive, and very opinionated. That’s the point.
Recent (“hot”) data lives in memory for fast writes and reads.
Once a block reaches a size threshold, it’s frozen and flushed to disk as an Apache Parquet file.
After that, the block is immutable.
Every row append is written to a WAL before it’s visible to the table.
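To make the ordering concrete, here is a minimal sketch of that append path. It is not BacktraceDB's actual code; the record layout, the `row`/`memBlock` types, and the flush step are all illustrative assumptions. The point is only that the WAL write and fsync happen before the row becomes visible in the in-memory block:

```go
package main

import (
	"encoding/binary"
	"math"
	"os"
)

// row and memBlock are illustrative stand-ins, not BacktraceDB's types.
type row struct {
	TS    int64
	Price float64
}

type memBlock struct {
	rows []row
	max  int // rotation threshold (think MaxBlockSize)
}

func appendRow(wal *os.File, blk *memBlock, r row) error {
	// 1. Encode the row as a fixed-size binary WAL record (illustrative layout).
	buf := make([]byte, 16)
	binary.LittleEndian.PutUint64(buf[0:8], uint64(r.TS))
	binary.LittleEndian.PutUint64(buf[8:16], math.Float64bits(r.Price))

	// 2. Durability first: the record is written and fsynced before anything else.
	if _, err := wal.Write(buf); err != nil {
		return err
	}
	if err := wal.Sync(); err != nil {
		return err
	}

	// 3. Only now does the row become visible to readers.
	blk.rows = append(blk.rows, r)

	// 4. Rotation: once the block is full it is frozen and flushed to Parquet.
	if len(blk.rows) >= blk.max {
		// flushToParquet(blk) would run here in the real engine (omitted in this sketch).
		blk.rows = blk.rows[:0]
	}
	return nil
}

func main() {
	wal, _ := os.OpenFile("orders.wal", os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
	defer wal.Close()
	blk := &memBlock{max: 100_000}
	_ = appendRow(wal, blk, row{TS: 1700000000, Price: 45012.5})
}
```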
On startup, the database:
- Scans existing Parquet blocks
- Loads their metadata (min/max stats)
- Replays the WAL to rebuild any in-memory state that wasn’t flushed yet
The filesystem is treated as the source of truth.
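A rough sketch of that recovery order (paths and helper names are assumptions, not the project's real layout): flushed Parquet blocks are discovered first, then the WAL is replayed on top of them.

```go
package main

import (
	"fmt"
	"path/filepath"
)

func main() {
	// 1. Discover immutable Parquet blocks: the source of truth on disk.
	//    The directory layout here is a guess for illustration.
	blocks, _ := filepath.Glob("trading_db/orders/*.parquet")
	for _, b := range blocks {
		// 2. Load per-column min/max stats from each block's footer here
		//    (a Parquet reader library would do this; omitted in the sketch).
		fmt.Println("registered block:", b)
	}

	// 3. Replay the WAL last, so rows that never made it into a Parquet
	//    file are restored into the in-memory block.
	//    See the replay sketch further down.
}
```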
Data is stored column-wise, not row-wise. This makes:
- scans cheaper,
- aggregates faster,
- and compression (dictionary encoding for strings) straightforward.
Each Parquet block stores min/max statistics per column.
During a query, these stats are checked first so entire blocks can be skipped without being read from disk.
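As an illustration of the idea (a sketch with made-up types, not the engine's code): a predicate like `price > 45000` only needs a block's recorded max for that column to decide whether the block can be skipped entirely.

```go
package main

import "fmt"

// blockStats is an illustrative stand-in for the per-block metadata.
type blockStats struct {
	Path string
	Min  map[string]float64
	Max  map[string]float64
}

// canSkip reports whether a "col > threshold" predicate can never match
// any row in the block, based only on its metadata.
func canSkip(b blockStats, col string, threshold float64) bool {
	return b.Max[col] <= threshold
}

func main() {
	blocks := []blockStats{
		{Path: "block_001.parquet", Min: map[string]float64{"price": 100}, Max: map[string]float64{"price": 900}},
		{Path: "block_002.parquet", Min: map[string]float64{"price": 44000}, Max: map[string]float64{"price": 47000}},
	}
	for _, b := range blocks {
		if canSkip(b, "price", 45000.0) {
			fmt.Println("skip", b.Path) // never read from disk
			continue
		}
		fmt.Println("scan", b.Path) // only candidate blocks are opened
	}
}
```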
```go
package main

import (
	"backtraceDB/internal/db"
	"backtraceDB/internal/schema"
)

func main() {
	storage, _ := db.Open("trading_db")
	defer storage.Close()

	s := schema.Schema{
		Name:       "orders",
		TimeColumn: "ts",
		Columns: []schema.Column{
			{Name: "ts", Type: schema.Int64},
			{Name: "symbol", Type: schema.String},
			{Name: "price", Type: schema.Float64},
			{Name: "qty", Type: schema.Int64},
		},
	}

	tbl, _ := storage.CreateTable(s)
	tbl.MaxBlockSize = 100_000  // block size threshold; once reached, the block is frozen and flushed
	tbl.UseDiskStorage = true   // persist frozen blocks as Parquet files
}
```

If the table already exists on disk, CreateTable will recover it instead of creating a new one.
Filters are chained. The engine applies predicate pushdown so only relevant blocks are scanned.
```go
reader := tbl.Reader().
	Filter("symbol", "==", "BTC").
	Filter("qty", ">", int64(100)).
	Filter("price", ">", 45000.0)

for {
	row, ok := reader.Next()
	if !ok {
		break
	}
	// process row
}
```

On startup:
- All Parquet files are discovered
- Their metadata is loaded into memory
- The WAL is replayed to restore rows that were still in RAM
If the process crashes mid-ingestion, the worst case is re-reading some WAL entries.
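A sketch of what makes that safe (again using the illustrative 16-byte record layout from the append sketch above, not BacktraceDB's real WAL encoding): replay reads complete records until the file ends, and a truncated final record left by a crash is simply discarded, since that row was never acknowledged as visible.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"math"
	"os"
)

func replay(path string) (recovered int, err error) {
	f, err := os.Open(path)
	if err != nil {
		if os.IsNotExist(err) {
			return 0, nil // nothing to replay
		}
		return 0, err
	}
	defer f.Close()

	buf := make([]byte, 16)
	for {
		_, err := io.ReadFull(f, buf)
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			// EOF: clean end of the log. ErrUnexpectedEOF: a torn record
			// from a crash mid-write; drop it, the row was never visible.
			return recovered, nil
		}
		if err != nil {
			return recovered, err
		}
		ts := int64(binary.LittleEndian.Uint64(buf[0:8]))
		price := math.Float64frombits(binary.LittleEndian.Uint64(buf[8:16]))
		_ = ts
		_ = price
		// Re-apply the decoded row to the in-memory block here.
		recovered++
	}
}

func main() {
	n, _ := replay("orders.wal")
	fmt.Println("recovered rows:", n)
}
```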
| Package | Purpose |
|---|---|
| internal/db | Database and table lifecycle management |
| internal/table | Core engine logic, block rotation, readers |
| internal/wal | Binary WAL encoding and replay |
| internal/schema | Type system, validation, column indexing |
```sh
# Correctness tests
./test.sh

# Basic performance benchmarks
./benchmark.sh
```

There's also a stress_test.go that pushes ingestion past available RAM to exercise block rotation.
This project exists so I can answer questions like:
- “What actually happens when a database crashes?”
- “Why does columnar storage help analytics?”
- “How much work can metadata save during a scan?”
If you’re curious about those things too, feel free to poke around.
If you want to experiment:
- Fork it
- Break it
- Fix it
- Open a PR
Ideas like basic aggregations, better indexing, or smarter pruning are all fair game.