This project implements a simplified database engine with support for the Iceberg table format. It includes a catalog, manifest files, and file-based storage.
To interact with the database engine directly, start the interactive CLI:
cargo run

Available Commands:
- populate: Creates sample "Students" and "Courses" tables with data.
- scan <table_id>: Prints all rows in a table (e.g., scan 0).
- create_table <name> <type1> ...: Creates a new table.
- insert <table_id> <val1> ...: Inserts a row.
- help: Lists all commands.

To run the unit tests:
cargo test

Project Structure:
- src/engine: Core database engine logic, including operators and optimizer.
- src/iceberg: Implementation of Iceberg metadata (Catalog, Manifest, TableMetadata).
- src/storage: File-based storage handling.
- src/value_cmp.rs: Value comparison logic.

Prerequisites:
- Rust (latest stable version)
- Cargo

The database uses a custom binary format for storing table chunks. Files are stored with a .bin extension and follow a columnar layout.
The header contains metadata about the file's content. Here's how the data is serialized:
- Magic Bytes (8 bytes)
- Row Count (8 bytes): Number of rows in the chunk (each column stores this many values).
- Column Count (8 bytes): Number of columns in the file.
- Column Info (per column):
  - Type ID (8 bytes)
  - Start Index (8 bytes): Byte offset where the column data begins in the file.
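
The header layout above can be read back with a few fixed-width reads. The following is a minimal sketch, not the project's actual reader: the names `ChunkHeader`, `ColumnInfo`, and `read_header` are hypothetical, every 8-byte field is assumed to be an unsigned 64-bit integer, and little-endian byte order is assumed because the format description does not state one. The magic bytes are returned as-is without validation, since their value is not documented here.

```rust
use std::io::{self, Read};

/// One per-column entry of the header (hypothetical name).
#[derive(Debug, PartialEq)]
pub struct ColumnInfo {
    pub type_id: u64,
    /// Byte offset where this column's data begins in the file.
    pub start_index: u64,
}

/// Parsed file header (hypothetical name).
#[derive(Debug, PartialEq)]
pub struct ChunkHeader {
    pub magic: [u8; 8],
    pub row_count: u64,
    pub column_count: u64,
    pub columns: Vec<ColumnInfo>,
}

/// Reads one 8-byte integer.
/// Assumption: little-endian; the format description does not specify this.
fn read_u64<R: Read>(reader: &mut R) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(u64::from_le_bytes(buf))
}

/// Reads the header fields in the documented order:
/// magic bytes, row count, column count, then one
/// (type ID, start index) pair per column.
pub fn read_header<R: Read>(reader: &mut R) -> io::Result<ChunkHeader> {
    let mut magic = [0u8; 8];
    reader.read_exact(&mut magic)?;

    let row_count = read_u64(reader)?;
    let column_count = read_u64(reader)?;

    // Grow the vector as entries are read instead of pre-allocating,
    // so a corrupt column count cannot trigger a huge allocation.
    let mut columns = Vec::new();
    for _ in 0..column_count {
        let type_id = read_u64(reader)?;
        let start_index = read_u64(reader)?;
        columns.push(ColumnInfo { type_id, start_index });
    }

    Ok(ChunkHeader { magic, row_count, column_count, columns })
}
```

Because `read_header` accepts any `Read` implementation, it can be pointed at a `std::fs::File` opened on a .bin file or at an in-memory `std::io::Cursor` when testing. A truncated file surfaces as an `UnexpectedEof` error from `read_exact`.
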

Features:
- Data is stored column by column for efficient aggregation (Columnar Storage).
- Supports metadata through manifests and catalogs (similar to Apache Iceberg).
- Data is stored in chunks with column-based statistics (min/max) for fast pruning.
- Basic support for optimistic concurrency updates.
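
To illustrate how per-chunk min/max statistics enable pruning, here is a minimal sketch. It assumes a single integer column and an inclusive range filter; `ColumnStats`, `RangePredicate`, `may_contain_match`, and `prune_chunks` are hypothetical names for illustration and are not taken from this codebase.

```rust
/// Min/max statistics for one column of one chunk (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ColumnStats {
    pub min: i64,
    pub max: i64,
}

/// An inclusive range filter `lo <= value <= hi` (hypothetical type).
#[derive(Debug, Clone, Copy)]
pub struct RangePredicate {
    pub lo: i64,
    pub hi: i64,
}

/// Returns true if the chunk could contain a matching row.
/// The chunk's [min, max] interval must overlap the predicate's [lo, hi];
/// if it does not, no row in the chunk can match and the chunk is skipped.
pub fn may_contain_match(stats: &ColumnStats, pred: &RangePredicate) -> bool {
    stats.max >= pred.lo && stats.min <= pred.hi
}

/// Returns the indices of the chunks that still have to be scanned.
pub fn prune_chunks(chunks: &[ColumnStats], pred: &RangePredicate) -> Vec<usize> {
    chunks
        .iter()
        .enumerate()
        .filter(|(_, stats)| may_contain_match(stats, pred))
        .map(|(index, _)| index)
        .collect()
}
```

The check is conservative: a chunk that passes may still contain no matching rows, so the scan has to evaluate the filter on every row of the surviving chunks. A chunk that fails the check is guaranteed to contain none and can be skipped without reading its data.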