A single-process columnar query engine that reads TPC-H benchmark tables from CSV files, stores each column as a contiguous std::vector, and executes queries written in a line-oriented DSL. The DSL maps directly to a fixed operator pipeline:
SCAN → JOIN → FILTER → COMPUTE → GROUPBY → SELECT → ORDERBY → LIMIT
The engine supports multi-table joins, computed columns with arithmetic expressions, group-by aggregates, and ranked (sorted and limited) output.
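To make the COMPUTE and GROUPBY stages concrete, here is a minimal sketch of how a computed column and a grouped sum can be evaluated over contiguous vectors. The function names `computeRevenue` and `sumByKey` are hypothetical and are not taken from the project; the real operators work on the engine's own Table and Column types.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: evaluates revenue = l_extendedprice * (1 - l_discount)
// one column at a time, writing the result into a new contiguous vector.
std::vector<double> computeRevenue(const std::vector<double>& extendedPrice,
                                   const std::vector<double>& discount) {
    assert(extendedPrice.size() == discount.size());
    std::vector<double> revenue(extendedPrice.size());
    for (std::size_t i = 0; i < revenue.size(); ++i) {
        revenue[i] = extendedPrice[i] * (1.0 - discount[i]);
    }
    return revenue;
}

// Hypothetical helper: groups rows by a string key and sums a numeric column.
// std::map keeps the groups ordered by key.
std::map<std::string, double> sumByKey(const std::vector<std::string>& keys,
                                       const std::vector<double>& values) {
    assert(keys.size() == values.size());
    std::map<std::string, double> sums;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        sums[keys[i]] += values[i];  // operator[] value-initializes new groups to 0.0
    }
    return sums;
}
```

Both loops read each input vector sequentially, which is the access pattern columnar storage is meant to favor.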
- All 22 TPC-H benchmark queries implemented
- Columnar storage for cache-efficient data access
- Custom DSL for query specification
- Integrated test harness with expected output validation
- CSV and console output formats
make
This produces the executable tpch.
./tpch # Show usage and query list
./tpch unit-test # Run unit tests only
./tpch queries-test # Run all TPC-H query tests (compare against expected output)
./tpch all-test # Run full test suite (unit + queries)
./tpch <1-22> # Run a specific TPC-H query test
./tpch run <1-22> # Execute a query and output results to console
./tpch run-all # Execute all queries with console output
- TPC-H data files must be in the data/ directory with corresponding .schema files
- DSL query files must be in the queries/dsl/ directory
tpch-query-engine/
├── core/ # Query engine implementation
│ ├── engine/ # Pipeline, dispatcher, test runner
│ ├── model/ # Table, Column, Catalog, QuerySpec
│ ├── operators/ # Scan, Join, Filter, Project, etc.
│ └── types/ # TypeTag, Date
├── io/ # I/O operations
│ ├── readers/ # CSV, DSL, Schema readers
│ ├── writers/ # CSV, Console, Text writers
│ └── converters/ # Type parsing, row/column conversions
├── data/ # TPC-H benchmark data files
├── queries/
│ ├── dsl/ # Query definition files
│ └── test_data/ # Expected query results
└── Makefile
Table (core/model/Table.h)
- Encapsulates a named collection of typed columns
- Owns all column pointers and manages their lifecycle
Column<T> (core/model/Column.hpp)
- Template class for typed column storage
- Implements the IColumn interface for polymorphic access
- Types: INT64, DOUBLE, STRING, DATE
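The relationship between IColumn and Column<T> might look roughly like the sketch below. Only the names IColumn, Column<T>, and the four type names come from this README; the members `type()`, `size()`, `values()` and the helper `sumDoubleColumn` are assumptions made for illustration, and the real headers may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical tag; the real TypeTag lives in core/types/ and may differ.
enum class TypeTag { INT64, DOUBLE, STRING, DATE };

// Type-erased interface so a table can hold columns of mixed types.
class IColumn {
public:
    virtual ~IColumn() = default;
    virtual TypeTag type() const = 0;
    virtual std::size_t size() const = 0;
};

// Typed column: all values sit in one contiguous std::vector<T>.
template <typename T>
class Column : public IColumn {
public:
    Column(TypeTag tag, std::vector<T> values)
        : tag_(tag), values_(std::move(values)) {}

    TypeTag type() const override { return tag_; }
    std::size_t size() const override { return values_.size(); }
    const std::vector<T>& values() const { return values_; }

private:
    TypeTag tag_;
    std::vector<T> values_;
};

// Sums a DOUBLE column: check the tag first, then downcast to the typed column.
double sumDoubleColumn(const IColumn& col) {
    assert(col.type() == TypeTag::DOUBLE);
    const auto& typed = static_cast<const Column<double>&>(col);
    double total = 0.0;
    for (double v : typed.values()) total += v;
    return total;
}
```

Checking the tag once per column and then looping over the typed vector keeps virtual dispatch out of the per-row path.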
Operators (core/operators/)
- IOperator interface with execute() and describe() methods
- AUnaryOp base class: Filter, Project, Compute, GroupBy, OrderBy, Limit
- AMultiOp base class: Join
- Strategy pattern for Scan (FileScanStrategy, TableScanStrategy)
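As a rough illustration of how these classes compose, the sketch below chains a scan into a limit. The names IOperator, AUnaryOp, Limit, execute() and describe() come from the list above; the method signatures, the single-column `Table` stand-in, and the simplified `Scan` class are assumptions, not the project's actual definitions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real Table: one INT64 column keeps the sketch short.
struct Table {
    std::vector<std::int64_t> values;
};

class IOperator {
public:
    virtual ~IOperator() = default;
    virtual Table execute() = 0;               // assumed signature
    virtual std::string describe() const = 0;  // assumed signature
};

// Simplified leaf operator that returns an in-memory table.
class Scan : public IOperator {
public:
    explicit Scan(Table t) : table_(std::move(t)) {}
    Table execute() override { return table_; }
    std::string describe() const override { return "SCAN"; }

private:
    Table table_;
};

// Base class for operators with exactly one input.
class AUnaryOp : public IOperator {
public:
    explicit AUnaryOp(std::unique_ptr<IOperator> child) : child_(std::move(child)) {
        assert(child_ != nullptr);
    }

protected:
    std::unique_ptr<IOperator> child_;
};

// Keeps at most n rows of the child's output.
class Limit : public AUnaryOp {
public:
    Limit(std::unique_ptr<IOperator> child, std::size_t n)
        : AUnaryOp(std::move(child)), n_(n) {}

    Table execute() override {
        Table in = child_->execute();
        in.values.resize(std::min(n_, in.values.size()));
        return in;
    }

    std::string describe() const override {
        return child_->describe() + " -> LIMIT " + std::to_string(n_);
    }

private:
    std::size_t n_;
};
```

Because each operator owns its child, executing the outermost operator pulls results through the whole chain, and describe() yields a readable plan string.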
Queries are specified in .dsl files with one clause per line:
FROM lineitem
FROM orders
JOIN lineitem.l_orderkey = orders.o_orderkey
FILTER l_shipdate >= 1994-01-01
FILTER l_shipdate < 1995-01-01
COMPUTE revenue = l_extendedprice * (1 - l_discount)
GROUPBY l_returnflag, l_linestatus
SELECT l_returnflag, l_linestatus, SUM(revenue) AS sum_revenue
ORDERBY l_returnflag ASC, l_linestatus ASC
LIMIT 100
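Because each line holds one clause, a first parsing pass can split every line into a keyword and the remaining text. The sketch below shows one way to do that; `Clause`, `trim`, and `parseDsl` are hypothetical names, and the project's actual DSL reader (under io/readers/) may work differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One parsed line of a .dsl file.
struct Clause {
    std::string keyword;  // e.g. "FROM", "FILTER"
    std::string body;     // everything after the keyword, trimmed
};

// Strips leading and trailing spaces, tabs, and carriage returns.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::string::size_type first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::string::size_type last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Splits DSL text into clauses, skipping blank lines.
std::vector<Clause> parseDsl(const std::string& text) {
    std::vector<Clause> clauses;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty()) continue;
        const std::string::size_type split = line.find_first_of(" \t");
        if (split == std::string::npos) {
            clauses.push_back({line, ""});  // keyword with no body
        } else {
            clauses.push_back({line.substr(0, split), trim(line.substr(split + 1))});
        }
    }
    return clauses;
}
```

A later pass would interpret each body according to its keyword, for example splitting a FILTER body into column, comparison, and literal.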
- New operator: Create a class inheriting from AUnaryOp or AMultiOp, implement execute() and describe()
- New filter predicate: Add to PredicateType enum and implement in Filter::evaluatePredicate()
- New aggregate function: Add to AggregateType enum and implement in GroupBy::execute()
- New file reader: Create a class inheriting from AFileReader, implement read()
- New file writer: Create a class inheriting from AFileWriter, implement write()
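As an example of the predicate extension point, the sketch below adds a not-equal comparison. The enum name PredicateType and the function name evaluatePredicate come from the list above; the enumerator names, the free-function form, the INT64-only signature, and the `selectRows` helper are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical enumerators; NotEqual is the newly added predicate.
enum class PredicateType { Equal, Less, GreaterEqual, NotEqual };

// Evaluates one comparison. Adding a predicate means adding one enumerator
// above and one case here.
bool evaluatePredicate(PredicateType type, std::int64_t lhs, std::int64_t rhs) {
    switch (type) {
        case PredicateType::Equal:        return lhs == rhs;
        case PredicateType::Less:         return lhs < rhs;
        case PredicateType::GreaterEqual: return lhs >= rhs;
        case PredicateType::NotEqual:     return lhs != rhs;
    }
    throw std::logic_error("unhandled PredicateType");
}

// Returns the indices of the rows whose value satisfies the predicate.
std::vector<std::size_t> selectRows(const std::vector<std::int64_t>& column,
                                    PredicateType type, std::int64_t literal) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < column.size(); ++i) {
        if (evaluatePredicate(type, column[i], literal)) selected.push_back(i);
    }
    return selected;
}
```

Leaving the switch without a default case lets most compilers warn when a new enumerator is added but not handled.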
- LEFT/RIGHT/OUTER join support
  - Add JoinType enum to JoinCondition class
  - Modify Join::execute() to emit NULL values for non-matching rows
- Window functions for ranking queries
  - Create WindowSpec class with partition/order clauses
  - Implement Window operator inheriting from AUnaryOp
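One possible shape for the planned LEFT join is sketched below as a hash join over key columns, with std::nullopt standing in for NULL. This is not the project's implementation: `JoinedRow`, `hashJoin`, and the enumerators of `JoinType` are hypothetical, and the sketch assumes C++17 for std::optional.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

enum class JoinType { Inner, Left };

// One output row: an index into the left input and, if matched, the right input.
struct JoinedRow {
    std::size_t left;
    std::optional<std::size_t> right;  // std::nullopt plays the role of NULL
};

std::vector<JoinedRow> hashJoin(const std::vector<std::int64_t>& leftKeys,
                                const std::vector<std::int64_t>& rightKeys,
                                JoinType type) {
    // Build phase: index the right side by key.
    std::unordered_multimap<std::int64_t, std::size_t> index;
    for (std::size_t r = 0; r < rightKeys.size(); ++r) {
        index.emplace(rightKeys[r], r);
    }

    // Probe phase: look up each left key in order.
    std::vector<JoinedRow> out;
    for (std::size_t l = 0; l < leftKeys.size(); ++l) {
        const auto range = index.equal_range(leftKeys[l]);
        if (range.first == range.second) {
            // No match: only a LEFT join keeps the row, padded with NULL.
            if (type == JoinType::Left) out.push_back({l, std::nullopt});
            continue;
        }
        for (auto it = range.first; it != range.second; ++it) {
            out.push_back({l, it->second});
        }
    }
    return out;
}
```

A RIGHT join can reuse the same routine with the inputs swapped, while a full OUTER join would additionally need to track which right-side rows were never matched.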
This project is licensed under the MIT License - see the LICENSE file for details.
Lorin D. Dawson