R7L208/tpch-query-engine

MCQE-DSL: A Mini Columnar Query Engine with a DSL

A single-process columnar query engine that reads TPC-H benchmark tables from CSV files, stores each column as a contiguous std::vector, and executes queries written in a line-oriented DSL. The DSL maps directly to a fixed operator pipeline:

SCAN → JOIN → FILTER → COMPUTE → GROUPBY → SELECT → ORDERBY → LIMIT

The engine supports multi-table joins, computed columns defined by arithmetic expressions, group-by aggregates, and ranked output (ordered, then truncated with LIMIT).
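To make the columnar layout concrete, here is a minimal sketch, assuming a two-column slice of lineitem. The struct and function names (LineitemColumns, selectQuantityBelow, sumPrice) are hypothetical and do not come from the repository. The filter touches only one column and produces a list of row indices, which the aggregate then uses to read a second column.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical two-column slice of lineitem, stored column-wise:
// one contiguous std::vector per column, all of equal length.
struct LineitemColumns {
    std::vector<std::int64_t> l_quantity;
    std::vector<double> l_extendedprice;
};

// Scans only l_quantity and returns the indices of rows below `limit`.
std::vector<std::size_t> selectQuantityBelow(const LineitemColumns& t,
                                             std::int64_t limit) {
    std::vector<std::size_t> rows;
    for (std::size_t i = 0; i < t.l_quantity.size(); ++i) {
        if (t.l_quantity[i] < limit) rows.push_back(i);
    }
    return rows;
}

// Sums l_extendedprice over the selected rows only.
double sumPrice(const LineitemColumns& t,
                const std::vector<std::size_t>& rows) {
    double total = 0.0;
    for (std::size_t r : rows) total += t.l_extendedprice[r];
    return total;
}
```

Because the filter never reads l_extendedprice, a scan over l_quantity walks one dense array, which is the cache behaviour the Features list refers to.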

Features

  • All 22 TPC-H benchmark queries implemented
  • Columnar storage for cache-efficient data access
  • Custom DSL for query specification
  • Integrated test harness with expected output validation
  • CSV and console output formats

Building

make

This produces the executable tpch.

Usage

./tpch                    # Show usage and query list
./tpch unit-test          # Run unit tests only
./tpch queries-test       # Run all TPC-H query tests (compare against expected output)
./tpch all-test           # Run full test suite (unit + queries)
./tpch <1-22>             # Run a specific TPC-H query test
./tpch run <1-22>         # Execute a query and output results to console
./tpch run-all            # Execute all queries with console output

Requirements

  • TPC-H data files must be in the data/ directory, each with a corresponding .schema file
  • DSL query files must be in the queries/dsl/ directory

Project Structure

tpch-query-engine/
├── core/                    # Query engine implementation
│   ├── engine/             # Pipeline, dispatcher, test runner
│   ├── model/              # Table, Column, Catalog, QuerySpec
│   ├── operators/          # Scan, Join, Filter, Project, etc.
│   └── types/              # TypeTag, Date
├── io/                      # I/O operations
│   ├── readers/            # CSV, DSL, Schema readers
│   ├── writers/            # CSV, Console, Text writers
│   └── converters/         # Type parsing, row/column conversions
├── data/                    # TPC-H benchmark data files
├── queries/
│   ├── dsl/               # Query definition files
│   └── test_data/         # Expected query results
└── Makefile

Architecture

Core Classes

Table (core/model/Table.h)

  • Encapsulates a named collection of typed columns
  • Owns all column pointers and manages their lifecycle

Column<T> (core/model/Column.hpp)

  • Template class for typed column storage
  • Implements IColumn interface for polymorphic access
  • Types: INT64, DOUBLE, STRING, DATE
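The relationship between Table, IColumn and Column<T> can be sketched as follows. This is an illustration under assumptions, not the repository's code: the method names (type, size, append, at, add, find), the TagOf helper, and the use of std::unique_ptr for ownership are all hypothetical. DATE is left out of the mapping because it depends on the project's own Date type.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

enum class TypeTag { INT64, DOUBLE, STRING, DATE };

// Type-erased view of a column so a Table can hold mixed types.
class IColumn {
public:
    virtual ~IColumn() = default;
    virtual TypeTag type() const = 0;
    virtual std::size_t size() const = 0;
};

// Hypothetical compile-time mapping from C++ type to TypeTag.
template <typename T> struct TagOf;
template <> struct TagOf<std::int64_t> { static constexpr TypeTag value = TypeTag::INT64; };
template <> struct TagOf<double> { static constexpr TypeTag value = TypeTag::DOUBLE; };
template <> struct TagOf<std::string> { static constexpr TypeTag value = TypeTag::STRING; };

// Typed storage: all values of one column in a contiguous vector.
template <typename T>
class Column : public IColumn {
public:
    TypeTag type() const override { return TagOf<T>::value; }
    std::size_t size() const override { return values_.size(); }
    void append(T v) { values_.push_back(std::move(v)); }
    const T& at(std::size_t i) const { return values_[i]; }
private:
    std::vector<T> values_;
};

// The table owns its columns; they are destroyed with the table.
class Table {
public:
    void add(const std::string& name, std::unique_ptr<IColumn> col) {
        columns_[name] = std::move(col);
    }
    // Returns nullptr when no column has that name.
    const IColumn* find(const std::string& name) const {
        auto it = columns_.find(name);
        return it == columns_.end() ? nullptr : it->second.get();
    }
private:
    std::unordered_map<std::string, std::unique_ptr<IColumn>> columns_;
};
```

A caller that knows the tag can downcast the IColumn pointer to the matching Column<T> to reach the typed values.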

Operators (core/operators/)

  • IOperator interface with execute() and describe() methods
  • AUnaryOp base class: Filter, Project, Compute, GroupBy, OrderBy, Limit
  • AMultiOp base class: Join
  • Strategy pattern for Scan (FileScanStrategy, TableScanStrategy)
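As a rough picture of how these pieces might fit together, the sketch below chains a scan-like leaf and a Limit operator. Only the names IOperator, AUnaryOp, Limit, execute() and describe() come from this README. The signatures, the single-column Table stand-in, and the TableScan class are assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the real Table: a single INT64 column.
struct Table { std::vector<std::int64_t> values; };

class IOperator {
public:
    virtual ~IOperator() = default;
    virtual Table execute() = 0;                 // assumed signature
    virtual std::string describe() const = 0;    // assumed signature
};

// Hypothetical leaf that returns an in-memory table (plays the scan role).
class TableScan : public IOperator {
public:
    explicit TableScan(Table t) : table_(std::move(t)) {}
    Table execute() override { return table_; }
    std::string describe() const override { return "SCAN"; }
private:
    Table table_;
};

// Base for operators with exactly one input.
class AUnaryOp : public IOperator {
public:
    explicit AUnaryOp(std::unique_ptr<IOperator> child) : child_(std::move(child)) {}
protected:
    std::unique_ptr<IOperator> child_;
};

// Keeps at most n rows of its input.
class Limit : public AUnaryOp {
public:
    Limit(std::unique_ptr<IOperator> child, std::size_t n)
        : AUnaryOp(std::move(child)), n_(n) {}
    Table execute() override {
        Table in = child_->execute();
        in.values.resize(std::min(n_, in.values.size()));
        return in;
    }
    std::string describe() const override {
        return child_->describe() + " -> LIMIT " + std::to_string(n_);
    }
private:
    std::size_t n_;
};
```

Each operator pulls its input by calling execute() on its child, so a whole pipeline runs by calling execute() on the last operator.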

DSL Format

Queries are specified in .dsl files with one clause per line:

FROM lineitem
FROM orders
JOIN lineitem.l_orderkey = orders.o_orderkey
FILTER l_shipdate >= 1994-01-01
FILTER l_shipdate < 1995-01-01
COMPUTE revenue = l_extendedprice * (1 - l_discount)
GROUPBY l_returnflag, l_linestatus
SELECT l_returnflag, l_linestatus, SUM(revenue) AS sum_revenue
ORDERBY l_returnflag ASC, l_linestatus ASC
LIMIT 100
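Since every clause sits on its own line and starts with a keyword, a first parsing pass can be very small. The sketch below is an assumption about how such a pass could look, not the repository's DSL reader. Clause and parseClauses are hypothetical names. It splits each non-blank line into the leading keyword and the remaining text, leaving the body uninterpreted.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One DSL line: the leading keyword and everything after it.
struct Clause {
    std::string keyword;
    std::string body;
};

// Splits DSL text into clauses, one per non-blank line.
std::vector<Clause> parseClauses(const std::string& text) {
    std::vector<Clause> clauses;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // Trim surrounding whitespace (including a trailing '\r').
        const auto start = line.find_first_not_of(" \t\r");
        if (start == std::string::npos) continue;  // skip blank lines
        const auto end = line.find_last_not_of(" \t\r");
        line = line.substr(start, end - start + 1);

        const auto space = line.find_first_of(" \t");
        if (space == std::string::npos) {
            clauses.push_back({line, ""});          // keyword with no body
            continue;
        }
        // The line is trimmed, so a non-space character follows the gap.
        const auto bodyStart = line.find_first_not_of(" \t", space);
        clauses.push_back({line.substr(0, space), line.substr(bodyStart)});
    }
    return clauses;
}
```

For the query above, the third clause would come back with keyword "JOIN" and body "lineitem.l_orderkey = orders.o_orderkey". A second pass would then parse each body according to its keyword.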

Extending the Code

  • New operator: Create a class inheriting from AUnaryOp or AMultiOp, implement execute() and describe()
  • New filter predicate: Add to PredicateType enum and implement in Filter::evaluatePredicate()
  • New aggregate function: Add to AggregateType enum and implement in GroupBy::execute()
  • New file reader: Create a class inheriting from AFileReader, implement read()
  • New file writer: Create a class inheriting from AFileWriter, implement write()
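As an illustration of the aggregate extension point, the sketch below adds a MAX member to an AggregateType enum and handles it in a per-group accumulation step. Only the enum name comes from this README. The members shown, the accumulate and groupAggregate functions, and the string-keyed std::map are assumptions, and the real GroupBy::execute() may be organised quite differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// MAX plays the role of the newly added member (hypothetical member set).
enum class AggregateType { SUM, COUNT, MAX };

// Folds one value into a group's running state.
// `first` is true for the first row seen in the group.
void accumulate(AggregateType type, double value, bool first, double& state) {
    switch (type) {
        case AggregateType::SUM:   state += value; break;
        case AggregateType::COUNT: state += 1.0;   break;
        case AggregateType::MAX:
            // Seed with the first value; starting from 0.0 would be
            // wrong for groups that contain only negative numbers.
            state = first ? value : std::max(state, value);
            break;
    }
}

// Groups `values` by the parallel `keys` column and aggregates each group.
std::map<std::string, double> groupAggregate(const std::vector<std::string>& keys,
                                             const std::vector<double>& values,
                                             AggregateType type) {
    assert(keys.size() == values.size());
    std::map<std::string, double> result;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        auto inserted = result.emplace(keys[i], 0.0);  // second == true if new group
        accumulate(type, values[i], inserted.second, inserted.first->second);
    }
    return result;
}
```

The point of the sketch is that a new aggregate needs two changes: a new enum member, and a case that defines both how a group's state is seeded and how later values are folded in.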

Future Enhancements

  • LEFT/RIGHT/OUTER join support

    • Add JoinType enum to JoinCondition class
    • Modify Join::execute() to emit NULL values for non-matching rows
  • Window functions for ranking queries

    • Create WindowSpec class with partition/order clauses
    • Implement Window operator inheriting from AUnaryOp
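For the LEFT join item, one possible shape is sketched below, assuming a hash-based equi-join on a single INT64 key and C++17. The README does not state which join algorithm the engine uses, so the algorithm is an assumption. The hashJoin function and the JoinedRow alias are hypothetical, and std::optional stands in for whatever NULL representation the engine would adopt.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

enum class JoinType { INNER, LEFT };

// One output row: left row index, and right row index or nullopt for NULL.
using JoinedRow = std::pair<std::size_t, std::optional<std::size_t>>;

std::vector<JoinedRow> hashJoin(const std::vector<std::int64_t>& leftKeys,
                                const std::vector<std::int64_t>& rightKeys,
                                JoinType type) {
    // Build phase: index the right side by key (duplicates allowed).
    std::unordered_multimap<std::int64_t, std::size_t> index;
    for (std::size_t r = 0; r < rightKeys.size(); ++r) {
        index.emplace(rightKeys[r], r);
    }

    // Probe phase: look up each left row.
    std::vector<JoinedRow> out;
    for (std::size_t l = 0; l < leftKeys.size(); ++l) {
        auto range = index.equal_range(leftKeys[l]);
        if (range.first == range.second) {
            // No match: a LEFT join keeps the row with a NULL right side.
            if (type == JoinType::LEFT) out.emplace_back(l, std::nullopt);
            continue;
        }
        for (auto it = range.first; it != range.second; ++it) {
            out.emplace_back(l, it->second);
        }
    }
    return out;
}
```

Returning row indices rather than materialised rows keeps the sketch independent of column types. A later step would gather the actual values and write NULLs wherever the right index is empty.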

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Lorin D. Dawson

About

A C++ columnar query engine implementing all 22 TPC-H benchmark queries using a custom DSL
