26 commits
0abb344
feat: generate test substrait plans & basic parse
Dec 5, 2025
c25fdec
Documentation: attempt to complete ticket #37. first draft
Dec 22, 2025
d8317f3
Merge pull request #36 from Rich-T-kid/feature/Parse-SS
MarcoFerreiraPerson Dec 22, 2025
edeb870
feat: laid recusive ground work for parsing json plans into golang st…
Dec 24, 2025
62c36c8
feat: Implement & test Column Resolve
Dec 25, 2025
a549495
Implement Literal Resolve & Scalar Functions
Dec 25, 2025
1d9d8eb
feat:Implement Binary expression & Implement all alias parsing
Dec 26, 2025
e05fa7e
feat:Project operator can be constructed from purely IR format
Dec 27, 2025
4910e16
feat: implement Filter IR parse
Dec 27, 2025
5905cfa
feat: Distinct,Limit,Having
Dec 27, 2025
fffdd78
feat: Implement & unit test all operators from IR format
Dec 28, 2025
5c7585f
Documenation: Intergration test mostly complete. Working on reading f…
Dec 28, 2025
a2f17cc
feat: 95% there, just need to complete join fixeses and this will be …
Dec 29, 2025
28d0cae
feat: Finished parsing from IR format. Next steps are to accept plans…
Dec 30, 2025
fbf7564
include test files
Dec 30, 2025
a734938
feat: grpc server works end to end, can consume plan, emit batches, t…
Dec 30, 2025
70dc63e
feat: version 1.0.0 of backend
Dec 30, 2025
9b91355
Feat: docker images uploaded
Jan 21, 2026
dfdca8a
hot-fix
Jan 29, 2026
bf0bf49
fix: parquet readings fixed
Feb 1, 2026
9877a82
closes #27
Feb 1, 2026
8899c6f
fix: removed run-time panics, replaced print statments with logs
Feb 4, 2026
e2f8442
fix: alias issues(Project-alias, single_aggr,group-by)
Feb 5, 2026
2e92fb9
Merge pull request #48 from Rich-T-kid/fix-crash-state
Rich-T-kid Feb 5, 2026
3290029
Feat: Implement garbage collection for s3
Feb 8, 2026
1d5a987
Merge pull request #49 from Rich-T-kid/garbage-collection-thread
Rich-T-kid Feb 8, 2026
1 change: 1 addition & 0 deletions .gitignore
@@ -109,6 +109,7 @@ src/Backend/test_data/json
!src/Backend/test_data/csv/Mental_Health_and_Social_Media_Balance_Dataset.csv
!src/Backend/test_data/csv/intergration_test_data_1.csv
!src/Backend/test_data/csv/intergration_test_data_2.csv
!src/Backend/test_data/substrait_plans/**
# allow parquet file
!src/Backend/test_data/parquet/
!src/Backend/test_data/parquet/capitals_clean.parquet
49 changes: 47 additions & 2 deletions README.md
@@ -6,19 +6,21 @@ A high-performance, in-memory query execution engine.
![Rust Tests](https://github.com/Rich-T-kid/OptiSQL/actions/workflows/rust-test.yml/badge.svg)
![Frontend Tests](https://github.com/Rich-T-kid/OptiSQL/actions/workflows/frontend-test.yml/badge.svg)


## Overview

OptiSQL is a custom in-memory query execution engine. The backend (physical execution) is built in Go and Rust. The front end (query parsing & optimization) is built in C++.

**Technologies:**

- Go/Rust (physical optimizer, operators)
- Substrait (logical/physical plan representation)
- C++ (query parser & optimizer)
- etc. (make, git, s3)

## Getting Started

### Prerequisites

- Go 1.24+
- Rust 1.70+
- C++23
@@ -83,6 +85,7 @@ OptiSQL/
Initial development is done in **Go** (`opti-sql-go`), which serves as the primary implementation. The **Rust** version (`opti-sql-rs`) is developed shortly after as a learning exercise and eventual performance-optimized alternative, closely mirroring the Go implementation.

**Key Directories:**

- `/operators` - SQL operator implementations (filter, join, aggregation, project)
- `/physical-optimizer` - Query plan parsing and optimization
- `/substrait` - Substrait plan integration
@@ -102,6 +105,7 @@ We use a structured branching model to maintain stability and enable smooth coll
This approach prevents unstable code from reaching `main`, simplifies rollbacks, and ensures all changes undergo proper testing and review before deployment. Feature branches isolate work, allowing focused reviews and parallel development without conflicts. The `pre-release` branch acts as a staging area where features are bundled together before being released as a new version.

**Workflow:**

1. Create a feature branch from `pre-release`
2. Implement your changes with tests
3. Open a PR to merge into `pre-release`
@@ -112,6 +116,7 @@ This approach prevents unstable code from reaching `main`, simplifies rollbacks,
### Code Quality

All code quality checks are automated and enforced by CI:

- **Linting** - `golangci-lint` (Go), `clippy` (Rust)
- **Formatting** - `go fmt` (Go), `cargo fmt` (Rust)
- **Testing** - Unit tests required for all new code
@@ -133,15 +138,55 @@ Before pushing, verify your changes pass all checks:
make pre-push
```

## How to build

```bash
docker buildx build \
--platform linux/amd64 \
-t rich239/execution-engine:0.9.4 \
-t rich239/execution-engine:latest \
--push \
.

```

## How to run

```bash
docker pull rich239/execution-engine
docker run -p 7024:7024 rich239/execution-engine
```

## Example GRPC body

```bash
{
"id": "97b61a8f-ffe1-4e4a-b6d7-73619698dc7a",
"sql_statement": "select * from table1 where id > 10",
"logical_plan": "ewogICAgIkVtaXQiOiAKICAgIHsKICAgICAgICAiT3BlcmF0b3IiOiAiRmlsdGVyIiwKICAgICAgICAiRmlsdGVyIjogCiAgICAgICAgewogICAgICAgICAgICAiaW5wdXQiOiAKICAgICAgICAgICAgewogICAgICAgICAgICAgICAgIk9wZXJhdG9yIjogIlNvdXJjZSIsCiAgICAgICAgICAgICAgICAiU291cmNlIjogCiAgICAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgICAgICAgImZpbGUtbmFtZSI6ICJ1c2VyX3Rlc3RfZGF0YS5jc3YiLAogICAgICAgICAgICAgICAgICAgICJsb2NhbCI6IGZhbHNlCiAgICAgICAgICAgICAgICB9CiAgICAgICAgICAgIH0sCiAgICAgICAgICAgICJleHByZXNzaW9uIjogCiAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgICJleHByX3R5cGUiOiAiQmluYXJ5RXhwciIsCiAgICAgICAgICAgICAgICAib3AiOiAiR3JlYXRlclRoYW4iLAogICAgICAgICAgICAgICAgImxlZnQiOiAKICAgICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgICAgICAiZXhwcl90eXBlIjogIkNvbHVtblJlc29sdmUiLAogICAgICAgICAgICAgICAgICAgICJuYW1lIjogImFnZV95ZWFycyIKICAgICAgICAgICAgICAgIH0sCiAgICAgICAgICAgICAgICAicmlnaHQiOiAKICAgICAgICAgICAgICAgIHsKICAgICAgICAgICAgICAgICAgICAiZXhwcl90eXBlIjogIkxpdGVyYWxSZXNvbHZlIiwKICAgICAgICAgICAgICAgICAgICAidmFsdWUiOiAxMCwKICAgICAgICAgICAgICAgICAgICAibGl0X3R5cGUiOiAiaW50IgogICAgICAgICAgICAgICAgfQogICAgICAgICAgICB9CiAgICAgICAgfQogICAgfQp9"
}
```

This runs formatting, linting, and all tests.

## Contributing

Want to contribute? Check out [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines on:

- Writing and running tests
- PR format and commit message conventions
- Development workflow and tooling
- Build and run instructions

## License

This project is licensed under the terms specified in [LICENSE.txt](LICENSE.txt).

```bash
# Bump the major/minor version tag as needed for a release build.
docker buildx build \
  --platform linux/amd64 \
  -t rich239/execution-engine:0.9.5 \
  -t rich239/execution-engine:latest \
  --push \
  .
```
Comment on lines +185 to +190

Copilot AI Feb 1, 2026
The docker buildx build command is duplicated outside of a fenced code block at the end of the README, which makes the Markdown render oddly and looks accidental. Consider removing it or wrapping it in a proper code block.


# TODO: remove env stuff
130 changes: 102 additions & 28 deletions src/Backend/opti-sql-go/Expr/expr.go
@@ -5,6 +5,7 @@ import (
"context"
"errors"
"fmt"
"opti-sql-go/config"
"opti-sql-go/operators"
"regexp"
"strings"
@@ -13,6 +14,7 @@
"github.com/apache/arrow/go/v17/arrow/array"
"github.com/apache/arrow/go/v17/arrow/compute"
"github.com/apache/arrow/go/v17/arrow/memory"
"go.uber.org/zap"
)

var (
@@ -24,35 +26,35 @@ var (
}
)

type binaryOperator int
type BinaryOperator int

const (
// arithmetic
Addition binaryOperator = 1
Subtraction binaryOperator = 2
Multiplication binaryOperator = 3
Division binaryOperator = 4
Addition BinaryOperator = 1
Subtraction BinaryOperator = 2
Multiplication BinaryOperator = 3
Division BinaryOperator = 4
// comparison
Equal binaryOperator = 6
NotEqual binaryOperator = 7
LessThan binaryOperator = 8
LessThanOrEqual binaryOperator = 9
GreaterThan binaryOperator = 10
GreaterThanOrEqual binaryOperator = 11
Equal BinaryOperator = 6
NotEqual BinaryOperator = 7
LessThan BinaryOperator = 8
LessThanOrEqual BinaryOperator = 9
GreaterThan BinaryOperator = 10
GreaterThanOrEqual BinaryOperator = 11
// logical
And binaryOperator = 12
Or binaryOperator = 13
And BinaryOperator = 12
Or BinaryOperator = 13
// RegEx expressions
Like binaryOperator = 14 // where column_name like "patte%n_with_wi%dcard_"
Like BinaryOperator = 14 // where column_name like "patte%n_with_wi%dcard_"
)

type supportedFunctions int
type SupportedFunctions int

const (
Upper supportedFunctions = 1
Lower supportedFunctions = 2
Abs supportedFunctions = 3
Round supportedFunctions = 4
Upper SupportedFunctions = 1
Lower SupportedFunctions = 2
Abs SupportedFunctions = 3
Round SupportedFunctions = 4
)

type aggFunctions = int
@@ -91,6 +93,19 @@ type Expression interface {
fmt.Stringer
}

// To_aggr_name extracts the column name from an expression for use in aggregation schema building.
// Returns the alias name if present, otherwise the column name.
func To_aggr_name(expr Expression) string {
switch e := expr.(type) {
case *ColumnResolve:
return e.Name
case *Alias:
return e.Name
default:
return expr.String()
}
}

func EvalExpression(expr Expression, batch *operators.RecordBatch) (arrow.Array, error) {
switch e := expr.(type) {
case *Alias:
@@ -199,7 +214,7 @@ func EvalColumn(c *ColumnResolve, batch *operators.RecordBatch) (arrow.Array, er
for i, f := range batch.Schema.Fields() {
if f.Name == c.Name {
col := batch.Columns[i]
col.Retain()
//col.Retain()
return col, nil
}
}
@@ -213,8 +228,7 @@ func (c *ColumnResolve) String() string {
// Evaluates to a column of length = batch-size, filled with this literal.
// sql: select 1
type LiteralResolve struct {
Type arrow.DataType
// dont forget to cast the value. so string("hello") not just "hello"
Type arrow.DataType
Value any
}

@@ -425,11 +439,11 @@ func (l *LiteralResolve) String() string {

type BinaryExpr struct {
Left Expression
Op binaryOperator
Op BinaryOperator
Right Expression
}

func NewBinaryExpr(left Expression, op binaryOperator, right Expression) *BinaryExpr {
func NewBinaryExpr(left Expression, op BinaryOperator, right Expression) *BinaryExpr {
return &BinaryExpr{
Left: left,
Op: op,
@@ -438,6 +452,7 @@ func NewBinaryExpr(left Expression, op BinaryOperator, right Expression) *Binary
}

func EvalBinary(b *BinaryExpr, batch *operators.RecordBatch) (arrow.Array, error) {
logger := config.GetLogger()
leftArr, err := EvalExpression(b.Left, batch)
if err != nil {
return nil, err
@@ -446,6 +461,11 @@ func EvalBinary(b *BinaryExpr, batch *operators.RecordBatch) (arrow.Array, error
if err != nil {
return nil, err
}
logger.Debug("Evaluating binary expression",
zap.String("operator", fmt.Sprintf("%v", b.Op)),
zap.Int("left_len", leftArr.Len()),
zap.Int("right_len", rightArr.Len()),
)
ctx := context.Background()
opt := compute.ArithmeticOptions{}
switch b.Op {
@@ -578,11 +598,11 @@ func unpackDatum(d compute.Datum) (arrow.Array, error) {
}

type ScalarFunction struct {
Function supportedFunctions
Function SupportedFunctions
Arguments Expression // resolves to something evaluable, i.e. a LiteralResolve or ColumnResolve
}

func NewScalarFunction(function supportedFunctions, Argument Expression) *ScalarFunction {
func NewScalarFunction(function SupportedFunctions, Argument Expression) *ScalarFunction {
return &ScalarFunction{
Function: function,
Arguments: Argument,
@@ -736,7 +756,7 @@ func lowerImpl(arr arrow.Array) (arrow.Array, error) {
return b.NewArray(), nil
}
}
func inferScalarFunctionType(fn supportedFunctions, argType arrow.DataType) arrow.DataType {
func inferScalarFunctionType(fn SupportedFunctions, argType arrow.DataType) arrow.DataType {
switch fn {

case Upper, Lower:
@@ -753,7 +773,7 @@ func inferScalarFunctionType(fn SupportedFunctions, argType arrow.DataType) arro
}
}

func inferBinaryType(left arrow.DataType, op binaryOperator, right arrow.DataType) arrow.DataType {
func inferBinaryType(left arrow.DataType, op BinaryOperator, right arrow.DataType) arrow.DataType {
switch op {

case Addition, Subtraction, Multiplication, Division:
@@ -816,3 +836,57 @@ func validRegEx(columnValue, regExExpr string) bool {
return ok

}
func FnToScalarFunction(s string) SupportedFunctions {
switch s {
case "Upper":
return 1
case "Lower":
return 2
case "Abs":
return 3
case "Round":
return 4
}
return 1
Comment on lines +839 to +850

Copilot AI Jan 21, 2026
FnToScalarFunction returns Upper as a default for unknown function names. For a public helper, silently mapping unknown values can hide bugs. Prefer returning a sentinel/second return value indicating failure (or panic) so callers can validate inputs.

Suggested change
func FnToScalarFunction(s string) SupportedFunctions {
switch s {
case "Upper":
return 1
case "Lower":
return 2
case "Abs":
return 3
case "Round":
return 4
}
return 1
func FnToScalarFunction(s string) (SupportedFunctions, error) {
switch s {
case "Upper":
return 1, nil
case "Lower":
return 2, nil
case "Abs":
return 3, nil
case "Round":
return 4, nil
default:
return 0, fmt.Errorf("unsupported scalar function: %s", s)
}

Copilot AI Feb 1, 2026
FnToScalarFunction returns Upper (1) for unknown function names. That can silently turn invalid input into a different operation. Prefer returning a sentinel (e.g., SupportedFunctions(-1)) and/or a (SupportedFunctions, bool) result so callers can handle unknown names explicitly.

Suggested change
return 1
return SupportedFunctions(-1)

}

// MatchesBinaryOperator reports whether `name` matches the BinaryOperator constant
// represented by `opInt`, using ONLY the exact names in the const block above.
func MatchesBinaryOperator(name string, opInt int) bool {
want := BinaryOperator(opInt)

switch name {
case "Addition":
return want == Addition
case "Subtraction":
return want == Subtraction
case "Multiplication":
return want == Multiplication
case "Division":
return want == Division

case "Equal":
return want == Equal
case "NotEqual":
return want == NotEqual
case "LessThan":
return want == LessThan
case "LessThanOrEqual":
return want == LessThanOrEqual
case "GreaterThan":
return want == GreaterThan
case "GreaterThanOrEqual":
return want == GreaterThanOrEqual

case "And":
return want == And
case "Or":
return want == Or

case "Like":
return want == Like

default:
return false
}
}
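`MatchesBinaryOperator` pairs an operator name from a parsed plan with its integer constant. A self-contained sketch of the idea, re-declaring just two of the constants above (an illustration, not the package's actual wiring):

```go
package main

import "fmt"

// BinaryOperator mirrors the type in the Expr package.
type BinaryOperator int

// Two of the constants from the const block above.
const (
	GreaterThan BinaryOperator = 10
	Like        BinaryOperator = 14
)

// matchesBinaryOperator reports whether name corresponds to opInt,
// following the same shape as Expr.MatchesBinaryOperator.
func matchesBinaryOperator(name string, opInt int) bool {
	want := BinaryOperator(opInt)
	switch name {
	case "GreaterThan":
		return want == GreaterThan
	case "Like":
		return want == Like
	default:
		return false
	}
}

func main() {
	fmt.Println(matchesBinaryOperator("GreaterThan", 10)) // true
	fmt.Println(matchesBinaryOperator("GreaterThan", 14)) // false: 14 is Like
}
```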
2 changes: 1 addition & 1 deletion src/Backend/opti-sql-go/Expr/expr_test.go
@@ -1117,7 +1117,7 @@ func TestInferScalarFunctionType(t *testing.T) {
t.Fatalf("expected panic for unknown function, got none")
}
}()
_ = inferScalarFunctionType(supportedFunctions(9999), arrow.PrimitiveTypes.Int32)
_ = inferScalarFunctionType(SupportedFunctions(9999), arrow.PrimitiveTypes.Int32)
})
}
