Draft
28 commits
a588008
feat: Add DataFusion optimizer for INTERSECTS join algorithm selection
conradbzura Mar 26, 2026
86f7545
fix: Handle StringViewArray in interval join exec plans
conradbzura Mar 26, 2026
9aaf36e
test: Add unit and integration tests for INTERSECTS optimizer
conradbzura Mar 26, 2026
42e9054
fix: Correct sweep-line active set, partition collection, and filter …
conradbzura Mar 26, 2026
fc4b43b
feat: Rewrite binned join to use DataFusion parallel HashJoinExec
conradbzura Mar 26, 2026
66b514e
build: Add bench_intersects binary for optimizer benchmarking
conradbzura Mar 26, 2026
29f172e
perf: Parallelize sweep-line join by chromosome with vectorized output
conradbzura Mar 26, 2026
ca15a98
feat: Select build side by row count and declare sort requirements
conradbzura Mar 26, 2026
531902c
refactor: Rewrite sweep-line as streaming build/probe state machine
conradbzura Mar 26, 2026
c0032a4
refactor: Fold canonical-bin dedup into HashJoinExec filter
conradbzura Mar 26, 2026
98b740f
build: Add --force-binned flag to bench_intersects binary
conradbzura Mar 26, 2026
6bf6bc5
build: Upgrade DataFusion from v47 to v53
conradbzura Mar 26, 2026
4f8a991
build: Add --sql-binned flag to bench_intersects binary
conradbzura Mar 26, 2026
95004c8
refactor: Defer binned strategy to DataFusion default join
conradbzura Mar 27, 2026
4d830b0
feat: Add experimental logical optimizer rule for binned joins
conradbzura Mar 27, 2026
5161ef2
fix: Resolve logical rule schema bug and enable by default
conradbzura Mar 27, 2026
a195bdd
feat: Adaptive bin sizing from Parquet metadata in logical rule
conradbzura Mar 27, 2026
b4c6eca
perf: Remove redundant DISTINCT from binned join rewrite
conradbzura Mar 27, 2026
1241853
test: Add 27 tests for the logical optimizer rule
conradbzura Mar 27, 2026
e6702a5
fix: Address PR review — schema-based join side detection, remove deb…
conradbzura Mar 27, 2026
4e9965c
feat: Add datafusion dialect to GIQL transpiler
conradbzura Mar 27, 2026
3418e0f
refactor!: Replace heuristic overlap detection with giql_intersects UDF
conradbzura Mar 27, 2026
4426aea
fix: Improve bin size heuristic and harden logical rule
conradbzura Mar 28, 2026
aced50c
build: Add gitignore for Cargo.lock and target directory
conradbzura Mar 28, 2026
abdc201
test: Add dialect, self-join, and compound predicate tests
conradbzura Mar 28, 2026
85d4063
feat: Add lightweight Parquet sampling for bin size selection
conradbzura Mar 28, 2026
db0fbf5
fix: Address review — harden sampling, assert skip tests, clean imports
conradbzura Mar 29, 2026
09fe087
feat: Add COI tree interval join as default INTERSECTS strategy
conradbzura Mar 30, 2026
4 changes: 4 additions & 0 deletions .gitignore
@@ -1,6 +1,10 @@
# macOS
.DS_store

# Rust
target/
Cargo.lock

# Python
__pycache__/
*.py[cod]
2 changes: 2 additions & 0 deletions crates/giql-datafusion/.gitignore
@@ -0,0 +1,2 @@
/target/
Cargo.lock
19 changes: 19 additions & 0 deletions crates/giql-datafusion/Cargo.toml
@@ -0,0 +1,19 @@
[package]
name = "giql-datafusion"
version = "0.1.0"
edition = "2021"
description = "DataFusion optimizer for genomic interval (INTERSECTS) joins"
license = "MIT"

[dependencies]
arrow = { version = "58", default-features = false, features = ["prettyprint"] }
async-trait = "0.1.89"
coitrees = "0.4.0"
datafusion = "53"
futures = "0.3.32"
log = "0.4"
parquet = "58"

[dev-dependencies]
tempfile = "3"
tokio = { version = "1", features = ["rt-multi-thread", "macros"] }