Variant extract pushdown #150

Tishj · 2025-12-16T10:23:17Z

No description provided.

When the `duckdb.exe` shell is used in the default Windows terminal and the `odbc_scanner` extension is used to connect to an Oracle DB, Unicode output in the console breaks (for all subsequent queries). Example:

```sql
SELECT 'Здравейте' AS hello;
```

```
UÄÄÄÄÄÄÄÄÄÄÄ¿
3   hello   3
3  varchar  3
AÄÄÄÄÄÄÄÄÄÄÄ'
3 ????????? 3
AÄÄÄÄÄÄÄÄÄÄÄU
```

Expected:

```
┌───────────┐
│   hello   │
│  varchar  │
├───────────┤
│ Здравейте │
└───────────┘
```

The problem was originally reported in duckdb/odbc-scanner#86. It turned out that when the Oracle ODBC driver is loaded, it changes the system locale, as returned by `setlocale(LC_ALL, NULL)`, from `C` to:

```
LC_COLLATE=C;LC_CTYPE=English_United States.1252;LC_MONETARY=C;LC_NUMERIC=C;LC_TIME=C
```

The original idea was, in `odbc_scanner`, to save the locale value before loading new ODBC drivers and restore the locale after the `odbc_connect` call returns. But it turned out that `setlocale` on Windows is not process-wide, but CRT-wide (CRT being the MSVC C runtime library; [ref](https://learn.microsoft.com/en-us/cpp/c-runtime-library/global-state?view=msvc-170)). And because after duckdb/odbc-scanner#87 the `odbc_scanner` uses its own copy of the C runtime lib, it cannot access or change the locale set by the Oracle driver. For the same reason `main` branch builds of DuckDB are not affected, but the problem is present in `v1.4-andium`.

While the problem affects only a small number of users and is largely non-blocking (only Unicode data is broken, ASCII data is displayed correctly), it breaks the UX for Oracle users. Oracle (along with MSSQL) is the main target for `odbc_scanner`, so the intention is to fix this for `v1.4-andium`.

It was found that changing the translation mode for stdout from `_O_TEXT` to `_O_BINARY` ([ref](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170)) can be used as a workaround:

```c++
_setmode(_fileno(stdout), _O_BINARY);
```

But this call also turned out to be CRT-wide, so it cannot be applied selectively from `odbc_scanner`.

It was also observed that, historically, `v1.4-andium` uses the `fputs` call to print Unicode to the console. Incoming UTF-8 text is first converted to UTF-16 (with `MultiByteToWideChar`) and then converted back to UTF-8 (perhaps in some cases to a different multibyte encoding, not UTF-8) with `WideCharToMultiByte` before being passed to `fputs`. On `main` this was changed to use `WriteConsoleW`, passing it UTF-16 directly, and `WriteConsoleW` turned out not to be affected by this problem.

It is understood that recent shell enhancements in `main` are not intended for `v1.4-andium`, so this PR makes a minimal backport that only changes the relevant part of the `utf8_printf()` call to use `WriteConsoleW` instead of `fputs`.

Testing: with manual smoke checks I cannot see any differences in console output for ordinary queries. However, I have limited experience with the DuckDB shell (I mostly use other clients), so I may have missed some use cases.

Fixes: duckdb/odbc-scanner#86
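To illustrate the idea of handing UTF-16 straight to the console, here is a minimal sketch. It is not the code from this PR: `Utf8ToUtf16` and `PrintUtf8` are hypothetical names, the decoder is hand-written with only minimal validation so that the example stays portable, and outside Windows (or when stdout is not a console) it simply falls back to `fputs`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decode UTF-8 into UTF-16 code units.
// Validation is minimal; malformed sequences become U+FFFD.
std::u16string Utf8ToUtf16(const std::string &in) {
    std::u16string out;
    size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i++]);
        char32_t cp;
        size_t extra;
        if (c < 0x80) { cp = c; extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { cp = 0xFFFD; extra = 0; }
        for (size_t k = 0; k < extra; k++) {
            if (i >= in.size() || (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD; // truncated or invalid continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }
        if (cp > 0x10FFFF) {
            cp = 0xFFFD;
        }
        if (cp >= 0x10000) {
            // Encode as a surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

// Hypothetical helper: print UTF-8 text, bypassing the CRT when stdout is a Windows console.
void PrintUtf8(const std::string &text) {
#ifdef _WIN32
    HANDLE console = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (console != INVALID_HANDLE_VALUE && GetConsoleMode(console, &mode)) {
        fflush(stdout); // keep ordering with earlier CRT output
        std::u16string wide = Utf8ToUtf16(text);
        DWORD written = 0;
        WriteConsoleW(console, wide.data(), static_cast<DWORD>(wide.size()), &written, nullptr);
        return;
    }
#endif
    fputs(text.c_str(), stdout); // redirected output or non-Windows platform
}
```

Because `WriteConsoleW` receives UTF-16 directly, the CRT locale and the stdout translation mode play no part in the console path, which is consistent with the observation above that `main` is unaffected.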

…r if the rows are not present in the index (duckdb#20430)

This PR cleans up the RowGroup scan code, in particular `ScanCommitted`. This method was originally intended to scan only committed rows, but had a bunch of options bolted onto it (e.g. scanning all rows including any deleted rows, or only including deletes that are no longer referenced by any transactions). This also led to a bunch of code duplication and added complexity. This PR refactors and cleans up these methods, and removes a bunch of unnecessary / unused methods. These scans can now be performed by passing in `ScanOptions` that determine how the data should be scanned.

This was all just yak shaving while trying to fix a bug uncovered by making `BoundIndex::Delete` throw an error when attempting to delete entries that were not present in the index. This PR also introduces that change and fixes two issues uncovered by it.
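The general shape of such a refactor, one scan routine driven by an options value instead of several near-duplicate methods, can be sketched as follows. Every name here (`DeleteHandling`, `ScanOptionsSketch`, `Row`, `Scan`) is a hypothetical stand-in and not DuckDB's actual `ScanOptions`; the three modes are only one possible reading of the options listed above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins; none of these definitions are taken from the DuckDB sources.
enum class DeleteHandling {
    SKIP_DELETED,             // scan only rows that are not deleted
    INCLUDE_DELETED,          // scan all rows, including deleted ones
    SKIP_UNREFERENCED_DELETES // skip a deleted row only if no transaction references the delete
};

struct ScanOptionsSketch {
    DeleteHandling deletes = DeleteHandling::SKIP_DELETED;
};

struct Row {
    int value;
    bool deleted;
    bool delete_still_referenced;
};

// A single scan loop; the options decide which rows are emitted.
std::vector<int> Scan(const std::vector<Row> &rows, const ScanOptionsSketch &options) {
    std::vector<int> result;
    for (const auto &row : rows) {
        bool skip = false;
        switch (options.deletes) {
        case DeleteHandling::SKIP_DELETED:
            skip = row.deleted;
            break;
        case DeleteHandling::INCLUDE_DELETED:
            break;
        case DeleteHandling::SKIP_UNREFERENCED_DELETES:
            skip = row.deleted && !row.delete_still_referenced;
            break;
        }
        if (!skip) {
            result.push_back(row.value);
        }
    }
    return result;
}
```

Adding a new scan behaviour then means adding one enum value and one `case`, rather than another copy of the loop.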

…et throw (duckdb#20434)

We cannot always immediately throw errors in the JSON reader, as we might need to wait for previous reads to finish to (1) ensure we throw the first error in the file, and (2) know the exact line number where the error occurs. However, in the current implementation, when we find an error that we cannot throw yet, we keep looping and re-processing the error. This PR breaks out of the loop instead. When the reader of the previous chunk is finished, it will then actually throw the error.
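A toy model of that control flow, under stated assumptions: `ChunkState`, `ParseChunk` and `ThrowPendingError` are hypothetical names, "valid JSON" is reduced to a line that starts with `{` and ends with `}`, and threading is left out entirely. The point is only that the first error in a chunk is recorded and the loop exits, while the throw happens later, once the number of preceding lines is known.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-chunk state; not DuckDB's JSON reader.
struct ChunkState {
    size_t lines_parsed = 0;
    bool has_pending_error = false;
    size_t error_line_in_chunk = 0; // 0-based index within the chunk
};

// Toy validity check standing in for real JSON parsing.
static bool LooksLikeJsonObject(const std::string &line) {
    return !line.empty() && line.front() == '{' && line.back() == '}';
}

// Parse a chunk; on the first bad line, record the error and stop reading the chunk.
ChunkState ParseChunk(const std::vector<std::string> &lines) {
    ChunkState state;
    for (size_t i = 0; i < lines.size(); i++) {
        if (LooksLikeJsonObject(lines[i])) {
            state.lines_parsed++;
            continue;
        }
        state.has_pending_error = true;
        state.error_line_in_chunk = i;
        break; // do not keep looping over the same error
    }
    return state;
}

// Called once earlier chunks are done and the line offset of this chunk is known.
void ThrowPendingError(const ChunkState &state, size_t lines_before_chunk) {
    if (state.has_pending_error) {
        size_t line = lines_before_chunk + state.error_line_in_chunk + 1; // 1-based
        throw std::runtime_error("malformed JSON at line " + std::to_string(line));
    }
}
```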

lasanaka-jumptrading and others added 30 commits December 27, 2025 14:59

Trigger CI

079bcc8

add coordinate reference system to geometry type

4d36b1b

wip

da33ce1

add cast support

55e2b03

dont always instantiate geo type info

5149e7f

add support for CRS in parquet

876b5ca

fix test

ac6431e

format, fix tests, feedback

629283b

Fix includes

874c310

fix relative include

9198ab2

yes plase

ae1a3c7

Fixes raised by cppcheck (duckdb#20323)

546f357

Three minor fixes:
* one test code that was wrong
* one detail of Window's iterator interface that was off
* one just cleaning up the code a bit, with clearer (to me) iteration

Fixup QueryProfiler::GetBytesRead and Written (duckdb#20318)

c3d4821

Internal duckdb#6974: Window Self-Join Files (duckdb#20317)

8bb20fc

* Rename the window self-join optimizer files to match their contents

Internal duckdb#6999: Window TopN Comparisons (duckdb#20316)

0f1edb7

* Add support for less than 2 or more, equal to 1 filter conditions.

Internal duckdb#6943: IEJoin Code Cleanup (duckdb#20315)

748cdd5

* Replace some switch statements with state machine operations
* Fix the max threads to use the maximum number of tasks.
* Rename variable with legacy name

Use eager min/max aggregation on parquet statistics (duckdb#20301)

d62ae80

Follow-up on duckdb#19906. This PR allows eagerly executing ungrouped min/max aggregates with Parquet row group statistics, analogous to the DuckDB file format.
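As a rough illustration of the idea (not the optimizer code from that commit), an ungrouped min/max can be answered from per-row-group statistics alone when every row group carries them. `RowGroupStats`, `MinMaxResult` and `TryMinMaxFromStats` are hypothetical names, and the sketch assumes exact integer statistics; if any row group lacks statistics it reports failure, so that a caller would fall back to scanning the data.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-row-group statistics; not the Parquet reader's actual types.
struct RowGroupStats {
    bool has_min_max;
    int64_t min;
    int64_t max;
};

struct MinMaxResult {
    bool from_stats; // false: statistics were insufficient, scan the data instead
    int64_t min;
    int64_t max;
};

// Combine row-group statistics into a global min/max without reading any rows.
MinMaxResult TryMinMaxFromStats(const std::vector<RowGroupStats> &groups) {
    MinMaxResult result{false, 0, 0};
    bool first = true;
    for (const auto &group : groups) {
        if (!group.has_min_max) {
            return MinMaxResult{false, 0, 0}; // one missing entry invalidates the shortcut
        }
        if (first) {
            result.min = group.min;
            result.max = group.max;
            first = false;
        } else {
            result.min = std::min(result.min, group.min);
            result.max = std::max(result.max, group.max);
        }
    }
    result.from_stats = !first; // an empty file yields no answer from statistics
    return result;
}
```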

fix some compiler warnings on apple clang 17

138ef03

merge

fd384e6

Merge branch 'main' into hjiang/parquet-metadata-memory-control

127112a

Improve LRU cache: unify entry deletion

476fc3e

Skip destructor logic when !initialized (for when various exceptions …

a8c421b

…are thrown)

Merge branch 'duckdb:main' into try_expr_20006

6a332d3

defensive programming

20b2bd8

Increase reserved size for paths in SetPathsInternal

0d54200

c-api: adding out_file against NULL check (duckdb#20303)

b91d518

The pointer was used incorrectly, but it was checked against NULL. Fixes: d4f7b54 ("add support for opening duckdb filesystem from c-api")

set vector type appropriately

d657283

Fix formatting

5e9493a

Merge branch 'duckdb:main' into try_expr_20006

3477c30

Mytherin and others added 29 commits January 7, 2026 15:42

Clean up CollectionScanState::ScanCommitted

9950ef2

Fixup quack test

ad6c4e3

Rename TScanType to ScanOptions

ae4b979

Move TransactionData into ScanOptions

6382c19

Rework CommittedDeleteOperator to also include transaction-local deletes

3d3e9d9

Remove OMIT_COMMITTED_DELETES

8654711

Rename OMIT_FULLY_COMMITTED_DELETES to OMIT_COMMITTED_DELETES

ec4ecdf

Make BoundIndex::Delete throw an error if we couldn't delete all rows

546a80c

Avoid reverting index appends that were never appended to the index

2b06475

Exit JSON reader loop when an error is encountered

4943d81

Upon registering an error, if we are waiting for another thread to fi…

c8d6a89

…nish reading to throw, stop reading the current chunk instead of waiting in a busy loop

Add atomic to see if tidy gets happier

8951ec3

CMake: export also duckdb_generated_extension_loader

e74eae4

Merge V1.4 -> V1.5 (duckdb#20350)

604718a

Issue duckdb#20413: ASOF SEMI/ANTI Bindings

e6ac703

* Add the RHS bindings when we are doing SEMI or ANTI ASOF joins with a predicate.

Issue duckdb#20413: ASOF Arbitrary Predicates

5ecf5d0

* Disallow using arbitrary predicates in AsOf with RIGHT/FULL/SEMI joins.

Internal duckdb#6975: Window Inner Self-Join

4ed12c7

* Convert the semi-join to an inner join and import the count directly

Issue duckdb#20413: ASOF Arbitrary Predicates

81ed27d

* Disallow using arbitrary predicates in AsOf with ANTI joins.

Internal duckdb#6975: Window Inner Self-Join (duckdb#20459)

ba0a7e6

* Convert the semi-join to an inner join and import the count directly

expose safe string assign function

3702353

Issue duckdb#20413: ASOF Arbitrary Predicates (duckdb#20456)

13e1f22

* Disallow using arbitrary predicates in AsOf with RIGHT/FULL/SEMI joins.

Internal duckdb#6976: Window Self-Join Predicate

223544c

* Remove the predicate test and relocation (join predicate push-down will take care of it)
* Update test plans and add correctness tests for new cases.

Internal duckdb#6976: Window Self-Join Predicate

c7776c0

* Enforce ordering in test

[C API] Expose safe string assign function (duckdb#20467)

f4a8fa8

Follow-up to duckdb#20348. Related issue: duckdblabs/duckdb-internal#7002

Internal duckdb#6976: Window Self-Join Predicate (duckdb#20473)

740d25a

* Remove the predicate test and relocation (join predicate push-down will take care of it)
* Update test plans and add correctness tests for new cases.

CMake: export also duckdb_generated_extension_loader (duckdb#20449)

d18dfbf

Bumped into this while building duckdb-wasm. I would expect other clients or packagers of duckdb might also hit this, and the fix is simple.

squashed to a patch to correctly rebase to v1.5

b5d59e3

Tishj force-pushed the variant_extract_pushdown branch from bd03819 to b5d59e3 Compare January 12, 2026 16:01
