@Tishj
Owner

@Tishj Tishj commented Dec 16, 2025

No description provided.

lasanaka-jumptrading and others added 30 commits December 27, 2025 14:59
Three minor fixes:
* one test code that was wrong
* one detail of Window's iterator interface that was off
* one just cleaning up the code a bit, with clearer (to me) iteration
* Rename the window self-join optimizer files to match their contents
* Add support for less than 2 or more, equal to 1 filter conditions.
* Replace some switch statements with state machine operations
* Fix the max threads to use the maximum number of tasks.
* Rename variable with legacy name
Follow-up on duckdb#19906

This PR allows eagerly executing ungrouped min/max aggregates with
Parquet row group statistics analogous to the DuckDB file format.
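
The idea can be sketched as follows. This is a minimal illustration, not DuckDB's implementation: `RowGroupStats` and `MinMaxFromStats` are hypothetical names, and the sketch assumes each row group exposes an optional integer min/max pair in its metadata.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical per-row-group statistics; not DuckDB's or Parquet's actual types.
struct RowGroupStats {
    int64_t min;
    int64_t max;
    bool has_stats; // false when the writer did not record min/max
};

// Returns {min, max} computed from metadata alone, or nullopt when the
// statistics are insufficient and a regular scan is needed instead.
std::optional<std::pair<int64_t, int64_t>> MinMaxFromStats(const std::vector<RowGroupStats> &groups) {
    if (groups.empty()) {
        return std::nullopt;
    }
    int64_t lo = INT64_MAX;
    int64_t hi = INT64_MIN;
    for (const auto &group : groups) {
        if (!group.has_stats) {
            return std::nullopt; // one missing entry forces a fallback scan
        }
        assert(group.min <= group.max);
        lo = std::min(lo, group.min);
        hi = std::max(hi, group.max);
    }
    return std::make_pair(lo, hi);
}
```

The fallback path matters: statistics are optional metadata, so the eager path can only be taken when every row group provides them.
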
When the `duckdb.exe` shell runs in the default Windows terminal and the
`odbc_scanner` extension is used to connect to an Oracle DB, Unicode
output in the console breaks (for all subsequent queries). Example:

```sql
SELECT 'Здравейте' AS hello;
```
```
UÄÄÄÄÄÄÄÄÄÄÄ¿
3   hello   3
3  varchar  3
AÄÄÄÄÄÄÄÄÄÄÄ'
3 ????????? 3
AÄÄÄÄÄÄÄÄÄÄÄU
```

Expected:

```
┌───────────┐
│   hello   │
│  varchar  │
├───────────┤
│ Здравейте │
└───────────┘
```

The problem was originally reported in duckdb/odbc-scanner#86.

It turned out that when the Oracle ODBC driver is loaded, it changes the
system locale, as returned by `setlocale(LC_ALL, NULL)`, from `C` to:

```
LC_COLLATE=C;LC_CTYPE=English_United States.1252;LC_MONETARY=C;LC_NUMERIC=C;LC_TIME=C
```

The original idea was for `odbc_scanner` to save the locale value
before loading new ODBC drivers and restore it after the
`odbc_connect` call returns. But it turned out that `setlocale` on
Windows is not process-wide but CRT-wide, that is, scoped to one copy of
the MSVC C runtime library
([ref](https://learn.microsoft.com/en-us/cpp/c-runtime-library/global-state?view=msvc-170)).
And because `odbc_scanner` uses its own copy of the C runtime library
since duckdb/odbc-scanner#87, it cannot access or change the locale set
by the Oracle driver.
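
For reference, the save/restore idea described above could look roughly like the sketch below. `LocaleGuard` is a hypothetical name, not code from `odbc_scanner`. As explained above, this only affects the CRT copy that the calling module is linked against, so it would not undo a locale change made through a different CRT copy.

```cpp
#include <cassert>
#include <clocale>
#include <string>

// Hypothetical RAII helper: snapshots the C locale on construction and
// restores it on destruction. Only affects the CRT this code links against.
class LocaleGuard {
public:
    LocaleGuard() {
        // Passing nullptr queries the current locale without changing it.
        const char *current = std::setlocale(LC_ALL, nullptr);
        if (current != nullptr) {
            saved_ = current; // copy: later setlocale calls may reuse the buffer
            valid_ = true;
        }
    }
    ~LocaleGuard() {
        if (valid_) {
            const char *restored = std::setlocale(LC_ALL, saved_.c_str());
            assert(restored != nullptr); // debug-only sanity check
            (void)restored;
        }
    }
    LocaleGuard(const LocaleGuard &) = delete;
    LocaleGuard &operator=(const LocaleGuard &) = delete;

private:
    std::string saved_;
    bool valid_ = false;
};
```

A caller would create the guard in the scope that loads the driver, so the locale is restored when that scope exits.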

For the same reason, `main` branch builds of DuckDB are not affected,
but the problem is present in `v1.4-andium`. While the problem affects
only a small number of users and is largely non-blocking (only
Unicode data is broken, ASCII data is displayed correctly), it breaks
the UX for Oracle users. And Oracle (along with MSSQL) is the main
target for `odbc_scanner`, so the intention is to fix this for
`v1.4-andium`.

It was found that changing the translation mode for stdout from
`_O_TEXT` to `_O_BINARY` ([ref](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/setmode?view=msvc-170))
can be used as a workaround:

```c++
_setmode(_fileno(stdout), _O_BINARY);
```

But this call also turned out to be CRT-wide, so it cannot be applied
selectively from `odbc_scanner`.

It was also observed that, historically, `v1.4-andium` uses the `fputs`
call to print Unicode to the console. Incoming UTF-8 text is first
converted to UTF-16 (with `MultiByteToWideChar`) and then converted back
to UTF-8 (in some cases this may be a different multibyte encoding,
not UTF-8) with `WideCharToMultiByte` before being passed to `fputs`.

On `main`, by contrast, this was changed to use `WriteConsoleW`, passing
it UTF-16 directly. And it turned out that `WriteConsoleW` is not
affected by this problem.

It is understood that recent shell enhancements in `main` are not
intended for `v1.4-andium`, so this PR makes a minimal backport that
changes only the part of `utf8_printf()` that calls `fputs` to use
`WriteConsoleW` instead.
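
A minimal sketch of the `WriteConsoleW` approach is shown below. It is not the actual `utf8_printf()` code from the shell: `WriteUtf8ToStdout` is a hypothetical helper, and the fallback to `fputs` for redirected output and non-Windows builds is an assumption made so the sketch compiles everywhere.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#ifdef _WIN32
#include <vector>
#include <windows.h>
#endif

// Hypothetical helper, not DuckDB's utf8_printf(). Writes UTF-8 text to
// stdout and returns true on success.
bool WriteUtf8ToStdout(const char *utf8) {
    assert(utf8 != nullptr);
#ifdef _WIN32
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    // GetConsoleMode succeeds only for real console handles, not pipes or files.
    if (out != nullptr && out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        int len = static_cast<int>(std::strlen(utf8));
        if (len == 0) {
            return true;
        }
        // First call computes the required UTF-16 length, second call converts.
        int wide_len = MultiByteToWideChar(CP_UTF8, 0, utf8, len, nullptr, 0);
        if (wide_len <= 0) {
            return false;
        }
        std::vector<wchar_t> wide(static_cast<size_t>(wide_len));
        MultiByteToWideChar(CP_UTF8, 0, utf8, len, wide.data(), wide_len);
        DWORD written = 0;
        // UTF-16 goes straight to the console, bypassing the CRT locale.
        return WriteConsoleW(out, wide.data(), static_cast<DWORD>(wide_len), &written, nullptr) != 0;
    }
#endif
    // Redirected output or non-Windows platform: pass the bytes through unchanged.
    return std::fputs(utf8, stdout) >= 0;
}
```

Because the text reaches the console as UTF-16, the CRT text-mode translation that depends on the locale is not involved on that path.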

Testing: in manual smoke checks I cannot see any differences in
console output for ordinary queries. However, I have limited experience
with the DuckDB shell (I mostly use other clients), so I may be missing
some use cases.

Fixes: duckdb/odbc-scanner#86
The pointer was used incorrectly, but it was checked against NULL.

Fixes: d4f7b54 ("add support for opening duckdb filesystem from
c-api")
Mytherin and others added 29 commits January 7, 2026 15:42
…nish reading to throw, stop reading the current chunk instead of waiting in a busy loop
…r if the rows are not present in the index (duckdb#20430)

This PR cleans up the RowGroup scan code - in particular
`ScanCommitted`. This method was originally intended to scan only
committed rows, but had a bunch of options bolted onto it (e.g. scanning
all rows including any deleted rows, only including deletes that are no
longer referenced by any transactions). This also led to a bunch of
code duplication and added complexity. This PR refactors this and cleans
up these methods, also removing a bunch of unnecessary / unused methods.
These scans can now be performed by passing in `ScanOptions` that
determines how the data should be scanned.
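
The general pattern of replacing bolted-on flags with one options object might look like the sketch below. All names are hypothetical and DuckDB's real `ScanOptions` may be structured differently. In particular, the third mode is one reading of "only including deletes that are no longer referenced by any transactions".

```cpp
#include <cassert>
#include <vector>

// Hypothetical names throughout; not DuckDB's actual ScanOptions.
enum class DeletedRowMode {
    SKIP_DELETED,             // regular scan: deleted rows are not returned
    INCLUDE_DELETED,          // return every row, deleted or not
    SKIP_UNREFERENCED_DELETES // skip a deleted row only if no transaction still references it
};

struct ScanOptionsSketch {
    DeletedRowMode deleted_rows = DeletedRowMode::SKIP_DELETED;
};

struct RowSketch {
    int value;
    bool deleted;
    bool delete_still_referenced; // some transaction may still need to see the row
};

// One scan routine whose behavior is selected by the options object.
std::vector<int> ScanRows(const std::vector<RowSketch> &rows, const ScanOptionsSketch &options) {
    std::vector<int> result;
    for (const auto &row : rows) {
        // Only a deleted row can have a delete that is still referenced.
        assert(row.deleted || !row.delete_still_referenced);
        bool skip = false;
        switch (options.deleted_rows) {
        case DeletedRowMode::SKIP_DELETED:
            skip = row.deleted;
            break;
        case DeletedRowMode::INCLUDE_DELETED:
            skip = false;
            break;
        case DeletedRowMode::SKIP_UNREFERENCED_DELETES:
            skip = row.deleted && !row.delete_still_referenced;
            break;
        }
        if (!skip) {
            result.push_back(row.value);
        }
    }
    return result;
}
```

With a single entry point, adding a new scan mode means adding an enum value rather than another near-duplicate scan method.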

This was all just yak shaving when trying to fix a bug uncovered by
making `BoundIndex::Delete` throw an error when attempting to delete
entries from an index that were not present in the index. This PR also
introduces that and fixes two issues uncovered by that change.
…et throw (duckdb#20434)

We cannot always immediately throw errors in the JSON reader as we might
need to wait for previous reads to finish to (1) ensure we throw the
first error in the file, and (2) know the exact line number where the
error occurs. However, in the current implementation, when an error is
found that we cannot throw we keep on looping and re-processing the
error. This PR instead breaks out of the loop. When the reader of the
previous chunk is finished it will then actually throw the error.
* Add the RHS bindings when we are doing SEMI or ANTI ASOF joins with a predicate.
* Disallow using arbitrary predicates in AsOf with RIGHT/FULL/SEMI joins.
* Convert the semi-join to an inner join and import the count directly
* Disallow using arbitrary predicates in AsOf with ANTI joins.
* Remove the predicate test and relocation (join predicate push-down will take care of it)
* Update test plans and add correctness tests for new cases.
Bumped into this while building duckdb-wasm. I would expect other
clients or packagers of duckdb might also hit this, and the fix is simple.
@Tishj Tishj force-pushed the variant_extract_pushdown branch from bd03819 to b5d59e3 Compare January 12, 2026 16:01
