krleonid commented Nov 3, 2025

Adds a Clear() method to the DuckDB Appender API that clears buffered data without flushing it to the database.
This allows users to discard uncommitted appends and reuse the appender instance.

The clear operation resets the appender’s internal state — including its chunk, buffered collection, and column index — without committing data to the database.
Includes tests verifying that cleared data is not flushed and that the appender remains usable after clearing.

This feature is useful when data needs to be discarded before committing.

- Added `duckdb_appender_clear` to the appender JSON with a detailed description
- Added implementation in `appender-c.cpp` that calls `BaseAppender::Clear()`
- Added `duckdb_appender_clear` to the v1.2.0 API struct
- Added a comprehensive test in `test_capi_appender.cpp`
- Regenerated headers using `make generate-files`
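
The following is a hypothetical, self-contained model of the behavior described above. It is not DuckDB's `BaseAppender`, and `SketchAppender` with its methods is a name made up for this sketch. `Clear()` drops the row in progress, the buffered rows, and the column index without writing anything to the table, and the appender stays usable afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified model of appender buffering (NOT DuckDB's BaseAppender).
// Values are staged column by column into the current row, completed rows are
// buffered, and only Flush() makes rows visible in the target table.
class SketchAppender {
public:
    SketchAppender(std::vector<std::vector<int>> &table, std::size_t column_count)
        : table_(table), column_count_(column_count) {}

    // Stages one value for the next column of the current row.
    void Append(int value) {
        if (column_ == column_count_) {
            throw std::runtime_error("too many values for this row");
        }
        row_.push_back(value);
        ++column_;
    }

    // Moves the completed row into the buffered collection.
    void EndRow() {
        if (column_ != column_count_) {
            throw std::runtime_error("row is incomplete");
        }
        buffered_.push_back(row_);
        row_.clear();
        column_ = 0;
    }

    // Writes all buffered rows to the table.
    void Flush() {
        if (column_ != 0) {
            throw std::runtime_error("cannot flush with a partially appended row");
        }
        table_.insert(table_.end(), buffered_.begin(), buffered_.end());
        buffered_.clear();
    }

    // Discards everything not yet flushed: the row in progress, the buffered
    // rows and the column index. Nothing reaches the table.
    void Clear() {
        row_.clear();
        buffered_.clear();
        column_ = 0;
    }

private:
    std::vector<std::vector<int>> &table_;
    std::size_t column_count_;
    std::vector<int> row_;
    std::vector<std::vector<int>> buffered_;
    std::size_t column_ = 0;
};
```

For example, appending a row, calling `Clear()` and then `Flush()` leaves the table empty, while rows appended after the clear are flushed normally.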
krleonid pushed a commit that referenced this pull request Nov 20, 2025
…uckdb#19680) (duckdb#19811)

Fixes duckdb#19680

This fixes a bug where queries using `NOT EXISTS` with `IS DISTINCT
FROM` returned incorrect results due to improper handling of NULL
semantics in the optimizer.

The issue was that the optimizer's deliminator incorrectly treated
`DISTINCT FROM` variants the same as regular equality/inequality
comparisons, which have different NULL handling:
  - `IS DISTINCT FROM`: NULL-aware (`NULL IS DISTINCT FROM NULL` is `FALSE`)
  - `!=` or `=`: NULL-unaware (`NULL != NULL` is `NULL`, so rows with NULLs are filtered out)
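
A small model of SQL three-valued logic makes the difference concrete. This is a hypothetical sketch, not DuckDB code: `std::optional` stands in for nullable values, and `AntiJoin` mimics `NOT EXISTS` by keeping a left row only if no right row makes the predicate `TRUE`. With a right-hand NULL, `!=` never matches (the result is NULL), but `IS DISTINCT FROM` does match any non-NULL left value, so the two predicates return different rows.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// An empty optional represents SQL NULL.
using SqlInt = std::optional<int>;
using SqlBool = std::optional<bool>;

// a != b: NULL if either side is NULL, otherwise ordinary inequality.
SqlBool NotEquals(SqlInt a, SqlInt b) {
    if (!a || !b) {
        return std::nullopt;
    }
    return *a != *b;
}

// a IS DISTINCT FROM b: never NULL; two NULLs are "not distinct".
bool IsDistinctFrom(SqlInt a, SqlInt b) {
    if (!a && !b) {
        return false;
    }
    if (!a || !b) {
        return true;
    }
    return *a != *b;
}

// Keeps each left value for which no right value makes the predicate TRUE.
// NULL and FALSE both count as "no match", as in NOT EXISTS.
std::vector<SqlInt> AntiJoin(const std::vector<SqlInt> &left, const std::vector<SqlInt> &right,
                             const std::function<SqlBool(SqlInt, SqlInt)> &predicate) {
    std::vector<SqlInt> result;
    for (const auto &l : left) {
        bool matched = false;
        for (const auto &r : right) {
            SqlBool outcome = predicate(l, r);
            if (outcome.has_value() && *outcome) {
                matched = true;
                break;
            }
        }
        if (!matched) {
            result.push_back(l);
        }
    }
    return result;
}
```

With `left = {NULL, 1}` and `right = {NULL}`, the anti join keeps both rows under `!=` but only the NULL row under `IS DISTINCT FROM`.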


### Incorrect Query Plan

```
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             c2            │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             #5            │
│__internal_decompress_integ│
│     ral_integer(#3, 1)    │
│             #1            │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      NESTED_LOOP_JOIN     │
│    ────────────────────   │
│      Join Type: ANTI      │
│    Conditions: c2 != c2   ├──────────────┐
│                           │              │
│          ~0 rows          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│            NULL           ││            NULL           │
│             #2            ││             #2            │
│            NULL           ││            NULL           │
│             #1            ││             #1            │
│            NULL           ││            NULL           │
│             #0            ││             #0            │
│            NULL           ││            NULL           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│             #0            ││             #0            │
│__internal_compress_integra││__internal_compress_integra│
│     l_utinyint(#1, 1)     ││     l_utinyint(#1, 1)     │
│             #2            ││             #2            │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   │
│            NULL           ││            NULL           │
│             #0            ││             #0            │
│            NULL           ││            NULL           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││           FILTER          │
│    ────────────────────   ││    ────────────────────   │
│         Table: t0         ││     (col0 IS NOT NULL)    │
│   Type: Sequential Scan   ││                           │
│      Projections: c2      ││                           │
│                           ││                           │
│          ~2 rows          ││           ~1 row          │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │         SEQ_SCAN          │
                             │    ────────────────────   │
                             │         Table: t0         │
                             │   Type: Sequential Scan   │
                             │      Projections: c2      │
                             │                           │
                             │          ~2 rows          │
                             └───────────────────────────┘
```

  The buggy plan shows two critical issues:
```
  ┌─────────────┴─────────────┐
  │      NESTED_LOOP_JOIN     │
  │      Join Type: ANTI      │
  │    Conditions: c2 != c2   │  ← ❌ Wrong (the join condition should be c2 IS DISTINCT FROM c2)
  │          ~0 rows          │
  └─────────────┬─────────────┘
                │
                └─────────────┐
                             ┌┴─────────────┐
                             │   FILTER     │
                             │ (col0 IS NOT │  ← ❌ Wrong (this filter should be removed)
                             │    NULL)     │
                             └──────────────┘
```

### Solution

This PR adds proper support for DISTINCT FROM operators throughout the
optimization pipeline:

1. Preserve `DISTINCT FROM` semantics in join conversion (`src/optimizer/deliminator.cpp`):
```cpp
// NOTE: We should NOT convert DISTINCT FROM to != in general
// Only convert if the ORIGINAL join had != or = (not DISTINCT FROM variants)
if (delim_join.join_type != JoinType::MARK &&
    original_join_comparison != ExpressionType::COMPARE_DISTINCT_FROM &&
    original_join_comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM) {
    // Safe to convert
}
```
2. Skip `IS NOT NULL` filters for `DISTINCT FROM` variants (`src/optimizer/deliminator.cpp`):
```cpp
// Only add IS NOT NULL filter for regular equality/inequality comparisons
// Do NOT add for DISTINCT FROM variants, as they handle NULL correctly
if (cond.comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM &&
    cond.comparison != ExpressionType::COMPARE_DISTINCT_FROM) {
    // Add IS NOT NULL filter
}
```
3. Add negation support for `COMPARE_DISTINCT_FROM` and `COMPARE_NOT_DISTINCT_FROM`
in expression type handling (`src/common/enums/expression_type.cpp`).
4. Update the parser to correctly negate `IS DISTINCT FROM` expressions wrapped in `NOT`
(`src/parser/transform/expression/transform_bool_expr.cpp`).
5. Add a regression test in
`test/sql/subquery/exists/test_correlated_exists_with_derived_table.test`.
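
The two rules above (how the `DISTINCT FROM` comparisons negate, and when an `IS NOT NULL` filter on a join key is safe) can be sketched as follows. The `Comparison` enum is a hypothetical stand-in for the comparison kinds named in the PR, not DuckDB's `ExpressionType`, and the helper names are made up for illustration.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the comparison kinds discussed above.
enum class Comparison { EQUAL, NOT_EQUAL, DISTINCT_FROM, NOT_DISTINCT_FROM };

// Negation applied when a comparison is wrapped in NOT.
// NOT (a = b) and a != b agree under three-valued logic (both are NULL on NULL input).
// NOT (a IS DISTINCT FROM b) is exactly a IS NOT DISTINCT FROM b; neither is ever NULL.
Comparison Negate(Comparison cmp) {
    switch (cmp) {
    case Comparison::EQUAL:
        return Comparison::NOT_EQUAL;
    case Comparison::NOT_EQUAL:
        return Comparison::EQUAL;
    case Comparison::DISTINCT_FROM:
        return Comparison::NOT_DISTINCT_FROM;
    case Comparison::NOT_DISTINCT_FROM:
        return Comparison::DISTINCT_FROM;
    }
    throw std::logic_error("unknown comparison");
}

// True if the comparison can never be TRUE when either input is NULL. Only then can
// rows with NULL join keys be dropped by an IS NOT NULL filter in a semi/anti join
// without changing the result. DISTINCT FROM variants can be TRUE on NULL inputs.
bool IsNullRejecting(Comparison cmp) {
    return cmp == Comparison::EQUAL || cmp == Comparison::NOT_EQUAL;
}
```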
krleonid pushed a commit that referenced this pull request Nov 27, 2025
)

We found this issue when using the Python client: because the `.show()`
method propagates a `LIMIT`, the large limit optimizations were getting in
the way of filter pushdown. The fix is to push the filter before applying
the limit whenever there is a filter. The idea is from @Mytherin; I just
added a test.

**Original optimized logical plan:**

```text
┌─────────────────────────────┐
│┌───────────────────────────┐│
││  Optimized Logical Plan   ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       Expressions: a      │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       Expressions: a      │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        Expressions:       │
│__internal_decompress_integ│
│     ral_bigint(#0, 0)     │
│             #1            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          ORDER_BY         │
│    ────────────────────   │
│           rowid           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│        Expressions:       │
│__internal_compress_integra│
│     l_uinteger(#0, 0)     │
│             #1            │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      COMPARISON_JOIN      │
│    ────────────────────   │
│      Join Type: SEMI      │
│                           ├──────────────┐
│        Conditions:        │              │
│      (rowid = rowid)      │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│          SEQ_SCAN         ││           LIMIT           │
│    ────────────────────   ││    ────────────────────   │
│          Table: t         ││                           │
│   Type: Sequential Scan   ││                           │
└───────────────────────────┘└─────────────┬─────────────┘
                             ┌─────────────┴─────────────┐
                             │          SEQ_SCAN         │
                             │    ────────────────────   │
                             │       Filters: a<50       │
                             │          Table: t         │
                             │   Type: Sequential Scan   │
                             │                           │
                             │       ~400,000 rows       │
                             └───────────────────────────┘
```

**Optimized logical plan after this PR:**

```text
┌─────────────────────────────┐
│┌───────────────────────────┐│
││  Optimized Logical Plan   ││
│└───────────────────────────┘│
└─────────────────────────────┘
┌───────────────────────────┐
│         PROJECTION        │
│    ────────────────────   │
│       Expressions: a      │
│                           │
│          ~0 rows          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           LIMIT           │
│    ────────────────────   │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│          SEQ_SCAN         │
│    ────────────────────   │
│       Filters: a<50       │
│          Table: t         │
│   Type: Sequential Scan   │
│                           │
│       ~400,000 rows       │
└───────────────────────────┘
```
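
The plan after this PR is a `LIMIT` directly on top of a filtered `SEQ_SCAN`. As a rough illustration of why that shape is cheap, here is a hypothetical streaming sketch (not DuckDB's operators): the filter is applied to each row as it is scanned, and a pipeline of this shape can stop scanning as soon as `limit` rows have passed the filter.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Filter first, then limit, evaluated in one streaming pass.
// Returns the selected rows; rows_scanned reports how much input was read.
std::vector<long> FilterThenLimit(const std::vector<long> &table, const std::function<bool(long)> &filter,
                                  std::size_t limit, std::size_t &rows_scanned) {
    std::vector<long> out;
    rows_scanned = 0;
    for (long value : table) {
        if (out.size() == limit) {
            break; // enough rows passed the filter: stop scanning early
        }
        ++rows_scanned;
        if (filter(value)) {
            out.push_back(value);
        }
    }
    return out;
}
```

For a table holding `0..999` with the filter `a < 50` and a limit of 10, only the first 10 rows are read.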
krleonid pushed a commit that referenced this pull request Dec 30, 2025
…kdb#20052)

Follow-up from duckdb#19937

This PR enables commits to continue while a checkpoint is happening.
Currently this is limited to commits that exclusively insert data, as
any other changes (deletes, updates, catalog changes, alters, etc.) still
eagerly grab the checkpoint lock which will prevent a checkpoint from
starting while these changes are pending, and vice versa. It is also
limited to inserting data into tables **that do not have indexes**. As
part of this PR, appending to a table that has indexes now grabs the
checkpoint lock again.

Enabling commits while checkpointing has two consequences for the system
that need to be dealt with:

* A checkpoint no longer necessarily covers the latest commit.
* While checkpointing, new commits that happen need to be written
somewhere in order for them to be durable. We can no longer write them
to the old WAL as we want to truncate it after our checkpoint is
finished.

### Pinned Checkpoint Commit

Previously, the checkpointing code assumed we were always checkpointing the
latest commit. This is no longer correct, since the "latest committed data"
might now change *while a checkpoint is running*. Instead, we need to choose
a commit id to checkpoint at: when starting a checkpoint, we get the latest
commit id and checkpoint based on that commit. Subsequent commits are not
written as part of this checkpoint, but can be written as part of a future
checkpoint.

To simplify this, we ensure that after a checkpoint starts, any newly
written data always goes to *new row groups*. This is managed in
`DataTable::AppendLock`. As a result, when performing the checkpoint we only
need to know whether to checkpoint a given row group or not, rather than
having to checkpoint part of a row group. This is handled in the new method
`RowGroup::ShouldCheckpointRowGroup`.
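
The all-or-nothing decision can be sketched as a single comparison against the pinned commit id. This is a hypothetical model, not `RowGroup::ShouldCheckpointRowGroup`: it assumes each row group records the highest commit id that wrote rows into it (`last_commit` is an invented field), and relies on the invariant above that appends starting after the pin always go to new row groups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using CommitId = std::uint64_t;

// Hypothetical row group metadata: highest commit id that wrote rows into the group.
struct RowGroupInfo {
    CommitId last_commit;
};

// Because later appends always go to new row groups, a row group is either entirely
// covered by the pinned commit or not at all.
bool ShouldCheckpoint(const RowGroupInfo &group, CommitId pinned_commit) {
    return group.last_commit <= pinned_commit;
}

// Indexes of the row groups that the checkpoint at pinned_commit has to write.
std::vector<std::size_t> RowGroupsToCheckpoint(const std::vector<RowGroupInfo> &groups, CommitId pinned_commit) {
    std::vector<std::size_t> result;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        if (ShouldCheckpoint(groups[i], pinned_commit)) {
            result.push_back(i);
        }
    }
    return result;
}
```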

#### Free Blocks

Another challenge with the pinned checkpoint commit is how to manage the
list of free blocks - i.e. blocks that are present in the file but are
not used. Block usage is tracked globally in the
`SingleFileBlockManager`. With optimistic writes, we can write to blocks
in the storage layer (i.e. make them no longer free blocks). However, if
a checkpoint happens at a pinned commit, any optimistic writes that
happen after the commit is pinned do not belong to that checkpoint. If
we don't write these blocks in the free block list, we might get
dangling blocks in case an abort or rollback happens. However, the
blocks are not actually free in-memory, as they are being used by the
optimistically written data.

To solve this, we introduce a new set in the block manager,
`newly_used_blocks`, which tracks blocks that are in use but are not yet
part of a given checkpoint.
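
One reading of this bookkeeping, sketched as a hypothetical class (not DuckDB's `SingleFileBlockManager`; `BlockTracker` and its methods are invented names): a newly used block is unavailable for new allocations in memory, but is still recorded as free in the free list persisted by the pinned checkpoint, until a later checkpoint actually includes its data.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

using BlockId = std::int64_t;

// Hypothetical model of the free-block bookkeeping described above.
class BlockTracker {
public:
    BlockTracker(std::set<BlockId> free_blocks, BlockId next_block)
        : free_blocks_(std::move(free_blocks)), next_block_(next_block) {}

    // Optimistic write after the checkpoint commit was pinned: the block is in use
    // in memory but not yet part of any checkpoint.
    BlockId AllocateOptimistic() {
        BlockId id;
        if (free_blocks_.empty()) {
            id = next_block_++;
        } else {
            id = *free_blocks_.begin();
            free_blocks_.erase(free_blocks_.begin());
        }
        newly_used_blocks_.insert(id);
        return id;
    }

    // A block may be handed out again only if it is truly free in memory.
    bool IsFreeInMemory(BlockId id) const {
        return free_blocks_.count(id) > 0;
    }

    // Free list written by the pinned checkpoint: newly used blocks are still
    // listed as free on disk, so an abort or rollback cannot leave them dangling.
    std::set<BlockId> FreeListForCheckpoint() const {
        std::set<BlockId> result = free_blocks_;
        result.insert(newly_used_blocks_.begin(), newly_used_blocks_.end());
        return result;
    }

    // Once a later checkpoint includes the data in this block, it is an ordinary used block.
    void MarkCheckpointed(BlockId id) {
        newly_used_blocks_.erase(id);
    }

private:
    std::set<BlockId> free_blocks_;
    std::set<BlockId> newly_used_blocks_;
    BlockId next_block_;
};
```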

### Checkpoint WAL

New commits that happen while checkpointing have to be written
somewhere. In order to still allow for the checkpoint to truncate the
WAL to prevent it from growing indefinitely, we introduce the concept of
a **checkpoint WAL**. This is a secondary WAL that can be written to
only by concurrent commits while a checkpoint is happening.

When a checkpoint is started, the checkpoint flag is written to the
original WAL. The checkpoint flag contains the root metadata block
pointer that will be written **when the checkpoint is successful**.


```
main.db.wal
[INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2]
```

The checkpoint flag allows us to determine during recovery whether or not
the checkpoint completed, which in turn determines whether we need to replay
the WAL. The current version already does this to deal with a crash between
flipping the root block pointer and truncating the WAL; however, in the new
version the flag is written before **any** data is written, instead of only
at the end.

After this is written, we set any new commits to write to the checkpoint
WAL. For example, assume a new commit comes in that inserts some data.
We will now have the following situation:

```
main.db.wal
[INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2]

main.db.checkpoint.wal
[INSERT #2][COMMIT]
```

After the checkpoint is finished, we have flushed all changes in
`main.db.wal` to the main database file, while the changes in
`main.db.checkpoint.wal` have not been flushed. All we need to do is
move over the checkpoint WAL and have it replace the original WAL. This
will lead us to the following final result after the checkpoint:

```
main.db.wal
[INSERT #2][COMMIT]
```
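
The routing of commits between the two WAL files can be modeled in memory. This is a hypothetical sketch (`WalFiles` and its members are invented, and entries are plain strings), not DuckDB's WAL code: while a checkpoint is running, commits go to the checkpoint WAL, and finishing the checkpoint makes the checkpoint WAL the new main WAL, reproducing the three states shown above.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

using Wal = std::vector<std::string>;

// Hypothetical in-memory model of main.db.wal and main.db.checkpoint.wal.
struct WalFiles {
    Wal main_wal;
    Wal checkpoint_wal;
    bool checkpoint_running = false;

    // Commits are written to the checkpoint WAL while a checkpoint is running.
    void Commit(const std::string &entry) {
        Wal &target = checkpoint_running ? checkpoint_wal : main_wal;
        target.push_back(entry);
        target.push_back("COMMIT");
    }

    // The checkpoint flag, with the root that a successful checkpoint will write,
    // goes to the original WAL before any checkpoint data is written.
    void StartCheckpoint(int new_root) {
        main_wal.push_back("CHECKPOINT: NEW_ROOT: #" + std::to_string(new_root));
        checkpoint_running = true;
    }

    // Everything in main_wal is now in the database file; the checkpoint WAL replaces it.
    void FinishCheckpoint() {
        main_wal = std::move(checkpoint_wal);
        checkpoint_wal.clear();
        checkpoint_running = false;
    }
};
```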


#### Recovery

In order to provide ACID compliance all commits that have succeeded must
be persisted even across failures. That means that any commits that are
written to the checkpoint WAL need to be persisted no matter where we
crash. Below is a list of failure modes:

###### Crash Before Checkpoint Complete

Our situation is like this:

```
main.db
[ROOT #1]

main.db.wal
[INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2]

main.db.checkpoint.wal
[INSERT #2][COMMIT]
```

In order to recover in this situation, we need to replay both
`main.db.wal` and `main.db.checkpoint.wal`. The recovering process sees
that the checkpoint root does not match the root in the database, and
now also checks for the presence of a checkpoint WAL. It then replays
them in order (`main.db.wal` -> `main.db.checkpoint.wal`).

If this is a `READ_WRITE` connection, it merges the two WALs **except for
the checkpoint node** by writing a new WAL that contains the content of
both WALs:

```
main.db.recovery.wal
[INSERT #1][COMMIT][INSERT #2][COMMIT]
```

After that completes, it overwrites the main WAL with the recovery WAL.
Finally, it removes the checkpoint WAL.

```
mv main.db.recovery.wal main.db.wal
rm main.db.checkpoint.wal
```

###### Crash During Recovery

If we crash during the above recovery process (after mv, before rm) we
would have this situation:

```
main.db
[ROOT #1]

main.db.wal
[INSERT #1][COMMIT][INSERT #2][COMMIT]

main.db.checkpoint.wal
[INSERT #2][COMMIT]
```

This is safe to recover from because `main.db.wal` does not contain a
`CHECKPOINT` node. As such, we will not replay the checkpoint WAL, and
only `main.db.wal` will be replayed.

###### Crash After Checkpoint Complete, Before WAL Move


Our situation is like this:

```
main.db
[ROOT #2]

main.db.wal
[INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2]

main.db.checkpoint.wal
[INSERT #2][COMMIT]
```

In order to recover in this situation, we need to replay only
`main.db.checkpoint.wal`. The recovering process sees that the
checkpoint root matches the root in the database, so it knows it does
not need to replay `main.db.wal`. It then checks for the checkpoint WAL,
finds it, and replays it.

If this is a `READ_WRITE` connection, it then completes the checkpoint by
finalizing the move (i.e. `mv main.db.checkpoint.wal main.db.wal`).
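
The replay decision in the three failure modes above can be condensed into one function. This is a hypothetical sketch (`RecoveryInput` and `WalsToReplay` are invented names), not DuckDB's recovery code, and it leaves out the WAL merge and file moves done by `READ_WRITE` connections.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical summary of the on-disk state seen at startup.
struct RecoveryInput {
    int database_root;                       // root pointer stored in main.db
    std::optional<int> wal_checkpoint_root;  // NEW_ROOT of a CHECKPOINT node in main.db.wal, if any
    bool checkpoint_wal_exists;              // whether main.db.checkpoint.wal is present
};

// Returns the WAL files to replay, in order.
std::vector<std::string> WalsToReplay(const RecoveryInput &in) {
    std::vector<std::string> replay;
    if (!in.wal_checkpoint_root) {
        // No CHECKPOINT node (e.g. a crash between mv and rm during recovery):
        // any leftover checkpoint WAL is already contained in main.db.wal.
        replay.push_back("main.db.wal");
        return replay;
    }
    if (*in.wal_checkpoint_root != in.database_root) {
        // The checkpoint did not complete: the main WAL (minus the checkpoint node) is still needed.
        replay.push_back("main.db.wal");
    }
    if (in.checkpoint_wal_exists) {
        replay.push_back("main.db.checkpoint.wal");
    }
    return replay;
}
```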


### Other Changes / Fixes


#### Windows: make `FileSystem::MoveFile` behave like Linux/macOS

On Linux/macOS, `MoveFile` is used to mean "move and overwrite the target
file". On Windows, this previously failed if the target already existed.
This PR makes Windows behave like Linux/macOS by passing
`MOVEFILE_REPLACE_EXISTING` to `MoveFileExW`. In addition, because we
tend to use `MoveFile` to mean "we want to be certain this file was
moved", we also enable the `MOVEFILE_WRITE_THROUGH` flag.
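
A portable "move and replace" helper along these lines might look as follows. It is a sketch, not DuckDB's `FileSystem::MoveFile`, and `MoveFileReplacing` is an invented name. On Windows it uses `MoveFileExW` with the two flags described above. On POSIX, `rename()` already replaces an existing target atomically, although making the rename itself durable would additionally require an `fsync()` of the parent directory, which this sketch does not do.

```cpp
#include <cassert>
#include <cstdio>
#include <filesystem>
#ifdef _WIN32
#include <windows.h>
#endif

// Moves `from` to `to`, replacing `to` if it already exists. Returns true on success.
bool MoveFileReplacing(const std::filesystem::path &from, const std::filesystem::path &to) {
#ifdef _WIN32
    // Replace an existing target, and only return once the move has been flushed to disk.
    return MoveFileExW(from.c_str(), to.c_str(), MOVEFILE_REPLACE_EXISTING | MOVEFILE_WRITE_THROUGH) != 0;
#else
    // POSIX rename() replaces an existing target atomically (not fsync'ed here).
    return std::rename(from.c_str(), to.c_str()) == 0;
#endif
}
```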


#### SQLLogicTest 

While testing this PR, I realized the sqllogictest runner was swallowing
exceptions thrown in certain locations and incorrectly reporting tests
that should fail as succeeded. This PR fixes that and now makes these
exceptions fail the test run. This revealed a bunch of failing tests, in
particular around the config runners `peg_parser.json` and
`encryption.json`, and a few tests in `httpfs`. A few tests were fixed;
others were skipped in the config pending further investigation.
krleonid pushed a commit that referenced this pull request Dec 30, 2025
…CHAR in `shell_renderer.cpp` (duckdb#20096)

Hi DuckDB Team,

When I used Linux to build the main branch of DuckDB with my
`adbc_scanner` extension, I often encountered this link error in CI. This
PR resolves it by using `LogicalType(LogicalTypeId::VARCHAR)` instead of
`LogicalType::VARCHAR`. The changed code is in `shell_renderer.cpp`, not
in my extension code. With this change, the build succeeds.

Build error:

```
[2/3] Linking CXX executable duckdb
FAILED: duckdb 
: && /usr/bin/c++ -g -g -O0 -DDEBUG -Wall    -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -Wunused -Werror=vla -Wnarrowing -pedantic  tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/highlighting.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/history.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/linenoise.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/linenoise-c.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/rendering.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/terminal.cpp.o extension/CMakeFiles/duckdb_generated_extension_loader.dir/__/codegen/src/generated_extension_loader.cpp.o tools/shell/CMakeFiles/shell.dir/shell_command_line_option.cpp.o tools/shell/CMakeFiles/shell.dir/shell_extension.cpp.o tools/shell/CMakeFiles/shell.dir/shell.cpp.o tools/shell/CMakeFiles/shell.dir/shell_helpers.cpp.o tools/shell/CMakeFiles/shell.dir/shell_metadata_command.cpp.o tools/shell/CMakeFiles/shell.dir/shell_prompt.cpp.o tools/shell/CMakeFiles/shell.dir/shell_renderer.cpp.o tools/shell/CMakeFiles/shell.dir/shell_highlight.cpp.o tools/shell/CMakeFiles/shell.dir/shell_progress_bar.cpp.o tools/shell/CMakeFiles/shell.dir/shell_render_table_metadata.cpp.o tools/shell/CMakeFiles/shell.dir/shell_windows.cpp.o -o duckdb  src/libduckdb_static.a  extension/adbc_scanner/libadbc_scanner_extension.a  extension/core_functions/libcore_functions_extension.a  extension/parquet/libparquet_extension.a  extension/jemalloc/libjemalloc_extension.a  third_party/utf8proc/libduckdb_utf8proc.a  vcpkg_installed/x64-linux/debug/lib/libadbc_driver_manager.a  vcpkg_installed/x64-linux/debug/lib/libtomlplusplus.a  vcpkg_installed/x64-linux/debug/lib/libnanoarrow_static.a  src/libduckdb_static.a  -ldl && :
/usr/bin/ld: src/libduckdb_static.a(ub_duckdb_common.cpp.o):(.rodata+0x4460): multiple definition of `duckdb::LogicalType::VARCHAR'; tools/shell/CMakeFiles/shell.dir/shell_renderer.cpp.o:(.rodata._ZN6duckdb11LogicalType7VARCHARE[_ZN6duckdb11LogicalType7VARCHARE]+0x0): first defined here
collect2: error: ld returned 1 exit status
```

Distro information: Amazon Linux 2023 on AWS

Linux ip-172-16-3-91.ec2.internal 6.1.158-178.288.amzn2023.x86_64 #1 SMP
PREEMPT_DYNAMIC Mon Nov 3 18:38:36 UTC 2025 x86_64 x86_64 x86_64
GNU/Linux

```
[ec2-user@ip-172-16-3-91 duckdb]$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-amazon-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-amazon-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://github.com/amazonlinux/amazon-linux-2022 --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.5.0-20240719/obj-x86_64-amazon-linux/isl-install --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-amazon-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.5.0 20240719 (Red Hat 11.5.0-5) (GCC) 
```

I only knew how to resolve this because I've seen the same problem in
other extensions.

You can see a full build failure here:

https://github.com/Query-farm/adbc_scanner/actions/runs/20048370767/job/57498881373

Thanks,

Rusty
krleonid pushed a commit that referenced this pull request Dec 30, 2025
This PR improves the `CommonSubplanOptimizer` and should put it in
"maintenance mode", at least for now; I think I'm done improving it
for a while.

## Multiple Nested Matching

This PR allows multiple subplans to be matched, rather than just a
single subplan, and allows nesting. Take the following query, for
example:

```sql
explain
select distinct range from range(10)
union all
select distinct range from range(10)
union all
select range % 2 as range from (select distinct range from range(10)) group by range
union all
select range % 2 as range from (select distinct range from range(10)) group by range
union all
select count(*) from (select range % 2 as range from (select distinct range from range(10)) group by range)
union all
select count(*) from (select range % 2 as range from (select distinct range from range(10)) group by range);
```
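
The core idea of detecting repeated subplans, including nested ones, can be illustrated with structural keys: build a canonical key for every subtree bottom-up and count how often each key occurs. This is a hypothetical sketch (`PlanNode`, `Make` and the string keys are invented here), not the `CommonSubplanOptimizer`'s implementation, which also has to reason about column bindings and decide which matches to turn into CTEs.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal plan node: an operator description plus children.
struct PlanNode {
    std::string op;
    std::vector<std::shared_ptr<PlanNode>> children;
};
using NodePtr = std::shared_ptr<PlanNode>;

NodePtr Make(std::string op, std::vector<NodePtr> children = {}) {
    return std::make_shared<PlanNode>(PlanNode{std::move(op), std::move(children)});
}

// Builds a canonical key for every subtree (post-order) and counts occurrences.
std::string Canonicalize(const NodePtr &node, std::map<std::string, int> &counts) {
    std::string key = node->op + "(";
    for (const auto &child : node->children) {
        key += Canonicalize(child, counts) + ",";
    }
    key += ")";
    ++counts[key];
    return key;
}

// Subtrees occurring more than once are candidates for computing once and sharing.
std::vector<std::string> RepeatedSubplans(const NodePtr &root) {
    std::map<std::string, int> counts;
    Canonicalize(root, counts);
    std::vector<std::string> repeated;
    for (const auto &[key, count] : counts) {
        if (count > 1) {
            repeated.push_back(key);
        }
    }
    return repeated;
}
```

Nested repeats are reported alongside the subtrees that contain them (a repeated `DISTINCT` over `RANGE(10)` also yields the repeated `RANGE(10)`); choosing which repeats to materialize is left out. Building full strings is quadratic in tree depth, so a real implementation would more likely hash subtrees.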

The query above unions six queries together. Each distinct query occurs
twice and is nested inside the next one. With the optimizer disabled, this
yields the following plan:
```
┌───────────────────────────┐
│           UNION           ├──────────────┬────────────────────────────┬────────────────────────────┬────────────────────────────┬────────────────────────────┐
└─────────────┬─────────────┘              │                            │                            │                            │                            │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│       HASH_GROUP_BY       ││       HASH_GROUP_BY       ││         PROJECTION        ││         PROJECTION        ││    UNGROUPED_AGGREGATE    ││    UNGROUPED_AGGREGATE    │
│    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
│         Groups: #0        ││         Groups: #0        ││           range           ││           range           ││        Aggregates:        ││        Aggregates:        │
│                           ││                           ││                           ││                           ││        count_star()       ││        count_star()       │
│          ~10 rows         ││          ~10 rows         ││          ~6 rows          ││          ~6 rows          ││                           ││                           │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         PROJECTION        ││         PROJECTION        ││       HASH_GROUP_BY       ││       HASH_GROUP_BY       ││         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
│           range           ││           range           ││         Groups: #0        ││         Groups: #0        ││             42            ││             42            │
│                           ││                           ││                           ││                           ││                           ││                           │
│          ~10 rows         ││          ~10 rows         ││          ~6 rows          ││          ~6 rows          ││          ~6 rows          ││          ~6 rows          │
└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│           RANGE           ││           RANGE           ││         PROJECTION        ││         PROJECTION        ││       HASH_GROUP_BY       ││       HASH_GROUP_BY       │
│    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
│      Function: RANGE      ││      Function: RANGE      ││           range           ││           range           ││         Groups: #0        ││         Groups: #0        │
│                           ││                           ││                           ││                           ││                           ││                           │
│          ~10 rows         ││          ~10 rows         ││          ~10 rows         ││          ~10 rows         ││          ~6 rows          ││          ~6 rows          │
└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │       HASH_GROUP_BY       ││       HASH_GROUP_BY       ││         PROJECTION        ││         PROJECTION        │
                                                          │    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
                                                          │         Groups: #0        ││         Groups: #0        ││           range           ││           range           │
                                                          │                           ││                           ││                           ││                           │
                                                          │          ~10 rows         ││          ~10 rows         ││          ~10 rows         ││          ~10 rows         │
                                                          └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │         PROJECTION        ││         PROJECTION        ││       HASH_GROUP_BY       ││       HASH_GROUP_BY       │
                                                          │    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
                                                          │           range           ││           range           ││         Groups: #0        ││         Groups: #0        │
                                                          │                           ││                           ││                           ││                           │
                                                          │          ~10 rows         ││          ~10 rows         ││          ~10 rows         ││          ~10 rows         │
                                                          └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │           RANGE           ││           RANGE           ││         PROJECTION        ││         PROJECTION        │
                                                          │    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
                                                          │      Function: RANGE      ││      Function: RANGE      ││           range           ││           range           │
                                                          │                           ││                           ││                           ││                           │
                                                          │          ~10 rows         ││          ~10 rows         ││          ~10 rows         ││          ~10 rows         │
                                                          └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
                                                                                                                    ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                                                                                    │           RANGE           ││           RANGE           │
                                                                                                                    │    ────────────────────   ││    ────────────────────   │
                                                                                                                    │      Function: RANGE      ││      Function: RANGE      │
                                                                                                                    │                           ││                           │
                                                                                                                    │          ~10 rows         ││          ~10 rows         │
                                                                                                                    └───────────────────────────┘└───────────────────────────┘
```
As we can see, there is a lot of redundancy here. With this PR (and the
optimizer enabled, of course), we get the following plan:
```
┌───────────────────────┐
│          CTE          │
│    ────────────────   │
│       CTE Name:       │
│   __common_subplan_1  │
│                       ├────────────┐
│    Table Index: 79    │            │
│                       │            │
│        ~0 rows        │            │
└───────────┬───────────┘            │
┌───────────┴───────────┐┌───────────┴───────────┐
│     HASH_GROUP_BY     ││          CTE          │
│    ────────────────   ││    ────────────────   │
│       Groups: #0      ││       CTE Name:       │
│                       ││   __common_subplan_2  │
│                       ││                       ├────────────┐
│                       ││    Table Index: 86    │            │
│                       ││                       │            │
│        ~10 rows       ││        ~0 rows        │            │
└───────────┬───────────┘└───────────┬───────────┘            │
┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐
│       PROJECTION      ││     HASH_GROUP_BY     ││          CTE          │
│    ────────────────   ││    ────────────────   ││    ────────────────   │
│         range         ││       Groups: #0      ││       CTE Name:       │
│                       ││                       ││   __common_subplan_3  │
│                       ││                       ││                       ├────────────┐
│                       ││                       ││    Table Index: 91    │            │
│                       ││                       ││                       │            │
│        ~10 rows       ││        ~6 rows        ││        ~0 rows        │            │
└───────────┬───────────┘└───────────┬───────────┘└───────────┬───────────┘            │
┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐
│         RANGE         ││       PROJECTION      ││  UNGROUPED_AGGREGATE  ││         UNION         │
│    ────────────────   ││    ────────────────   ││    ────────────────   ││                       │
│    Function: RANGE    ││         range         ││      Aggregates:      ││                       ├────────────┬────────────────────────┬────────────────────────┬────────────────────────┬────────────────────────┐
│                       ││                       ││      count_star()     ││                       │            │                        │                        │                        │                        │
│        ~10 rows       ││        ~10 rows       ││                       ││                       │            │                        │                        │                        │                        │
└───────────────────────┘└───────────┬───────────┘└───────────┬───────────┘└───────────┬───────────┘            │                        │                        │                        │                        │
                         ┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐
                         │        CTE_SCAN       ││       PROJECTION      ││        CTE_SCAN       ││        CTE_SCAN       ││       PROJECTION      ││       PROJECTION      ││        CTE_SCAN       ││        CTE_SCAN       │
                         │    ────────────────   ││    ────────────────   ││    ────────────────   ││    ────────────────   ││    ────────────────   ││    ────────────────   ││    ────────────────   ││    ────────────────   │
                         │     CTE Index: 79     ││           42          ││     CTE Index: 79     ││     CTE Index: 79     ││         range         ││         range         ││     CTE Index: 91     ││     CTE Index: 91     │
                         │                       ││                       ││                       ││                       ││                       ││                       ││                       ││                       │
                         │        ~10 rows       ││        ~6 rows        ││        ~10 rows       ││        ~10 rows       ││        ~6 rows        ││        ~6 rows        ││         ~1 row        ││         ~1 row        │
                         └───────────────────────┘└───────────┬───────────┘└───────────────────────┘└───────────────────────┘└───────────┬───────────┘└───────────┬───────────┘└───────────────────────┘└───────────────────────┘
                                                  ┌───────────┴───────────┐                                                  ┌───────────┴───────────┐┌───────────┴───────────┐
                                                  │        CTE_SCAN       │                                                  │        CTE_SCAN       ││        CTE_SCAN       │
                                                  │    ────────────────   │                                                  │    ────────────────   ││    ────────────────   │
                                                  │     CTE Index: 86     │                                                  │     CTE Index: 86     ││     CTE Index: 86     │
                                                  │                       │                                                  │                       ││                       │
                                                  │        ~6 rows        │                                                  │        ~6 rows        ││        ~6 rows        │
                                                  └───────────────────────┘                                                  └───────────────────────┘└───────────────────────┘
```
This plan has 3 CTEs, 8 CTE scans, and only 3 aggregations (instead of the
original 12!).
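One way to find this kind of redundancy is to give every subtree a canonical signature and count how often each signature occurs. Subtrees that occur more than once are candidates to materialize once as a CTE and replace with CTE scans. The sketch below makes heavy simplifying assumptions: `PlanNode`, `Node`, and `RepeatedSubplans` are hypothetical and much simpler than DuckDB's logical operators, and this is not necessarily how the PR detects common subplans.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified plan node (not DuckDB's LogicalOperator).
struct PlanNode {
    std::string op;      // e.g. "HASH_GROUP_BY"
    std::string params;  // e.g. "Groups: #0"
    std::vector<std::shared_ptr<PlanNode>> children;
};

using NodePtr = std::shared_ptr<PlanNode>;

NodePtr Node(std::string op, std::string params, std::vector<NodePtr> children = {}) {
    return std::make_shared<PlanNode>(PlanNode{std::move(op), std::move(params), std::move(children)});
}

// Builds a canonical signature for the subtree rooted at `node` and records how often
// each signature occurs. Assumes `params` fully describes an operator, so equal
// signatures mean the subtrees compute the same result.
std::string Signature(const NodePtr &node, std::map<std::string, int> &counts) {
    std::string sig = node->op + "(" + node->params;
    for (const auto &child : node->children) {
        sig += "," + Signature(child, counts);
    }
    sig += ")";
    counts[sig]++;
    return sig;
}

// Returns the signatures of all subtrees that occur at least twice (sorted, since
// std::map iterates in key order). These are the CTE candidates.
std::vector<std::string> RepeatedSubplans(const NodePtr &root) {
    std::map<std::string, int> counts;
    Signature(root, counts);
    std::vector<std::string> repeated;
    for (const auto &entry : counts) {
        if (entry.second >= 2) {
            repeated.push_back(entry.first);
        }
    }
    return repeated;
}
```

A real implementation would also have to pick the largest repeated subtrees, keep column bindings consistent, and hash rather than concatenate strings. The string signature here also assumes operator parameters never contain `,`, `(` or `)`.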

## Fuzzy Plan Matching

Query plans can also contain subplans that are an "almost" exact match.
Before this PR, subplans had to match exactly. With this PR, we can do
"fuzzy" matching, where two subplans are the same except for some of the
selected columns. Take the following query, for example:

```sql
-- Create build table
create table build as
select range a, range * 2 b, range * 3 c, range * 4 d
from (select range::utinyint as range from range(11))
where range % 5 = 0;
-- Create probe table
create table probe as
select range e, range * 2 f, range * 3 g, range * 4 h
from (select range::utinyint as range from range(11));
-- View that joins the two
create view my_view as
from probe join build on (build.a = probe.e);
-- Select greatest of all columns, unioned with greatest of just two columns
explain
select greatest(a, b, c, d, e, f, g, h) from my_view
union all
select greatest(b, f) from my_view;
```
Here, one of the unioned queries selects all columns, and the other
selects just two columns. The join (coming from the view) in the second
query is "contained" in the join in the first query, because the first
query selects all columns that the second query needs.
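The containment check can be sketched like this. Each subplan is assumed to be summarized by a structural "shape" that ignores which columns are projected, plus the list of columns it outputs. `SubplanSummary` and `IsContainedIn` are hypothetical. The PR's matching works on the logical plan and its column bindings, which this sketch does not model.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of a subplan for fuzzy matching.
struct SubplanSummary {
    std::string shape;                 // structure with projected columns stripped, e.g. "JOIN(probe,build,e=a)"
    std::vector<std::string> columns;  // columns the subplan outputs
};

// Returns true if `candidate` can be answered from `provider`: both subplans have the
// same shape, and every column `candidate` outputs is also output by `provider`.
bool IsContainedIn(const SubplanSummary &candidate, const SubplanSummary &provider) {
    if (candidate.shape != provider.shape) {
        return false;
    }
    return std::all_of(candidate.columns.begin(), candidate.columns.end(), [&](const std::string &col) {
        return std::find(provider.columns.begin(), provider.columns.end(), col) != provider.columns.end();
    });
}
```

Containment is one-directional. The all-columns join can serve the two-column join, but not the other way around.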

We currently get the following plan (the optimizer doesn't trigger):
```
┌───────────────────────────┐
│           UNION           ├───────────────────────────────────────────┐
└─────────────┬─────────────┘                                           │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│         PROJECTION        │                             │         PROJECTION        │
│    ────────────────────   │                             │    ────────────────────   │
│ greatest(a, b, c, d, e, f,│                             │       greatest(b, f)      │
│            g, h)          │                             │                           │
│                           │                             │                           │
│          ~3 rows          │                             │          ~3 rows          │
└─────────────┬─────────────┘                             └─────────────┬─────────────┘
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│         PROJECTION        │                             │         HASH_JOIN         │
│    ────────────────────   │                             │    ────────────────────   │
│             e             │                             │      Join Type: INNER     │
│             f             │                             │     Conditions: e = a     │
│             g             │                             │                           │
│             h             │                             │                           │
│             a             │                             │                           ├──────────────┐
│             b             │                             │                           │              │
│             c             │                             │                           │              │
│             d             │                             │                           │              │
│                           │                             │                           │              │
│          ~3 rows          │                             │          ~3 rows          │              │
└─────────────┬─────────────┘                             └─────────────┬─────────────┘              │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         HASH_JOIN         │                             │         SEQ_SCAN          ││         SEQ_SCAN          │
│    ────────────────────   │                             │    ────────────────────   ││    ────────────────────   │
│      Join Type: INNER     │                             │        Table: probe       ││        Table: build       │
│     Conditions: e = a     │                             │   Type: Sequential Scan   ││   Type: Sequential Scan   │
│                           │                             │                           ││                           │
│                           ├──────────────┐              │        Projections:       ││        Projections:       │
│                           │              │              │             e             ││             a             │
│                           │              │              │             f             ││             b             │
│                           │              │              │                           ││                           │
│          ~3 rows          │              │              │          ~11 rows         ││          ~3 rows          │
└─────────────┬─────────────┘              │              └───────────────────────────┘└───────────────────────────┘
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          │
│    ────────────────────   ││    ────────────────────   │
│        Table: probe       ││        Table: build       │
│   Type: Sequential Scan   ││   Type: Sequential Scan   │
│                           ││                           │
│        Projections:       ││        Projections:       │
│             e             ││             a             │
│             f             ││             b             │
│             g             ││             c             │
│             h             ││             d             │
│                           ││                           │
│          ~11 rows         ││          ~3 rows          │
└───────────────────────────┘└───────────────────────────┘
```

With the improvements to the optimizer in this PR, we now get this plan:
```
┌───────────────────────────┐
│            CTE            │
│    ────────────────────   │
│         CTE Name:         │
│     __common_subplan_1    │
│                           ├───────────────────────────────────────────┐
│      Table Index: 29      │                                           │
│                           │                                           │
│          ~0 rows          │                                           │
└─────────────┬─────────────┘                                           │
┌─────────────┴─────────────┐                             ┌─────────────┴─────────────┐
│         HASH_JOIN         │                             │           UNION           │
│    ────────────────────   │                             │                           │
│      Join Type: INNER     │                             │                           │
│     Conditions: e = a     ├──────────────┐              │                           ├──────────────┐
│                           │              │              │                           │              │
│          ~3 rows          │              │              │                           │              │
└─────────────┬─────────────┘              │              └─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         SEQ_SCAN          ││         SEQ_SCAN          ││         PROJECTION        ││         PROJECTION        │
│    ────────────────────   ││    ────────────────────   ││    ────────────────────   ││    ────────────────────   │
│        Table: probe       ││        Table: build       ││ greatest(a, b, c, d, e, f,││       greatest(b, f)      │
│   Type: Sequential Scan   ││   Type: Sequential Scan   ││            g, h)          ││                           │
│                           ││                           ││                           ││                           │
│        Projections:       ││        Projections:       ││                           ││                           │
│             e             ││             a             ││                           ││                           │
│             f             ││             b             ││                           ││                           │
│             g             ││             c             ││                           ││                           │
│             h             ││             d             ││                           ││                           │
│                           ││                           ││                           ││                           │
│          ~11 rows         ││          ~3 rows          ││          ~3 rows          ││          ~3 rows          │
└───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │         PROJECTION        ││         PROJECTION        │
                                                          │    ────────────────────   ││    ────────────────────   │
                                                          │             e             ││             #1            │
                                                          │             f             ││             #4            │
                                                          │             g             ││                           │
                                                          │             h             ││                           │
                                                          │             a             ││                           │
                                                          │             b             ││                           │
                                                          │             c             ││                           │
                                                          │             d             ││                           │
                                                          │                           ││                           │
                                                          │          ~3 rows          ││          ~0 rows          │
                                                          └─────────────┬─────────────┘└─────────────┬─────────────┘
                                                          ┌─────────────┴─────────────┐┌─────────────┴─────────────┐
                                                          │          CTE_SCAN         ││          CTE_SCAN         │
                                                          │    ────────────────────   ││    ────────────────────   │
                                                          │       CTE Index: 29       ││       CTE Index: 29       │
                                                          │                           ││                           │
                                                          │          ~3 rows          ││          ~3 rows          │
                                                          └───────────────────────────┘└───────────────────────────┘
```
As we can see, the join is materialized as a CTE, and scanned twice.
Columns that aren't needed are projected out after the CTE scan.
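The projection after a CTE scan comes down to finding, for each column a consumer needs, its position in the CTE's output, and then keeping only those positions. The minimal sketch below assumes the CTE outputs its columns in the order e, f, g, h, a, b, c, d. That order is an assumption, because the plan does not show the materialized column order. Rows are represented as integer vectors, and `Row`, `ProjectionMap`, and `Project` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

using Row = std::vector<int>;

// For each needed column, return its position in the CTE's output columns.
std::vector<std::size_t> ProjectionMap(const std::vector<std::string> &cte_columns,
                                       const std::vector<std::string> &needed) {
    std::vector<std::size_t> positions;
    for (const auto &col : needed) {
        std::size_t i = 0;
        while (i < cte_columns.size() && cte_columns[i] != col) {
            i++;
        }
        if (i == cte_columns.size()) {
            throw std::invalid_argument("column not produced by CTE: " + col);
        }
        positions.push_back(i);
    }
    return positions;
}

// Keeps only the mapped columns of each row read by the CTE scan.
std::vector<Row> Project(const std::vector<Row> &rows, const std::vector<std::size_t> &positions) {
    std::vector<Row> out;
    out.reserve(rows.size());
    for (const auto &row : rows) {
        Row projected;
        for (std::size_t idx : positions) {
            projected.push_back(row[idx]);
        }
        out.push_back(projected);
    }
    return out;
}
```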

## Benchmark Improvements

With the changes, we now trigger more subplan elimination on TPC-DS and
TPC-H. Here are some results that I collected on my laptop.

TPC-DS SF100 improvements:
- Q61: 0.61s -> 0.33s (~1.8x)
- Q70: 0.81s -> 0.55s (~1.5x)

TPC-H SF100 improvements:
- Q11: 0.16s -> 0.12s (~1.3x)
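The speedups are ratios of the before and after timings:

```math
\text{speedup} = \frac{t_{\text{before}}}{t_{\text{after}}}, \qquad
\frac{0.61}{0.33} \approx 1.85, \quad
\frac{0.81}{0.55} \approx 1.47, \quad
\frac{0.16}{0.12} \approx 1.33
```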

## Notes

I needed a lot of indirection to get the column bindings to match
during fuzzy plan matching, which required a lot of `unordered_map`s. To
avoid doing so many allocations, I've also implemented
`arena_unordered_map`, so that the number of allocations grows
logarithmically. I'm not sure how much this helped, but it fits our
ongoing effort to reduce allocations.
Overall, this optimizer now takes ~10% of total optimization time.
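This minimal sketch (C++17) shows why an arena keeps the allocation count logarithmic. It uses a bump arena whose blocks double in size and plugs it into `std::unordered_map` through a custom allocator. `Arena`, `ArenaAllocator`, and `ArenaMap` are hypothetical names for this sketch. This is not the PR's `arena_unordered_map`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical bump arena: hands out memory from the current block; when it runs out,
// allocates a new block at least twice the size of the previous one.
class Arena {
public:
    explicit Arena(std::size_t initial_block_size = 1024) : next_block_size_(initial_block_size) {}

    void *Allocate(std::size_t bytes, std::size_t align) {
        void *p = TryBump(bytes, align);
        if (!p) {
            // Grow geometrically, but make sure a single large request still fits.
            std::size_t size = std::max(next_block_size_, bytes + align);
            blocks_.push_back(std::make_unique<unsigned char[]>(size));
            current_size_ = size;
            used_ = 0;
            next_block_size_ = size * 2;
            p = TryBump(bytes, align);
        }
        return p;
    }

    std::size_t BlockCount() const { return blocks_.size(); }

private:
    // Tries to carve an aligned region out of the current block; nullptr if it does not fit.
    void *TryBump(std::size_t bytes, std::size_t align) {
        if (blocks_.empty()) return nullptr;
        void *ptr = blocks_.back().get() + used_;
        std::size_t space = current_size_ - used_;
        if (!std::align(align, bytes, ptr, space)) return nullptr;
        used_ = current_size_ - space + bytes;
        return ptr;
    }

    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
    std::size_t current_size_ = 0;
    std::size_t used_ = 0;
    std::size_t next_block_size_;
};

// Minimal allocator adapter; deallocate is a no-op because the arena frees everything at once.
template <class T>
struct ArenaAllocator {
    using value_type = T;
    Arena *arena;

    explicit ArenaAllocator(Arena &a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U> &other) noexcept : arena(other.arena) {}

    T *allocate(std::size_t n) { return static_cast<T *>(arena->Allocate(n * sizeof(T), alignof(T))); }
    void deallocate(T *, std::size_t) noexcept {}
};

template <class T, class U>
bool operator==(const ArenaAllocator<T> &a, const ArenaAllocator<U> &b) { return a.arena == b.arena; }
template <class T, class U>
bool operator!=(const ArenaAllocator<T> &a, const ArenaAllocator<U> &b) { return !(a == b); }

template <class K, class V>
using ArenaMap = std::unordered_map<K, V, std::hash<K>, std::equal_to<K>, ArenaAllocator<std::pair<const K, V>>>;
```

Each new block is at least twice as large as the previous one. The number of upstream allocations therefore grows roughly with the log of the total bytes requested. The trade-off is that memory the map releases (for example, old bucket arrays after a rehash) is not reused until the whole arena is destroyed.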
krleonid pushed a commit that referenced this pull request Jan 14, 2026
…0283)

Fix for: duckdblabs/duckdb-internal#6809, duckdb#20086

I would like someone to take a look at this before I run CI, to see if
the fix makes sense.

In `ConstantOrNullFunction`, there is a bug: if the first loop
iteration takes the FLAT_VECTOR branch, the result validity mask becomes
a reference to the validity mask of `args.data[idx]`. If a subsequent
iteration takes the default branch (say, for a DICTIONARY_VECTOR) and
calls `result_mask.SetInvalid(i)`, it overwrites the validity mask of
the first input column, which the result mask is still aliasing.

I believe the fix is to call `EnsureWritable` in the FLAT_VECTOR
case, so that the result validity mask is not a reference to the input's
validity mask before we call

```cpp
result_mask.Combine(input_mask, args.size());
```
(which is where the alias is actually created).
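The aliasing hazard can be reproduced in a toy model of shared validity masks. `ToyMask`, its methods, and `CombineInputs` below are simplified, hypothetical stand-ins for DuckDB's `ValidityMask` and the loop in `ConstantOrNullFunction`, not the actual code. In the model, an empty buffer means "all valid", and `Combine` shares the other mask's buffer when the result has none. `SetInvalid` writes through whatever buffer the mask holds.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy validity mask: a null buffer means "all rows valid". Buffers are shared through
// shared_ptr, so two masks can alias the same bits.
struct ToyMask {
    std::shared_ptr<std::vector<bool>> bits;
    std::size_t count;

    explicit ToyMask(std::size_t n) : count(n) {}

    bool RowIsValid(std::size_t i) const { return !bits || (*bits)[i]; }

    // Gives this mask its own private buffer before it is written to.
    void EnsureWritable() {
        if (!bits) {
            bits = std::make_shared<std::vector<bool>>(count, true);
        } else if (bits.use_count() > 1) {
            bits = std::make_shared<std::vector<bool>>(*bits);
        }
    }

    // Only allocates when there is no buffer; otherwise writes through a possibly shared buffer.
    void SetInvalid(std::size_t i) {
        if (!bits) bits = std::make_shared<std::vector<bool>>(count, true);
        (*bits)[i] = false;
    }

    // AND-combines `other` into this mask. If this mask has no buffer yet, it simply
    // shares other's buffer: this is where the alias is created.
    void Combine(const ToyMask &other) {
        if (!other.bits) return;  // other is all-valid: nothing to do
        if (!bits) {
            bits = other.bits;  // alias!
            return;
        }
        EnsureWritable();
        for (std::size_t i = 0; i < count; i++) (*bits)[i] = (*bits)[i] && (*other.bits)[i];
    }
};

// Simplified ConstantOrNull-style loop: the result is NULL wherever any input is NULL.
// Flat inputs are combined wholesale; other inputs are handled row by row.
ToyMask CombineInputs(const std::vector<ToyMask> &inputs, const std::vector<bool> &is_flat, bool apply_fix) {
    ToyMask result(inputs[0].count);
    for (std::size_t idx = 0; idx < inputs.size(); idx++) {
        if (is_flat[idx]) {
            if (apply_fix) result.EnsureWritable();  // proposed fix: never alias an input
            result.Combine(inputs[idx]);
        } else {
            for (std::size_t i = 0; i < result.count; i++) {
                if (!inputs[idx].RowIsValid(i)) result.SetInvalid(i);
            }
        }
    }
    return result;
}
```

Suppose a flat input is followed by a non-flat input. With `apply_fix = false`, the first input's mask is modified. With `apply_fix = true`, the result gets its own buffer first, and the inputs are left untouched.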

The reproducer hits this case: a specific combination of a unique index,
an update, and no checkpointing led to this scenario.

For reference, here is the query plan of the last query in the
reproducer, where the bug occurred. The t1.c0 column is passed to
`ConstantOrNullFunction` as a FLAT_VECTOR, and the t0.c1 column is
passed in as a dictionary vector. Since the argument at index 1 of
`ConstantOrNullFunction` is the c0 column that also appears in the
output, the filter overwrote that column's validity mask inside
`ConstantOrNullFunction` and wrote NULLs into the output:

```
┌───────┬───────┬───────┐
│  c0   │  c0   │  c1   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│  NULL │     1 │  NULL │
│  NULL │    -1 │  NULL │
└───────┴───────┴───────┘
```

Whereas it should be: 

```
┌───────┬───────┬───────┐
│  c0   │  c0   │  c1   │
│ int32 │ int32 │ int32 │
├───────┼───────┼───────┤
│     0 │     1 │  NULL │
│  NULL │    -1 │  NULL │
└───────┴───────┴───────┘
```

```
┌────────────────────────────────────────────────┐
│┌──────────────────────────────────────────────┐│
││               Total Time: 9.18s              ││
│└──────────────────────────────────────────────┘│
└────────────────────────────────────────────────┘
┌───────────────────────────┐
│           QUERY           │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      EXPLAIN_ANALYZE      │
│    ────────────────────   │
│           0 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             c0            │
│             c0            │
│             c1            │
│                           │
│           2 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│             #3            │
│             #7            │
│            #11            │
│                           │
│           2 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│           FILTER          │
│    ────────────────────   │
│  (constant_or_null(false, │
│      c0, c1) IS NULL)     │
│                           │
│           2 rows          │
│          (1.82s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│            NULL           │
│             #6            │
│            NULL           │
│             #5            │
│            NULL           │
│             #4            │
│            NULL           │
│             #3            │
│            NULL           │
│             #2            │
│            NULL           │
│             #1            │
│            NULL           │
│             #0            │
│            NULL           │
│                           │
│           2 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│         PROJECTION        │
│    ────────────────────   │
│            NULL           │
│             #2            │
│            NULL           │
│             #1            │
│            NULL           │
│             #0            │
│            NULL           │
│                           │
│           2 rows          │
│          (0.00s)          │
└─────────────┬─────────────┘
┌─────────────┴─────────────┐
│      POSITIONAL_SCAN      │
│    ────────────────────   │
│           2 rows          ├──────────────┐
│          (7.30s)          │              │
└─────────────┬─────────────┘              │
┌─────────────┴─────────────┐┌─────────────┴─────────────┐
│         TABLE_SCAN        ││         TABLE_SCAN        │
│    ────────────────────   ││    ────────────────────   │
│         Table: t1         ││         Table: t0         │
│   Type: Sequential Scan   ││   Type: Sequential Scan   │
│      Projections: c0      ││                           │
│                           ││        Projections:       │
│                           ││             c1            │
│                           ││             c0            │
│                           ││                           │
│           0 rows          ││           0 rows          │
│          (0.00s)          ││          (0.00s)          │
└───────────────────────────┘└───────────────────────────┘
```