Add appender clear functionality #1

krleonid · 2025-11-03T14:12:18Z

Adds a Clear() method to the DuckDB Appender API that clears buffered data without flushing it to the database.
This allows users to discard uncommitted appends and reuse the appender instance.

The clear operation resets the appender’s internal state — including its chunk, buffered collection, and column index — without committing data to the database.
Includes tests verifying that cleared data is not flushed and that the appender remains usable after clearing.

This feature is useful when data needs to be discarded before committing.

- Added duckdb_appender_clear to appender JSON with detailed description - Added implementation in appender-c.cpp that calls BaseAppender::Clear() - Added duckdb_appender_clear to v1.2.0 API struct - Added comprehensive test in test_capi_appender.cpp - Regenerated headers using make generate-files

…uckdb#19680) (duckdb#19811) Fixes duckdb#19680 This fixes a bug where queries using `NOT EXISTS` with `IS DISTINCT FROM` returned incorrect results due to improper handling of NULL semantics in the optimizer. The issue was that the optimizer's deliminator incorrectly treated `DISTINCT FROM` variants the same as regular equality/inequality comparisons, which have different NULL handling: - `IS DISTINCT FROM`: NULL-aware (NULL IS DISTINCT FROM NULL = FALSE) - != or =: NULL-unaware (NULL != NULL = NULL, filters out NULLs) ### Incorrect Query Plan ``` ┌───────────────────────────┐ │ PROJECTION │ │ ──────────────────── │ │ c2 │ │ │ │ ~0 rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ #5 │ │__internal_decompress_integ│ │ ral_integer(#3, 1) │ │ #1 │ │ │ │ ~0 rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ NESTED_LOOP_JOIN │ │ ──────────────────── │ │ Join Type: ANTI │ │ Conditions: c2 != c2 ├──────────────┐ │ │ │ │ ~0 rows │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── │ │ NULL ││ NULL │ │ #2 ││ #2 │ │ NULL ││ NULL │ │ #1 ││ #1 │ │ NULL ││ NULL │ │ #0 ││ #0 │ │ NULL ││ NULL │ │ ││ │ │ ~2 rows ││ ~1 row │ └─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── │ │ #0 ││ #0 │ │__internal_compress_integra││__internal_compress_integra│ │ l_utinyint(#1, 1) ││ l_utinyint(#1, 1) │ │ #2 ││ #2 │ │ ││ │ │ ~2 rows ││ ~1 row │ └─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── │ │ NULL ││ NULL │ │ #0 ││ #0 │ │ NULL ││ NULL │ │ ││ │ │ ~2 rows ││ ~1 row │ └─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN ││ FILTER │ │ ──────────────────── ││ ──────────────────── │ │ Table: t0 ││ (col0 IS NOT NULL) │ │ Type: Sequential Scan ││ │ │ Projections: c2 ││ │ │ ││ │ │ ~2 rows ││ ~1 row │ └───────────────────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ SEQ_SCAN │ │ ──────────────────── │ │ Table: t0 │ │ Type: Sequential Scan │ │ Projections: c2 │ │ │ │ ~2 rows │ └───────────────────────────┘ ``` The buggy plan shows two critical issues: ``` ┌─────────────┴─────────────┐ │ NESTED_LOOP_JOIN │ │ Join Type: ANTI │ │ Conditions: c2 != c2 │ ← ❌ Wrong(the join conditions should be c2 IS DISTINCT FROM c2) │ ~0 rows │ └─────────────┬─────────────┘ │ └─────────────┐ ┌┴─────────────┐ │ FILTER │ │ (col0 IS NOT │ ← ❌ Wrong(the filter should be removed) │ NULL) │ └──────────────┘ ``` ### Solution This PR adds proper support for DISTINCT FROM operators throughout the optimization pipeline: 1. Preserve DISTINCT FROM semantics in join conversion.(src/optimizer/deliminator.cpp) ``` // NOTE: We should NOT convert DISTINCT FROM to != in general // Only convert if the ORIGINAL join had != or = (not DISTINCT FROM variants) if (delim_join.join_type != JoinType::MARK && original_join_comparison != ExpressionType::COMPARE_DISTINCT_FROM && original_join_comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM) { // Safe to convert } ``` 2. Skip NULL filters for DISTINCT FROM variants.(src/optimizer/deliminator.cpp) ``` // Only add IS NOT NULL filter for regular equality/inequality comparisons // Do NOT add for DISTINCT FROM variants, as they handle NULL correctly if (cond.comparison != ExpressionType::COMPARE_NOT_DISTINCT_FROM && cond.comparison != ExpressionType::COMPARE_DISTINCT_FROM) { // Add IS NOT NULL filter } ``` 3. Added negation support for COMPARE_DISTINCT_FROM and COMPARE_NOT_DISTINCT_FROM in expression type handling.(src/common/enums/expression_type.cpp) 4. Updated parser to properly negate IS DISTINCT FROM expressions when wrapped with NOT. (src/parser/transform/expression/transform_bool_expr.cpp) 5. Added regression test in test/sql/subquery/exists/test_correlated_exists_with_derived_table.test

@Mytherin

) We found this issue when using the python client (because of the `.show() method propagating a LIMIT), that large limit optimizations where getting in the way of filter pushdowns. The idea is to push the filter before applying the limit whenever there is a filter. The idea is from @Mytherin, I just added a test. **Original Optimized logical plan:** ```text ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Optimized Logical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ PROJECTION │ │ ──────────────────── │ │ Expressions: a │ │ │ │ ~0 rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ Expressions: a │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ Expressions: │ │__internal_decompress_integ│ │ ral_bigint(#0, 0) │ │ #1 │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ ORDER_BY │ │ ──────────────────── │ │ rowid │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ Expressions: │ │__internal_compress_integra│ │ l_uinteger(#0, 0) │ │ #1 │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ COMPARISON_JOIN │ │ ──────────────────── │ │ Join Type: SEMI │ │ ├──────────────┐ │ Conditions: │ │ │ (rowid = rowid) │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN ││ LIMIT │ │ ──────────────────── ││ ──────────────────── │ │ Table: t ││ │ │ Type: Sequential Scan ││ │ └───────────────────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ SEQ_SCAN │ │ ──────────────────── │ │ Filters: a<50 │ │ Table: t │ │ Type: Sequential Scan │ │ │ │ ~400,000 rows │ └───────────────────────────┘ ``` Logical plan after this PR: ```text ┌─────────────────────────────┐ │┌───────────────────────────┐│ ││ Optimized Logical Plan ││ │└───────────────────────────┘│ └─────────────────────────────┘ ┌───────────────────────────┐ │ PROJECTION │ │ ──────────────────── │ │ Expressions: a │ │ │ │ ~0 rows │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ LIMIT │ │ ──────────────────── │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ SEQ_SCAN │ │ ──────────────────── │ │ Filters: a<50 │ │ Table: t │ │ Type: Sequential Scan │ │ │ │ ~400,000 rows │ └───────────────────────────┘ ```

…kdb#20052) Follow-up from duckdb#19937 This PR enables commits to continue while a checkpoint is happening. Currently this is limited to commits that exclusively insert data, as any other changes (deletes, updates, catalog changes, alters, etc) still eagerly grab the checkpoint lock which will prevent a checkpoint from starting while these changes are pending, and vice versa. It is also limited to inserting data into tables **that do not have indexes**. As part of this PR, appending to a table that has indexes now grabs the checkpoint lock again. Enabling commits while checkpointing has two consequences for the system that need to be dealt with: * Checkpointing no longer checkpoints the latest commit. * While checkpointing, new commits that happen need to be written somewhere in order for them to be durable. We can no longer write them to the old WAL as we want to truncate it after our checkpoint is finished. ### Pinned Checkpoint Commit Previously checkpointing code assumed we were always checkpointing the latest commit. This is no longer correct since what is the "latest committed data" might now change *while a checkpoint is running*. Instead, what we need to do is choose a commit id on which we will checkpoint. When starting a checkpoint we get the latest commit id and checkpoint based on that commit. Subsequent commits are not written as part of the checkpoint, but can then be written as part of a future checkpoint. In order to simplify this - we ensure that after starting a checkpoint any new data that is written is always written to *new row groups*. This is managed in `DataTable::AppendLock`. Due to this, when performing the checkpoint, we only need to know "do we need to checkpoint this row group or not", rather than having to checkpoint a part of a row group. This is handled in the new method `RowGroup::ShouldCheckpointRowGroup`. #### Free Blocks Another challenge with the pinned checkpoint commit is how to manage the list of free blocks - i.e. blocks that are present in the file but are not used. Block usage is tracked globally in the `SingleFileBlockManager`. With optimistic writes, we can write to blocks in the storage layer (i.e. make them no longer free blocks). However, if a checkpoint happens at a pinned commit, any optimistic writes that happen after the commit is pinned do not belong to that checkpoint. If we don't write these blocks in the free block list, we might get dangling blocks in case an abort or rollback happens. However, the blocks are not actually free in-memory, as they are being used by the optimistically written data. In order to solve this issue we introduce a new set in the block manager - `newly_used_blocks`. This tracks blocks that are in-use, but are not yet part of a given checkpoint. ### Checkpoint WAL New commits that happen while checkpointing have to be written somewhere. In order to still allow for the checkpoint to truncate the WAL to prevent it from growing indefinitely, we introduce the concept of a **checkpoint WAL**. This is a secondary WAL that can be written to only by concurrent commits while a checkpoint is happening. When a checkpoint is started, the checkpoint flag is written to the original WAL. The checkpoint flag contains the root metadata block pointer that will be written **when the checkpoint is successful**. ``` main.db.wal [INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2] ``` The checkpoint flag allows us to, during recovery, figure out if a checkpoint was completed or if the checkpoint was not completed. This determines if we need to replay the WAL. This is already done in the current version to deal with a crash between flipping the root block pointer and truncating the WAL, however, in the new version this happens before **any** data is written instead of only happening at the end. After this is written, we set any new commits to write to the checkpoint WAL. For example, assume a new commit comes in that inserts some data. We will now have the following situation: ``` main.db.wal [INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2] main.db.checkpoint.wal [INSERT #2][COMMIT] ``` After the checkpoint is finished, we have flushed all changes in `main.db.wal` to the main database file, while the changes in `main.db.checkpoint.wal` have not been flushed. All we need to do is move over the checkpoint WAL and have it replace the original WAL. This will lead us to the following final result after the checkpoint: ``` main.db.wal [INSERT #2][COMMIT] ``` #### Recovery In order to provide ACID compliance all commits that have succeeded must be persisted even across failures. That means that any commits that are written to the checkpoint WAL need to be persisted no matter where we crash. Below is a list of failure modes: ###### Crash Before Checkpoint Complete Our situation is like this: ``` main.db [ROOT #1] main.db.wal [INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2] main.db.checkpoint.wal [INSERT #2][COMMIT] ``` In order to recover in this situation, we need to replay both `main.db.wal` and `main.db.checkpoint.wal`. The recovering process sees that the checkpoint root does not match the root in the database, and now also checks for the presence of a checkpoint WAL. It then replays them in order (`main.db.wal` -> `main.db.checkpoint.wal`). If this is a `READ_WRITE` connection it merges the two WALs **except for the checkpoint node** by writing a new WAL that contains the content of both WALs: ``` main.db.recovery.wal [INSERT #1][COMMIT][INSERT #2][COMMIT] ``` After that completes, it overwrites the main WAL with the recovery WAL. Finally, it removes the checkpoint WAL. ``` mv main.db.recovery.wal main.db.wal rm main.db.checkpoint.wal ``` ###### Crash During Recovery If we crash during the above recovery process (after mv, before rm) we would have this situation: ``` main.db [ROOT #1] main.db.wal [INSERT #1][COMMIT][INSERT #2][COMMIT] main.db.checkpoint.wal [INSERT #2][COMMIT] ``` This is safe to recover from because `main.db.wal` does not contain a `CHECKPOINT` node. As such, we will not replay the checkpoint WAL, and only `main.db.wal` will be replayed. ###### Crash After Checkpoint Complete, Before WAL Move Our situation is like this: ``` main.db [ROOT #2] main.db.wal [INSERT #1][COMMIT][CHECKPOINT: NEW_ROOT: #2] main.db.checkpoint.wal [INSERT #2][COMMIT] ``` In order to recover in this situation, we need to replay only `main.db.checkpoint.wal`. The recovering process sees that the checkpoint root matches the root in the database, so it knows it does not need to replay `main.db.wal`. It checks for the presence of the checkpoint WAL. It is present - and replays it. If this is a `READ_WRITE` connection it then completes the checkpoint by finalizing the move (i.e. `mv main.db.checkpoint.wal main.db.wal`). ### Other Changes / Fixes #### Windows: make `FileSystem::MoveFile` behave like Linux/MacOS On Linux/MacOS, `MoveFile` is used to mean "move and override the target file". On Windows, this would previously fail if the target exists already. This PR makes Windows behave like Linux/MacOS by using `MOVEFILE_REPLACE_EXISTING` in `MoveFileExW`. In addition, because we tend to use `MoveFile` to mean "we want to be certain this file was moved", we also enable the `MOVEFILE_WRITE_THROUGH` flag. #### SQLLogicTest While testing this PR, I realized the sqllogictest runner was swallowing exceptions thrown in certain locations and incorrectly reporting tests that should fail as succeeded. This PR fixes that and now makes these exceptions fail the test run. This revealed a bunch of failing tests, in particular around the config runners `peg_parser.json` and `encryption.json`, and a few tests in `httpfs`. A few tests were fixed, but others were skipped in the config pending looking at them in the future.

…CHAR in `shell_renderer.cpp` (duckdb#20096) Hi DuckDB Team, When I used Linux to build the main branch of DuckDB with my `adbc_scanner` extension I often encountered this link error in CI. This PR resolves it by not using `LogicalType::VARCHAR` instead using `LogicalType(LogicalTypeId::VARCHAR)`. The code that is changed is in `shell_renderer.cpp` so not anywhere in my extension code. With this change the compilation succeeds. Build error: ``` [2/3] Linking CXX executable duckdb FAILED: duckdb : && /usr/bin/c++ -g -g -O0 -DDEBUG -Wall -fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -Wunused -Werror=vla -Wnarrowing -pedantic tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/highlighting.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/history.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/linenoise.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/linenoise-c.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/rendering.cpp.o tools/shell/linenoise/CMakeFiles/duckdb_linenoise.dir/terminal.cpp.o extension/CMakeFiles/duckdb_generated_extension_loader.dir/__/codegen/src/generated_extension_loader.cpp.o tools/shell/CMakeFiles/shell.dir/shell_command_line_option.cpp.o tools/shell/CMakeFiles/shell.dir/shell_extension.cpp.o tools/shell/CMakeFiles/shell.dir/shell.cpp.o tools/shell/CMakeFiles/shell.dir/shell_helpers.cpp.o tools/shell/CMakeFiles/shell.dir/shell_metadata_command.cpp.o tools/shell/CMakeFiles/shell.dir/shell_prompt.cpp.o tools/shell/CMakeFiles/shell.dir/shell_renderer.cpp.o tools/shell/CMakeFiles/shell.dir/shell_highlight.cpp.o tools/shell/CMakeFiles/shell.dir/shell_progress_bar.cpp.o tools/shell/CMakeFiles/shell.dir/shell_render_table_metadata.cpp.o tools/shell/CMakeFiles/shell.dir/shell_windows.cpp.o -o duckdb src/libduckdb_static.a extension/adbc_scanner/libadbc_scanner_extension.a extension/core_functions/libcore_functions_extension.a extension/parquet/libparquet_extension.a extension/jemalloc/libjemalloc_extension.a third_party/utf8proc/libduckdb_utf8proc.a vcpkg_installed/x64-linux/debug/lib/libadbc_driver_manager.a vcpkg_installed/x64-linux/debug/lib/libtomlplusplus.a vcpkg_installed/x64-linux/debug/lib/libnanoarrow_static.a src/libduckdb_static.a -ldl && : /usr/bin/ld: src/libduckdb_static.a(ub_duckdb_common.cpp.o):(.rodata+0x4460): multiple definition of `duckdb::LogicalType::VARCHAR'; tools/shell/CMakeFiles/shell.dir/shell_renderer.cpp.o:(.rodata._ZN6duckdb11LogicalType7VARCHARE[_ZN6duckdb11LogicalType7VARCHARE]+0x0): first defined here collect2: error: ld returned 1 exit status ``` Distro information: Amazon Linux 2023 on AWS Linux ip-172-16-3-91.ec2.internal 6.1.158-178.288.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Nov 3 18:38:36 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux ``` [ec2-user@ip-172-16-3-91 duckdb]$ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-amazon-linux/11/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-amazon-linux Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://github.com/amazonlinux/amazon-linux-2022 --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --with-isl=/builddir/build/BUILD/gcc-11.5.0-20240719/obj-x86_64-amazon-linux/isl-install --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-amazon-linux --with-build-config=bootstrap-lto --enable-link-serialization=1 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.5.0 20240719 (Red Hat 11.5.0-5) (GCC) ``` I only knew how to resolve it from past patterns where I've seen the same problem in other extensions. You can see a full build failure here: https://github.com/Query-farm/adbc_scanner/actions/runs/20048370767/job/57498881373 Thanks, Rusty

This PR improves the `CommonSubplanOptimizer`, and should put it in "maintenance mode", at least for now, and I think I'm done improving it for a while. ## Multiple Nested Matching This PR allows multiple subplans to be matched, rather than just a single subplan, and allows nesting. Take the following query, for example: ```sql explain select distinct range from range(10) union all select distinct range from range(10) union all select range % 2 as range from (select distinct range from range(10)) group by range union all select range % 2 as range from (select distinct range from range(10)) group by range union all select count(*) from (select range % 2 as range from (select distinct range from range(10)) group by range) union all select count(*) from (select range % 2 as range from (select distinct range from range(10)) group by range); ``` This query unions 6 queries together. Each query occurs twice, and is nested in the next query. With the optimizer disabled, this yields the following plan: ``` ┌───────────────────────────┐ │ UNION ├──────────────┬────────────────────────────┬────────────────────────────┬────────────────────────────┬────────────────────────────┐ └─────────────┬─────────────┘ │ │ │ │ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ HASH_GROUP_BY ││ HASH_GROUP_BY ││ PROJECTION ││ PROJECTION ││ UNGROUPED_AGGREGATE ││ UNGROUPED_AGGREGATE │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Groups: #0 ││ Groups: #0 ││ range ││ range ││ Aggregates: ││ Aggregates: │ │ ││ ││ ││ ││ count_star() ││ count_star() │ │ ~10 rows ││ ~10 rows ││ ~6 rows ││ ~6 rows ││ ││ │ └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION ││ HASH_GROUP_BY ││ HASH_GROUP_BY ││ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ range ││ range ││ Groups: #0 ││ Groups: #0 ││ 42 ││ 42 │ │ ││ ││ ││ ││ ││ │ │ ~10 rows ││ ~10 rows ││ ~6 rows ││ ~6 rows ││ ~6 rows ││ ~6 rows │ └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ RANGE ││ RANGE ││ PROJECTION ││ PROJECTION ││ HASH_GROUP_BY ││ HASH_GROUP_BY │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Function: RANGE ││ Function: RANGE ││ range ││ range ││ Groups: #0 ││ Groups: #0 │ │ ││ ││ ││ ││ ││ │ │ ~10 rows ││ ~10 rows ││ ~10 rows ││ ~10 rows ││ ~6 rows ││ ~6 rows │ └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ HASH_GROUP_BY ││ HASH_GROUP_BY ││ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Groups: #0 ││ Groups: #0 ││ range ││ range │ │ ││ ││ ││ │ │ ~10 rows ││ ~10 rows ││ ~10 rows ││ ~10 rows │ └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION ││ HASH_GROUP_BY ││ HASH_GROUP_BY │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ range ││ range ││ Groups: #0 ││ Groups: #0 │ │ ││ ││ ││ │ │ ~10 rows ││ ~10 rows ││ ~10 rows ││ ~10 rows │ └─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ RANGE ││ RANGE ││ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Function: RANGE ││ Function: RANGE ││ range ││ range │ │ ││ ││ ││ │ │ ~10 rows ││ ~10 rows ││ ~10 rows ││ ~10 rows │ └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ RANGE ││ RANGE │ │ ──────────────────── ││ ──────────────────── │ │ Function: RANGE ││ Function: RANGE │ │ ││ │ │ ~10 rows ││ ~10 rows │ └───────────────────────────┘└───────────────────────────┘ ``` As we can see, there is a lot of redundance here. With this PR (and the optimizer enabled, of course), we get the following plan: ``` ┌───────────────────────┐ │ CTE │ │ ──────────────── │ │ CTE Name: │ │ __common_subplan_1 │ │ ├────────────┐ │ Table Index: 79 │ │ │ │ │ │ ~0 rows │ │ └───────────┬───────────┘ │ ┌───────────┴───────────┐┌───────────┴───────────┐ │ HASH_GROUP_BY ││ CTE │ │ ──────────────── ││ ──────────────── │ │ Groups: #0 ││ CTE Name: │ │ ││ __common_subplan_2 │ │ ││ ├────────────┐ │ ││ Table Index: 86 │ │ │ ││ │ │ │ ~10 rows ││ ~0 rows │ │ └───────────┬───────────┘└───────────┬───────────┘ │ ┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐ │ PROJECTION ││ HASH_GROUP_BY ││ CTE │ │ ──────────────── ││ ──────────────── ││ ──────────────── │ │ range ││ Groups: #0 ││ CTE Name: │ │ ││ ││ __common_subplan_3 │ │ ││ ││ ├────────────┐ │ ││ ││ Table Index: 91 │ │ │ ││ ││ │ │ │ ~10 rows ││ ~6 rows ││ ~0 rows │ │ └───────────┬───────────┘└───────────┬───────────┘└───────────┬───────────┘ │ ┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐ │ RANGE ││ PROJECTION ││ UNGROUPED_AGGREGATE ││ UNION │ │ ──────────────── ││ ──────────────── ││ ──────────────── ││ │ │ Function: RANGE ││ range ││ Aggregates: ││ ├────────────┬────────────────────────┬────────────────────────┬────────────────────────┬────────────────────────┐ │ ││ ││ count_star() ││ │ │ │ │ │ │ │ ~10 rows ││ ~10 rows ││ ││ │ │ │ │ │ │ └───────────────────────┘└───────────┬───────────┘└───────────┬───────────┘└───────────┬───────────┘ │ │ │ │ │ ┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐┌───────────┴───────────┐ │ CTE_SCAN ││ PROJECTION ││ CTE_SCAN ││ CTE_SCAN ││ PROJECTION ││ PROJECTION ││ CTE_SCAN ││ CTE_SCAN │ │ ──────────────── ││ ──────────────── ││ ──────────────── ││ ──────────────── ││ ──────────────── ││ ──────────────── ││ ──────────────── ││ ──────────────── │ │ CTE Index: 79 ││ 42 ││ CTE Index: 79 ││ CTE Index: 79 ││ range ││ range ││ CTE Index: 91 ││ CTE Index: 91 │ │ ││ ││ ││ ││ ││ ││ ││ │ │ ~10 rows ││ ~6 rows ││ ~10 rows ││ ~10 rows ││ ~6 rows ││ ~6 rows ││ ~1 row ││ ~1 row │ └───────────────────────┘└───────────┬───────────┘└───────────────────────┘└───────────────────────┘└───────────┬───────────┘└───────────┬───────────┘└───────────────────────┘└───────────────────────┘ ┌───────────┴───────────┐ ┌───────────┴───────────┐┌───────────┴───────────┐ │ CTE_SCAN │ │ CTE_SCAN ││ CTE_SCAN │ │ ──────────────── │ │ ──────────────── ││ ──────────────── │ │ CTE Index: 86 │ │ CTE Index: 86 ││ CTE Index: 86 │ │ │ │ ││ │ │ ~6 rows │ │ ~6 rows ││ ~6 rows │ └───────────────────────┘ └───────────────────────┘└───────────────────────┘ ``` Which has 3 CTEs, 8 CTE scans, and only 3 aggregations (instead of the original 12!). ## Fuzzy Plan Matching Something that can show up in query plans is an "almost" exact subplan match. Before this PR, an exact match was required. With this PR, we can do "fuzzy" matching, where the plan is mostly the same, save for some selected columns. If we have the following query, for example: ```sql -- Create build table create table build as select range a, range * 2 b, range * 3 c, range * 4 d from (select range::utinyint as range from range(11)) where range % 5 = 0; -- Create probe table create table probe as select range e, range * 2 f, range * 3 g, range * 4 h from (select range::utinyint as range from range(11)); -- View that joins the two create view my_view as from probe join build on (build.a = probe.e); -- Select greatest of all columns, unioned with greatest of just two columns explain select greatest(a, b, c, d, e, f, g, h) from my_view union all select greatest(b, f) from my_view; ``` Here, one of the unioned queries selects all columns, and the other selects just two columns. As we can see, the join (coming from the view) in the second query is "contained" in the join in first query, as the first query selects all columns that the second query needs. We currently get the following plan (the optimizer doesn't trigger): ``` ┌───────────────────────────┐ │ UNION ├───────────────────────────────────────────┐ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ │ PROJECTION │ │ PROJECTION │ │ ──────────────────── │ │ ──────────────────── │ │ greatest(a, b, c, d, e, f,│ │ greatest(b, f) │ │ g, h) │ │ │ │ │ │ │ │ ~3 rows │ │ ~3 rows │ └─────────────┬─────────────┘ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ │ PROJECTION │ │ HASH_JOIN │ │ ──────────────────── │ │ ──────────────────── │ │ e │ │ Join Type: INNER │ │ f │ │ Conditions: e = a │ │ g │ │ │ │ h │ │ │ │ a │ │ ├──────────────┐ │ b │ │ │ │ │ c │ │ │ │ │ d │ │ │ │ │ │ │ │ │ │ ~3 rows │ │ ~3 rows │ │ └─────────────┬─────────────┘ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ HASH_JOIN │ │ SEQ_SCAN ││ SEQ_SCAN │ │ ──────────────────── │ │ ──────────────────── ││ ──────────────────── │ │ Join Type: INNER │ │ Table: probe ││ Table: build │ │ Conditions: e = a │ │ Type: Sequential Scan ││ Type: Sequential Scan │ │ │ │ ││ │ │ ├──────────────┐ │ Projections: ││ Projections: │ │ │ │ │ e ││ a │ │ │ │ │ f ││ b │ │ │ │ │ ││ │ │ ~3 rows │ │ │ ~11 rows ││ ~3 rows │ └─────────────┬─────────────┘ │ └───────────────────────────┘└───────────────────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN ││ SEQ_SCAN │ │ ──────────────────── ││ ──────────────────── │ │ Table: probe ││ Table: build │ │ Type: Sequential Scan ││ Type: Sequential Scan │ │ ││ │ │ Projections: ││ Projections: │ │ e ││ a │ │ f ││ b │ │ g ││ c │ │ h ││ d │ │ ││ │ │ ~11 rows ││ ~3 rows │ └───────────────────────────┘└───────────────────────────┘ ``` With the improvements to the optimizer in this PR, we now get this plan: ``` ┌───────────────────────────┐ │ CTE │ │ ──────────────────── │ │ CTE Name: │ │ __common_subplan_1 │ │ ├───────────────────────────────────────────┐ │ Table Index: 29 │ │ │ │ │ │ ~0 rows │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ │ HASH_JOIN │ │ UNION │ │ ──────────────────── │ │ │ │ Join Type: INNER │ │ │ │ Conditions: e = a ├──────────────┐ │ ├──────────────┐ │ │ │ │ │ │ │ ~3 rows │ │ │ │ │ └─────────────┬─────────────┘ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ SEQ_SCAN ││ SEQ_SCAN ││ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Table: probe ││ Table: build ││ greatest(a, b, c, d, e, f,││ greatest(b, f) │ │ Type: Sequential Scan ││ Type: Sequential Scan ││ g, h) ││ │ │ ││ ││ ││ │ │ Projections: ││ Projections: ││ ││ │ │ e ││ a ││ ││ │ │ f ││ b ││ ││ │ │ g ││ c ││ ││ │ │ h ││ d ││ ││ │ │ ││ ││ ││ │ │ ~11 rows ││ ~3 rows ││ ~3 rows ││ ~3 rows │ └───────────────────────────┘└───────────────────────────┘└─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ PROJECTION ││ PROJECTION │ │ ──────────────────── ││ ──────────────────── │ │ e ││ #1 │ │ f ││ #4 │ │ g ││ │ │ h ││ │ │ a ││ │ │ b ││ │ │ c ││ │ │ d ││ │ │ ││ │ │ ~3 rows ││ ~0 rows │ └─────────────┬─────────────┘└─────────────┬─────────────┘ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ CTE_SCAN ││ CTE_SCAN │ │ ──────────────────── ││ ──────────────────── │ │ CTE Index: 29 ││ CTE Index: 29 │ │ ││ │ │ ~3 rows ││ ~3 rows │ └───────────────────────────┘└───────────────────────────┘ ``` As we can see, the join is materialized as a CTE, and scanned twice. Columns that aren't needed are projected out after the CTE scan. ## Benchmark Improvements With the changes, we now trigger more subplan elimination on TPC-DS and TPC-H. Here are some results that I collected on my laptop. TPC-DS SF100 improvements: Q61: 0.61s -> 0.33s (~1.8x) Q70: 0.81s -> 0.55s (~1.5x) TPC-H SF100 improvements: Q11: 0.16s -> 0.12s (~1.3x) ## Notes I needed a lot of indirection to get the column bindings to match with the fuzzy plan matching, which required a lot of `unordered_map`s. To avoid doing so many allocations, I've also implemented `arena_unordered_map`, so that the number of allocations grow logarithmically. I'm not sure how much this helped, but it's just something I wanted to do as we are trying to reduce allocations. Overall, this optimizer now takes ~10% of total optimization time.

…0283) Fix for: duckdblabs/duckdb-internal#6809 , duckdb#20086 I would like someone to take a look at this before I run CI, to see if the fix makes sense. In ConstantOrNullFunction, there is a bug where if the first loop iteration is a FLAT_VECTOR, the result validity mask is created as a reference to the validity mask of args.data[idx]. If the subsequent iteration is the default branch (say, a DICTIONARY_VECTOR), and we call result_mask.SetInvalid(i), this is now overwriting the validity mask of the first input column where the reference was created. I believe the fix for this is to call EnsureWritable in the FLAT_VECTOR case, to make sure the validity mask is not a reference to the input's validity mask before we call ```cpp result_mask.Combine(input_mask, args.size()) ``` (which is where the alias is actually created). The reproducer hits this case -- a specific scenario of unique index + update + no checkpointing was leading to the this scenario. For reference, here is the query plan of the last query in the reproducer, where the bug was occuring. The t1.c0 column is being passed as a FLAT_VECTOR to constantOrNullFunction, and the t0.c1 column is being passed in as a dictionary vector. Since the argument at index 1 in ConstantOrNullFunction is the c0 column in the output, we were overwriting NULLs into the ouput since the filter was overwriting the validity mask in ConstantOrNullFunction: ``` ┌───────┬───────┬───────┐ │ c0 │ c0 │ c1 │ │ int32 │ int32 │ int32 │ ├───────┼───────┼───────┤ │ NULL │ 1 │ NULL │ │ NULL │ -1 │ NULL │ └───────┴───────┴───────┘ ``` Whereas it should be: ``` ┌───────┬───────┬───────┐ │ c0 │ c0 │ c1 │ │ int32 │ int32 │ int32 │ ├───────┼───────┼───────┤ │ 0 │ 1 │ NULL │ │ NULL │ -1 │ NULL │ └───────┴───────┴───────┘ ``` ```┌────────────────────────────────────────────────┐ │┌──────────────────────────────────────────────┐│ ││ Total Time: 9.18s ││ │└──────────────────────────────────────────────┘│ └────────────────────────────────────────────────┘ ┌───────────────────────────┐ │ QUERY │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ EXPLAIN_ANALYZE │ │ ──────────────────── │ │ 0 rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ c0 │ │ c0 │ │ c1 │ │ │ │ 2 rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ #3 │ │ #7 │ │ #11 │ │ │ │ 2 rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ FILTER │ │ ──────────────────── │ │ (constant_or_null(false, │ │ c0, c1) IS NULL) │ │ │ │ 2 rows │ │ (1.82s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ NULL │ │ #6 │ │ NULL │ │ #5 │ │ NULL │ │ #4 │ │ NULL │ │ #3 │ │ NULL │ │ #2 │ │ NULL │ │ #1 │ │ NULL │ │ #0 │ │ NULL │ │ │ │ 2 rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ NULL │ │ #2 │ │ NULL │ │ #1 │ │ NULL │ │ #0 │ │ NULL │ │ │ │ 2 rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ POSITIONAL_SCAN │ │ ──────────────────── │ │ 2 rows ├──────────────┐ │ (7.30s) │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ TABLE_SCAN ││ TABLE_SCAN │ │ ──────────────────── ││ ──────────────────── │ │ Table: t1 ││ Table: t0 │ │ Type: Sequential Scan ││ Type: Sequential Scan │ │ Projections: c0 ││ │ │ ││ Projections: │ │ ││ c1 │ │ ││ c0 │ │ ││ │ │ 0 rows ││ 0 rows │ │ (0.00s) ││ (0.00s) │ └───────────────────────────┘└───────────────────────────┘ ```

krleonid added 4 commits November 3, 2025 15:52

Add appender clear functionality

29d4bd2

Merge branch 'main' of github.com:krleonid/duckdb into appender-clear

4e7aecd

Update appender clear tests

d0ed0c4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add appender clear functionality #1

Add appender clear functionality #1

Uh oh!

krleonid commented Nov 3, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add appender clear functionality #1

Are you sure you want to change the base?

Add appender clear functionality #1

Uh oh!

Conversation

krleonid commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

krleonid commented Nov 3, 2025 •

edited

Loading