@Tishj Tishj commented Dec 16, 2025

No description provided.

Mytherin and others added 21 commits December 16, 2025 12:33
A `try_cast` on a VARIANT value containing a struct with no members matching the cast's target struct failed with `Binder Error: STRUCT to STRUCT cast must have at least one matching member`.

This adds a test for that case and handles it correctly by setting the result to NULL.
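As a sketch of the failing case (the exact VARIANT-construction syntax here is my assumption, not taken from the PR's test):

```sql
-- Hypothetical repro: the VARIANT holds a struct with field 'a', while the
-- target struct only has field 'b' -- no matching members.
-- Previously: Binder Error; with this PR: TRY_CAST yields NULL.
SELECT TRY_CAST(CAST({'a': 1} AS VARIANT) AS STRUCT(b INTEGER));
```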
Hi DuckDB Team,

When using the profiler for queries that have the same table name in
different schemas, it was impossible to determine which node was
scanning which table. This PR adds the catalog and schema name to the
`TableScanToString` method so that the particular table in a catalog and
schema can be identified.
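For example (schema and table names here are hypothetical):

```sql
-- Two tables named 't' in different schemas previously both rendered as
-- just "t" in the profiler output; with the catalog and schema included,
-- the two scans become distinguishable, e.g. as memory.a.t and memory.b.t.
CREATE SCHEMA a; CREATE SCHEMA b;
CREATE TABLE a.t (x INTEGER); CREATE TABLE b.t (x INTEGER);
EXPLAIN ANALYZE SELECT * FROM a.t JOIN b.t USING (x);
```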

Thanks,

Rusty
Hi DuckDB Team,

When reviewing `EXPLAIN ANALYZE` output, operations backed by
`MultiFileReader` are often reported like this:

```json
{
  "system_peak_temp_dir_size": 0,
  "system_peak_buffer_memory": 0,
  "result_set_size": 480000,
  "operator_cardinality": 60000,
  "operator_rows_scanned": 469392,
  "cpu_time": 0.135243666,
  "operator_name": "READ_CSV_AUTO ",
  "extra_info": {
    "Function": "READ_CSV_AUTO",
    "Projections": "range",
    "Estimated Cardinality": "58674",
    "Total Files Read": "6"
  },
  "operator_type": "TABLE_SCAN",
  "cumulative_rows_scanned": 469392,
  "operator_timing": 0.135243666,
  "cumulative_cardinality": 60000,
  "children": []
}
```

This PR extends `extra_info` with a new key, `"Filename(s)"`, which
reports the first five filenames provided by the user. This makes it
much easier to distinguish between multiple calls to `read_csv_auto` (or
other functions built on the `MultiFileReader` infrastructure) within
the same query, since the specific files being read are now visible.
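For illustration, the new entry might look like this (the filenames and the exact string rendering of the list are my assumption; per note 3 below, `extra_info` currently only accepts strings):

```json
"extra_info": {
  "Function": "READ_CSV_AUTO",
  "Projections": "range",
  "Estimated Cardinality": "58674",
  "Total Files Read": "6",
  "Filename(s)": "part-0.csv, part-1.csv, part-2.csv, part-3.csv, part-4.csv"
}
```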

A few notes:

1. I intentionally avoided logging the fully expanded filenames; this
might be worth reconsidering.
2. The limit of five filenames is arbitrary and mainly intended to keep
the output concise.
3. It would be nice if `extra_info` could accept a full `duckdb::Value`
instead of only strings—this would allow representing filenames as a
`LIST()` of strings, which would be more natural in the JSON output.

Thank you!

Rusty
After this, we only need support for compressed execution (e.g. constant/flat/dictionary vectors) in scalar functions to start moving more complex extensions (e.g. half of spatial) over to the C API.
* Add missing dictionary indirection
… (performance) improvements to `.mode duckbox` (duckdb#20223)

This PR adds the `.last` command to the CLI that can be used to
re-render the last query result without any limits.

This is useful in particular when using the default (duckbox) rendering
- since that truncates rows and values in order to quickly show a
summary of the result. If there is information missing that you want to
see, you can now quickly type `.last` to re-render the result in a pager
without these limits.


For example:

```sql
memory D from lineitem.parquet;
┌────────────┬───────────┬───────────┬──────────────┬───┬───────────────┬───────────────────┬────────────┬───────────────────────────┐
│ l_orderkey │ l_partkey │ l_suppkey │ l_linenumber │ … │ l_receiptdate │  l_shipinstruct   │ l_shipmode │         l_comment         │
│   int64    │   int64   │   int64   │    int64     │ … │     date      │      varchar      │  varchar   │          varchar          │
├────────────┼───────────┼───────────┼──────────────┼───┼───────────────┼───────────────────┼────────────┼───────────────────────────┤
│          1 │    155190 │      7706 │            1 │ … │ 1996-03-22    │ DELIVER IN PERSON │ TRUCK      │ to beans x-ray carefull   │
│          1 │     67310 │      7311 │            2 │ … │ 1996-04-20    │ TAKE BACK RETURN  │ MAIL       │  according to the final … │
│          1 │     63700 │      3701 │            3 │ … │ 1996-01-31    │ TAKE BACK RETURN  │ REG AIR    │ ourts cajole above the f… │
│          1 │      2132 │      4633 │            4 │ … │ 1996-05-16    │ NONE              │ AIR        │ s cajole busily above t   │
│          1 │     24027 │      1534 │            5 │ … │ 1996-04-01    │ NONE              │ FOB        │  the regular, regular pa  │
│          · │        ·  │        ·  │            · │ … │ ·             │  ·                │  ·         │     ·                     │
│          · │        ·  │        ·  │            · │ … │ ·             │  ·                │  ·         │     ·                     │
│          · │        ·  │        ·  │            · │ … │ ·             │  ·                │  ·         │     ·                     │
│    5999975 │      7272 │      2273 │            1 │ … │ 1993-10-21    │ COLLECT COD       │ REG AIR    │ ld deposits aga           │
│    5999975 │      6452 │      1453 │            2 │ … │ 1993-11-19    │ DELIVER IN PERSON │ SHIP       │ ffily along the sly       │
│    5999975 │     37131 │      2138 │            3 │ … │ 1993-12-08    │ DELIVER IN PERSON │ FOB        │ counts cajole evenly? sl… │
│    6000000 │     32255 │      2256 │            1 │ … │ 1996-12-01    │ TAKE BACK RETURN  │ MAIL       │ riously pe                │
│    6000000 │     96127 │      6128 │            2 │ … │ 1996-10-21    │ NONE              │ AIR        │ pecial excuses nag evenl… │
├────────────┴───────────┴───────────┴──────────────┴───┴───────────────┴───────────────────┴────────────┴───────────────────────────┤
│ 6001215 rows (6.00 million rows, 10 shown)                                                                    16 columns (8 shown) │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
memory D .last
┌────────────┬───────────┬───────────┬──────────────┬───────────────┬─────────────────┬───────────────┬───────────────┬──────────────>
│ l_orderkey │ l_partkey │ l_suppkey │ l_linenumber │  l_quantity   │ l_extendedprice │  l_discount   │     l_tax     │ l_returnflag >
│   int64    │   int64   │   int64   │    int64     │ decimal(15,2) │  decimal(15,2)  │ decimal(15,2) │ decimal(15,2) │   varchar    >
├────────────┼───────────┼───────────┼──────────────┼───────────────┼─────────────────┼───────────────┼───────────────┼──────────────>
│          1 │    155190 │      7706 │            1 │         17.00 │        21168.23 │          0.04 │          0.02 │ N            >
│          1 │     67310 │      7311 │            2 │         36.00 │        45983.16 │          0.09 │          0.06 │ N            >
│          1 │     63700 │      3701 │            3 │          8.00 │        13309.60 │          0.10 │          0.02 │ N            >
│          1 │      2132 │      4633 │            4 │         28.00 │        28955.64 │          0.09 │          0.06 │ N            >
│          1 │     24027 │      1534 │            5 │         24.00 │        22824.48 │          0.10 │          0.04 │ N            >
│          1 │     15635 │       638 │            6 │         32.00 │        49620.16 │          0.07 │          0.02 │ N            >
│          2 │    106170 │      1191 │            1 │         38.00 │        44694.46 │          0.00 │          0.05 │ N            >
│          3 │      4297 │      1798 │            1 │         45.00 │        54058.05 │          0.06 │          0.00 │ R            >
│          3 │     19036 │      6540 │            2 │         49.00 │        46796.47 │          0.10 │          0.00 │ R            >
│          3 │    128449 │      3474 │            3 │         27.00 │        39890.88 │          0.06 │          0.07 │ A            >
│          3 │     29380 │      1883 │            4 │          2.00 │         2618.76 │          0.01 │          0.06 │ A            >
│          3 │    183095 │       650 │            5 │         28.00 │        32986.52 │          0.04 │          0.00 │ R            >
│          3 │     62143 │      9662 │            6 │         26.00 │        28733.64 │          0.10 │          0.02 │ A            >
│          4 │     88035 │      5560 │            1 │         30.00 │        30690.90 │          0.03 │          0.08 │ N            >
│          5 │    108570 │      8571 │            1 │         15.00 │        23678.55 │          0.02 │          0.04 │ R            >
│          5 │    123927 │      3928 │            2 │         26.00 │        50723.92 │          0.07 │          0.08 │ R            >
│          5 │     37531 │        35 │            3 │         50.00 │        73426.50 │          0.08 │          0.03 │ A            >
│          6 │    139636 │      2150 │            1 │         37.00 │        61998.31 │          0.08 │          0.03 │ A            >
│          7 │    182052 │      9607 │            1 │         12.00 │        13608.60 │          0.07 │          0.03 │ N            >
│          7 │    145243 │      7758 │            2 │          9.00 │        11594.16 │          0.08 │          0.08 │ N            >
```

Note that the `.last` view is opened in a pager, and the user can use
e.g. the arrow keys to scroll to see more columns, or various pager
commands to scroll through the result set.


#### Performance

To make this command work quickly, we make several improvements to the `BoxRenderer`. In particular, we want the rendering to be more "stream-y": the first bytes should hit the screen ASAP, as that matters for the pager. The following improvements are made:

* Split up the `Analyze` step from the rendering step. This determines
the column widths based on the result before actually constructing any
render values.
* Introduce `max_analyze_rows` - the maximum rows to analyze for column
widths. This is by default set to 100K, and can be specified as the
second parameter to `.maxrows` (i.e. `.maxrows [MAX_ROWS]
[MAX_ANALYZE_ROWS]`).

Note that this relates to *rendered rows*, not to rows in the result
set, so this will only matter when large result sets are rendered.

* Render in a streaming manner. Instead of first constructing all rows to render and only then rendering them, do the `prepare -> render` cycle in a loop to allow faster rendering.

* Remove several performance bottlenecks from rendering, e.g. avoid copying types for every value and reduce unnecessary copies/scans of the result.

Because of these changes `.last` is pretty fast. For example, running
`FROM lineitem` with TPC-H SF10, and then running `.last` starts
rendering the result in ~100ms. There's still a bunch of relatively easy
performance wins here, e.g. switching from `std::string` to `string_t` -
but for now this seems sufficient.
For duckdblabs/duckdb-internal#6841 -- not an
important bug fix, just improving error output to make it throw instead
of crash, so this is going on main.

Throw an exception if there is an ARTKey mismatch. Before, it was
returning an INVALID_INDEX constant which was not being handled
anywhere, which led to a SIGSEGV failure.

Technically, this could be handled higher up, but this function is only
invoked from one place, and the likely scenario is attempting to insert
a duplicate rowid on a corrupted ART (since to get to the end of the
loop, all the bytes have to match). Throwing here avoids having to either 1) check the function's return value in the majority of cases where the ART is not corrupted, or 2) have the check only work in debug mode.
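The pattern being fixed can be sketched in isolation (this is an illustrative stand-in, not the actual ART code; all names are mine):

```cpp
#include <algorithm>
#include <cstdint>
#include <stdexcept>
#include <string>

// Returning a sentinel like INVALID_INDEX pushes the error to callers that
// may never check it (leading to a crash much later); throwing at the point
// of detection fails loudly instead.
uint64_t FirstMismatch(const std::string &a, const std::string &b) {
	auto len = std::min(a.size(), b.size());
	for (uint64_t i = 0; i < len; i++) {
		if (a[i] != b[i]) {
			return i;
		}
	}
	// Old behavior: return INVALID_INDEX; // unhandled sentinel -> SIGSEGV later
	throw std::runtime_error("expected a key mismatch, but the keys share a full prefix");
}
```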
I get the following compilation warnings under a debug build.
```sh
/home/vscode/duckdb/tools/shell/shell.cpp: In member function 'int duckdb_shell::ShellState::DoMetaCommand(const std::string&)':
/home/vscode/duckdb/tools/shell/shell.cpp:2462:95: warning: comparison of integer expressions of different signedness: 'int' and 'std::vector<std::__cxx11::basic_string<char>, std::allocator<std::__cxx11::basic_string<char> > >::size_type' {aka 'long unsigned int'} [-Wsign-compare]
 2462 |                         } else if (command.argument_count == 0 || int(command.argument_count) == args.size()) {
      |                                                                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
/home/vscode/duckdb/tools/shell/shell_renderer.cpp: In member function 'void duckdb_shell::ModeBoxRenderer::PrintBoxRowSeparator(duckdb_shell::PrintStream&, duckdb_shell::idx_t, const char*, const char*, const char*, const duckdb::vector<long unsigned int>&)':
/home/vscode/duckdb/tools/shell/shell_renderer.cpp:662:39: warning: comparison of integer expressions of different signedness: 'int' and 'duckdb_shell::idx_t' {aka 'long unsigned int'} [-Wsign-compare]
  662 |                         for (i = 1; i < nArg; i++) {
      |                                     ~~^~~~~~

/home/vscode/duckdb/extension/parquet/column_reader.cpp:393:42: warning: comparison of integer expressions of different signedness: 'uint32_t' {aka 'unsigned int'} and 'int32_t' {aka 'int'} [-Wsign-compare]
  393 |                 if (compressed_page_size != page_hdr.uncompressed_page_size) {
      |                     ~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
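The usual fix for the first of these warnings is to compare like-signed types, e.g. by casting the (known non-negative) signed side rather than comparing `int` against `size_t` directly. A sketch (function name is mine, not the shell's actual code):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Avoids -Wsign-compare: argument_count is known to be non-negative here,
// so casting it to the container's size type makes the comparison
// well-defined and warning-free.
bool ArgCountMatches(int argument_count, const std::vector<std::string> &args) {
	return argument_count == 0 || static_cast<size_t>(argument_count) == args.size();
}
```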
* Implement streaming with IGNORE NULLS for first/last_value
* Fix overwrite issue found by VS=2.
* Add missing dictionary indirection
…ckdb#20063)

This PR changes the struct projection pushdown to push the extract down
to the storage.
I'll explain the differences at each level:

## Current Behavior

#### Expression:
The `struct_extract(s, 'a')` expression remains unchanged.

#### Storage:
We push down `scan_child_column` booleans to indicate which fields of the STRUCT are left unread.
The fields that aren't touched are left empty (NULL), so we skip reading them.
What the storage emits is a column of type STRUCT.

#### Execution:
The execution of the `struct_extract` function takes out the field it
needs.

## New Behavior

#### Expression:
`struct_extract(s, 'a')` is replaced with a new `BoundColumnRefExpression` referencing the field we emit directly from the `LogicalGet`.

#### Storage:
We no longer emit a `STRUCT` column if only a subset of the fields is needed by the query; instead we emit the individually referenced field(s).
What the storage emits is now a column of the field's type.

#### Execution:
The `BoundReferenceExpression` just references the field that was
emitted directly by the storage.
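As an illustrative query (table and field names are hypothetical):

```sql
-- With the pushdown, only field 'a' of s is read from storage and the
-- projection becomes a plain reference to that emitted column; field 'b'
-- is never materialized.
CREATE TABLE tbl (s STRUCT(a INTEGER, b VARCHAR));
SELECT struct_extract(s, 'a') FROM tbl;  -- equivalent to: SELECT s.a FROM tbl;
```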

## Technical details

We communicate this new "pushdown extract" behavior through the
`ColumnIndex` down to the `StorageIndex`, as well as the `LogicalType`
of the field we're scanning.

## Limitations (Future Work)

We disable the pushdown of the extract if fields of the struct are referenced by a filter.
Making that work correctly would require rewriting the `TableFilterSet`, which I suspect has far-reaching consequences I'm not ready to deal with yet.
```cpp
Vector intermediate(child_type, target_count);
field.Scan(transaction, vector_index, field_state, intermediate, target_count);
//! FIXME: this is an optional ptr.. when is it not set??
VectorOperations::Cast(*state.context.GetClientContext(), intermediate, target_vector, target_count);
```
Collaborator:
Maybe this should be a generic expression instead of a cast - where we keep the cast state in the ColumnScanState. This could prevent re-initializing and such, and would be more flexible in the future.

Collaborator:

For example, one optimization this could enable in the future is pushing expressions into the projection which is useful if they exist in both the filter and the projection, e.g.:

```sql
SELECT regexp_extract('...') AS extracted
FROM tbl
WHERE extracted = ...
```

Currently we need to execute regexp_extract twice (once in the filter, once in the projection) - but allowing expressions to be pushed would avoid this.

```cpp
Vector intermediate(child_type, target_count);
field.Scan(transaction, vector_index, field_state, intermediate, target_count);
//! FIXME: this is an optional ptr.. when is it not set??
VectorOperations::Cast(*state.context.GetClientContext(), intermediate, target_vector, target_count);
```
Collaborator:

Maybe the ClientContext can be made mandatory for scans.

```cpp
if (child_type != result.GetType()) {
Vector intermediate(child_type, 1);
sub_column.FetchRow(transaction, state, child_storage_index, row_id, intermediate, 0);
auto context = transaction.transaction->context.lock();
```
Collaborator:

Maybe this should also be mandatory

lnkuiper and others added 8 commits December 17, 2025 14:32
Fixes: duckdblabs/duckdb-internal#5904

This PR introduces the `SWITCH` expression as syntactic sugar for the
standard `CASE` `WHEN/THEN` expression.

This is a first version of the `SWITCH` expression, so I am also looking for feedback on what others think of this syntax :)

~~### General Syntax~~

```sql
SWITCH(expression, value THEN result [, value THEN result ...] [, ELSE default])
```

~~### Example~~

```sql
-- This syntax:
SELECT SWITCH(x, 1 THEN 'a', 2 THEN 'b' ELSE 'c') FROM tbl;
-- Is equivalent to:
SELECT CASE x WHEN 1 THEN 'a' WHEN 2 THEN 'b' ELSE 'c' END FROM tbl;
```

~~### Implementation Notes~~

~~`SWITCH` has been added as a **reserved** keyword.~~
~~The logic is handled entirely within the parser: the `SWITCH` expression is immediately rewritten into a `CaseExpression` node, so no changes were required in the transformer or binder.~~
~~I utilized the `THEN` keyword to separate the comparison value from the result (e.g., `1 THEN 'a'`).~~
~~I attempted to use a comma (`,`) or a colon-style syntax (`CASE val : result`), but these approaches caused shift/reduce conflicts in the grammar. Reusing `THEN` resolved these ambiguities cleanly.~~

See comment below for reworked syntax.
…ve, to make sure we only perform the cast at the deepest level