21577: feat: Extend sqllogictest framework to uptake custom datafusion.format.* settings#308
21577: feat: Extend sqllogictest framework to uptake custom datafusion.format.* settings#308martin-augment wants to merge 2 commits intomainfrom
sqllogictest framework to uptake custom datafusion.format.* settings#308Conversation
WalkthroughThe pull request adds repository-wide rules for AI agent behavior in code reviews, constraining them from linking to GitHub issues/pull requests and from reviewing AI configuration files. It also refactors the DataFusion sqllogictest formatting pipeline by introducing a new ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request introduces support for session-level formatting options in sqllogictest, allowing query results to reflect settings like datafusion.format.date_format and datafusion.format.timestamp_format. While the infrastructure for passing FormatOptions is implemented, several issues were identified: the cell_to_string function still uses a hardcoded NULL string instead of the provided format options, and get_format_options explicitly overrides user-defined null labels. Additionally, the current implementation creates an ArrayFormatter for every cell, which is inefficient and should be optimized to occur once per column. New tests were added to set_variable.slt, but they currently only verify that configuration variables can be set, not that they actually affect the output of null values.
| pub fn cell_to_string( | ||
| col: &ArrayRef, | ||
| row: usize, | ||
| is_spark_path: bool, | ||
| format_options: &FormatOptions<'_>, | ||
| ) -> Result<String> { |
There was a problem hiding this comment.
While format_options is now passed to cell_to_string, the function's implementation (specifically for null values) still uses the hardcoded NULL_STR ("NULL"). To fully support the datafusion.format.null setting as intended by this PR, the implementation should be updated to use format_options.null_label for both top-level and nested null values.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! There are several usages of the NULL_STR constant still in use. They should be replaced with format_options.null to make it following the configured formatting.
| (&datafusion_format_options).try_into().unwrap(); | ||
|
|
||
| let f = ArrayFormatter::try_new(col.as_ref(), &arrow_format_options)?; | ||
| let f = ArrayFormatter::try_new(col.as_ref(), format_options)?; |
There was a problem hiding this comment.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The same formatter could be used for all values in a column, so it could be reused per column instead of re-created for each value. This will reduce the memory pressure for tests with many results.
| // Get passed custom FormatOptions by SessionContext to be used for sqllogictest | ||
| pub fn get_format_options(ctx: &SessionContext) -> Result<FormatOptions> { | ||
| let mut df_format = ctx.state().config().options().format.clone(); | ||
| df_format.set("null", "NULL")?; |
There was a problem hiding this comment.
This line explicitly overrides the user's datafusion.format.null setting with "NULL". While this ensures compatibility with standard .slt files, it prevents users from testing custom null representations, which seems to be part of the goal of this PR (as evidenced by the new tests in set_variable.slt). Consider only setting this to "NULL" if the user hasn't provided a custom value, or clarify if null is intentionally excluded from the custom formatting support.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! This will set the format.null option to NULL, to be backward compatible with the NULL_STR constant. But to make this option customizable like the date/time formatting this should be done only if the current value is "", i.e. the default value from FormatOptions::default(). A custom value set with SET datafusion.format.null=... should be not be overwritten.
|
|
||
| # null display string (default: (empty)) | ||
| statement ok | ||
| SET datafusion.format.null = 'NuLL' |
There was a problem hiding this comment.
This test verifies that the datafusion.format.null setting can be changed and shown, but it doesn't verify that the setting actually affects query output. Given the current implementation in util.rs and normalize.rs, query results will still use the hardcoded "NULL" string regardless of this setting. Consider adding a query test case here once the implementation is fixed.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The test just verifies that the formatting option is properly set but it does not verify that it is being used when a SELECT query returns a null value. An addition test should be added.
Code ReviewThe overall approach is sound — passing A few issues worth addressing: Bug: Unused import will fail
|
🤖 Augment PR SummarySummary: This PR updates the DataFusion sqllogictest runners to respect session-level Changes:
Technical Notes: Uses 🤖 Was this summary useful? React with 👍 or 👎 |
| query P | ||
| SELECT TIMESTAMPTZ '2026-04-07 13:31:00'; | ||
| ---- | ||
| 2026-04-07T13:31:00 |
There was a problem hiding this comment.
timestamp_tz_format is set to a pattern that includes %z, but the expected output for the TIMESTAMPTZ query doesn’t include any timezone/offset. This may not actually validate that datafusion.format.timestamp_tz_format is being applied (or could become flaky if TIMESTAMPTZ formatting starts including the offset).
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! The test validates that the formatting setting is not used. It should be fixed to verify that the custom date formatting is used!
| // Get passed custom FormatOptions by SessionContext to be used for sqllogictest | ||
| pub fn get_format_options(ctx: &SessionContext) -> Result<FormatOptions> { | ||
| let mut df_format = ctx.state().config().options().format.clone(); | ||
| df_format.set("null", "NULL")?; |
There was a problem hiding this comment.
get_format_options overwrites the session’s datafusion.format.null to "NULL", so SET datafusion.format.null = ... won’t affect how NULLs are displayed in query output even though other datafusion.format.* settings now do. Is that intentional for sqllogictest semantics, and should it be documented/tests reflect that behavior?
Severity: low
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Augment AI reviewer is correct! This will set the format.null option to NULL, to be backward compatible with the NULL_STR constant. But to make this option customizable like the date/time formatting this should be done only if the current value is "", i.e. the default value from FormatOptions::default(). A custom value set with SET datafusion.format.null=... should be not be overwritten.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 6623349. Configure here.
| query P | ||
| SELECT TIMESTAMPTZ '2026-04-07 13:31:00'; | ||
| ---- | ||
| 2026-04-07T13:31:00 |
There was a problem hiding this comment.
Test for timestamp_tz_format doesn't verify format is applied
Medium Severity
The timestamp_tz_format test sets the format to '%Y-%m-%d %H:%M:%S %z' but expects 2026-04-07T13:31:00, which matches the default timestamp_format, not the custom tz format. This happens because datafusion.execution.time_zone is NULL at this point, causing TIMESTAMPTZ '...' to produce Timestamp(Nanosecond, None) — a non-tz type — so timestamp_tz_format is never applied. The test passes but doesn't actually validate that the timestamp_tz_format setting affects output, giving false confidence that the feature works through the sqllogictest framework.
Reviewed by Cursor Bugbot for commit 6623349. Configure here.
There was a problem hiding this comment.
value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The test validates that the formatting setting is not used. It should be fixed to verify that the custom date formatting is used!
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
datafusion/sqllogictest/test_files/set_variable.slt (1)
504-520: Inconsistent query type annotation for TIME.The TIME query uses
query D(DateTime type) on lines 504 and 517, but TIME values might be better represented with a different type annotation. This appears to be pre-existing behavior and not directly related to this PR's changes, so it's a minor observation.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/sqllogictest/test_files/set_variable.slt` around lines 504 - 520, The TIME tests are annotated with "query D" (DateTime) but should use a time-specific annotation; update the two TIME test blocks that start with "query D" for "SELECT TIME '01:02:12.123' AS time;" to use "query T" (or whatever test label your test harness expects for TIME) so the expected type matches the value, and keep the RESET/SHOW lines as-is (verify both occurrences around the RESET datafusion.format.time_format and after the SHOW remain consistent).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/runner.rs`:
- Line 29: The import "ConfigField" is unused in runner.rs; remove the line "use
datafusion::config::ConfigField;" from that file (search for the symbol
ConfigField to confirm there are no references), then run a build to ensure no
other code depends on it and tidy imports (cargo fmt/cargo check) to finalize
the cleanup.
---
Nitpick comments:
In `@datafusion/sqllogictest/test_files/set_variable.slt`:
- Around line 504-520: The TIME tests are annotated with "query D" (DateTime)
but should use a time-specific annotation; update the two TIME test blocks that
start with "query D" for "SELECT TIME '01:02:12.123' AS time;" to use "query T"
(or whatever test label your test harness expects for TIME) so the expected type
matches the value, and keep the RESET/SHOW lines as-is (verify both occurrences
around the RESET datafusion.format.time_format and after the SHOW remain
consistent).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38522c38-d63b-4bb7-8e6f-788f7b581e64
📒 Files selected for processing (7)
.cursor/rules.mdAGENTS.mddatafusion/sqllogictest/src/engines/datafusion_engine/normalize.rsdatafusion/sqllogictest/src/engines/datafusion_engine/runner.rsdatafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/runner.rsdatafusion/sqllogictest/src/util.rsdatafusion/sqllogictest/test_files/set_variable.slt
| }; | ||
| use arrow::record_batch::RecordBatch; | ||
| use async_trait::async_trait; | ||
| use datafusion::config::ConfigField; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Description: Check if ConfigField is used in this file beyond the import
rg -n 'ConfigField' datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/runner.rsRepository: martin-augment/datafusion
Length of output: 108
🏁 Script executed:
cd datafusion/sqllogictest && cargo clippy --all-targets --all-features -- -D warnings 2>&1 | grep -A 5 -B 5 "ConfigField\|unused" | head -100Repository: martin-augment/datafusion
Length of output: 155
🏁 Script executed:
# Read the entire file to check for ConfigField usage in context
cat -n datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/runner.rs | head -60Repository: martin-augment/datafusion
Length of output: 2777
🏁 Script executed:
# Also search more broadly for ConfigField in the entire directory
rg 'ConfigField' datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/Repository: martin-augment/datafusion
Length of output: 189
Remove unused import.
The ConfigField import on line 29 is not used in this file. It appears this import was added but is not referenced anywhere in the runner.rs file.
Proposed fix
-use datafusion::config::ConfigField;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| use datafusion::config::ConfigField; | |
| (This line should be removed from the imports section) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@datafusion/sqllogictest/src/engines/datafusion_substrait_roundtrip_engine/runner.rs`
at line 29, The import "ConfigField" is unused in runner.rs; remove the line
"use datafusion::config::ConfigField;" from that file (search for the symbol
ConfigField to confirm there are no references), then run a build to ensure no
other code depends on it and tidy imports (cargo fmt/cargo check) to finalize
the cleanup.
value:useful; category:bug; feedback: The Claude AI reviewer is correct! The test validates that the formatting setting is not used. It should be fixed to verify that the custom date formatting is used! |
value:useful; category:bug; feedback: The Claude AI reviewer is correct! This will set the format.null option to NULL, to be backward compatible with the NULL_STR constant. But to make this option customizable like the date/time formatting this should be done only if the current value is "", i.e. the default value from FormatOptions::default(). A custom value set with SET datafusion.format.null=... should be not be overwritten. |
value:good-to-have; category:documentation; feedback: The Claude AI reviewer is correct! The new public function has a doc comment instead of a proper docstring that will be rendered in the produced documentation. The doc comment will be visible only in the source code. |


21577: To review by AI