
21387: feat: add is_nullable scalar UDF #305

Open
martin-augment wants to merge 3 commits into main from
pr-21387-2026-04-07-12-56-57

Conversation

@martin-augment
Owner

21387: To review by AI

adriangb and others added 3 commits April 5, 2026 13:54
…ection

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Apr 7, 2026

Walkthrough

The changes introduce a new is_nullable scalar UDF to DataFusion that returns a boolean indicating whether an expression's field is nullable at the schema level. The implementation includes the UDF definition, module registration, comprehensive test coverage via sqllogictest, and user-facing documentation. Additionally, new rules in the AI configuration files prohibit AI agents from linking to GitHub issues or pull requests during code reviews and from reviewing the AI configuration files themselves.


@claude

claude bot commented Apr 7, 2026

Code Review

The implementation is clean and follows the established pattern of ArrowTypeOfFunc closely. Here are my findings:

Correctness

The core logic is correct: args.arg_fields[0].is_nullable() correctly reads the Arrow field's schema-level nullability, not a runtime NULL check. The use of Volatility::Immutable is appropriate since nullability is fixed for any given expression.

The test table schema confirms the expected results:

  • id / name / l_name: nullable = true
  • ts / nonnull_name: nullable = false
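To make the schema-level versus runtime distinction concrete, here is a minimal sketch using a toy model. `ToyField`, `is_nullable`, and `contains_null` are hypothetical names for illustration only; they are not the Arrow or DataFusion types, and the real function reads the flag from `args.arg_fields[0]`.

```rust
/// Simplified stand-in for an Arrow field: a name plus a schema-level
/// nullable flag. Hypothetical type, not `arrow::datatypes::Field`.
#[derive(Debug, Clone)]
pub struct ToyField {
    pub name: String,
    pub nullable: bool,
}

/// Schema-level check: looks only at the field declaration, never at data.
pub fn is_nullable(field: &ToyField) -> bool {
    field.nullable
}

/// Runtime check: inspects the actual values stored in a column.
pub fn contains_null(values: &[Option<i64>]) -> bool {
    values.iter().any(|v| v.is_none())
}
```

A column declared nullable reports `true` from `is_nullable` even when every stored value is non-NULL, which is the behaviour the review describes for `id`, `name`, and `l_name`.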

Issues

1. Missed optimization: no simplify implementation

Since nullability is a planning-time property (like the type returned by arrow_typeof), the result is always a plan-time constant. Implementing ScalarUDFImpl::simplify would allow the optimizer to fold is_nullable(col) into a Boolean literal at plan time, rather than evaluating it for every batch during execution. This is the preferred pattern for schema-inspection functions.

Example sketch:

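fn simplify(
    &self,
    args: Vec<Expr>,
    info: &dyn SimplifyInfo,
) -> Result<ExprSimplifyResult> {
    // Ask the planner for the argument's schema-level nullability.
    let nullable = info.nullable(&args[0])?;
    // Replace the whole call with a Boolean literal so nothing runs at execution time.
    Ok(ExprSimplifyResult::Simplified(lit(nullable)))
}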
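The folding idea can also be shown without any DataFusion dependency. The following is a toy sketch under stated assumptions: `ToyExpr`, `ToySchema`, and `fold_is_nullable` are hypothetical names, and the rule that a non-NULL literal and a nested `is_nullable` call are non-nullable is an assumption of this model rather than a statement about the optimizer.

```rust
use std::collections::HashMap;

/// Toy expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyExpr {
    Column(String),
    BoolLiteral(bool),
    IsNullable(Box<ToyExpr>),
}

/// Maps a column name to its schema-level nullable flag.
pub type ToySchema = HashMap<String, bool>;

/// Folds `IsNullable(arg)` into a Boolean literal when the schema can
/// answer the question; otherwise returns the expression unchanged.
pub fn fold_is_nullable(expr: ToyExpr, schema: &ToySchema) -> ToyExpr {
    match expr {
        ToyExpr::IsNullable(inner) => {
            let known = match &*inner {
                // Column nullability comes straight from the schema.
                ToyExpr::Column(name) => schema.get(name).copied(),
                // Assumption: a Boolean literal is never NULL in this model.
                ToyExpr::BoolLiteral(_) => Some(false),
                // Assumption: is_nullable itself always yields a non-NULL Boolean.
                ToyExpr::IsNullable(_) => Some(false),
            };
            match known {
                Some(nullable) => ToyExpr::BoolLiteral(nullable),
                // Unknown column: leave the call in place.
                None => ToyExpr::IsNullable(inner),
            }
        }
        other => other,
    }
}
```

After folding, the plan contains only a literal, so no per-batch work remains for this expression.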

2. SQL example table has mismatched column widths (in both is_nullable.rs and scalar_functions.md)

The separator line has 28/24 dashes per column, but the column headers are 37/35 characters wide, producing a misaligned table. They should be:

+----------------------------------------+--------------------------------------+
| is_nullable(table_with_metadata.name)  | is_nullable(table_with_metadata.ts)  |
+----------------------------------------+--------------------------------------+
| true                                   | false                                |
+----------------------------------------+--------------------------------------+
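One way to avoid hand-counting dashes is to derive the separator from the cell contents. The sketch below is a hypothetical helper (`render_table` is not part of DataFusion or Arrow) that sizes each column to its widest cell and adds one space of padding on each side.

```rust
/// Renders a one-row ASCII table. Each column is as wide as its widest
/// cell, plus one space of padding on each side.
pub fn render_table(headers: &[&str], row: &[&str]) -> String {
    // Width of each column, measured in characters.
    let widths: Vec<usize> = headers
        .iter()
        .zip(row.iter())
        .map(|(h, c)| h.chars().count().max(c.chars().count()))
        .collect();

    // Separator such as "+-----+----+", built from the computed widths.
    let separator: String = widths
        .iter()
        .map(|w| format!("+{}", "-".repeat(w + 2)))
        .collect::<String>()
        + "+";

    // Left-aligns every cell inside its column.
    let line = |cells: &[&str]| -> String {
        cells
            .iter()
            .zip(widths.iter())
            .map(|(c, w)| format!("| {:<width$} ", c, width = *w))
            .collect::<String>()
            + "|"
    };

    [
        separator.clone(),
        line(headers),
        separator.clone(),
        line(row),
        separator,
    ]
    .join("\n")
}
```

Called with the two headers from the example and the row `true` / `false`, every output line has the same length, so the separators cannot drift out of sync with the headers.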

Minor Observations

  • The unused _arg destructure in invoke_with_args is intentional (needed for argument-count validation via take_function_args) — the leading underscore correctly suppresses the warning.
  • Test coverage is solid: literals (non-null and NULL), nullable columns, non-nullable columns, and propagation through arithmetic expressions.
  • l_name (also nullable) is untested, but the other nullable-column cases make this a minor gap.
  • The mod.rs wiring, test_context.rs change, and docs addition all look correct.
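The propagation cases mentioned above can be pictured with a small toy model. This sketch assumes the common rule that a binary arithmetic result is nullable when either operand is nullable; `ArithExpr` and `nullable` are hypothetical names, and unknown columns are treated as nullable as a conservative choice made for this example.

```rust
use std::collections::HashMap;

/// Toy arithmetic expression (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone)]
pub enum ArithExpr {
    Column(String),
    Literal(Option<i64>),
    Add(Box<ArithExpr>, Box<ArithExpr>),
}

/// Computes schema-level nullability by walking the expression tree.
pub fn nullable(expr: &ArithExpr, schema: &HashMap<String, bool>) -> bool {
    match expr {
        // Columns take their flag from the schema; unknown columns are
        // assumed nullable in this sketch.
        ArithExpr::Column(name) => schema.get(name).copied().unwrap_or(true),
        // A literal is nullable only when it is the NULL literal.
        ArithExpr::Literal(value) => value.is_none(),
        // Assumed rule: the sum is nullable if either side is nullable.
        ArithExpr::Add(left, right) => nullable(left, schema) || nullable(right, schema),
    }
}
```

Under this model `ts + 1` stays non-nullable, while `id + 1` and `ts + NULL` both become nullable.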


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
datafusion/functions/src/core/is_nullable.rs (1)

56-56: Consider documenting the immutability semantics.

Volatility::Immutable is correct here since schema-level nullability doesn't change during query execution. This allows the optimizer to potentially constant-fold is_nullable(col) expressions when the schema is known. The current choice is appropriate.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/functions/src/core/is_nullable.rs` at line 56, Add a brief doc
comment explaining why Volatility::Immutable is used for the is_nullable
function/operator: note that schema-level nullability does not change during
query execution, so Signature::any(1, Volatility::Immutable) allows the
optimizer to constant-fold is_nullable(col) when the schema is known; place this
comment near the is_nullable function (or the location where Signature::any is
called) and reference Volatility::Immutable and Signature::any in the note.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6ab4f094-1ffc-4172-9a09-51048a40b447

📥 Commits

Reviewing files that changed from the base of the PR and between 0bbab34 and 09c37c2.

📒 Files selected for processing (7)
  • .cursor/rules.md
  • AGENTS.md
  • datafusion/functions/src/core/is_nullable.rs
  • datafusion/functions/src/core/mod.rs
  • datafusion/sqllogictest/src/test_context.rs
  • datafusion/sqllogictest/test_files/is_nullable.slt
  • docs/source/user-guide/sql/scalar_functions.md


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the is_nullable scalar function to DataFusion, which returns whether an expression's field is nullable based on the schema. The implementation includes the function logic, registration, SQL logic tests, and documentation. Feedback includes removing the "Hash" trait from the "IsNullableFunc" struct to avoid compilation errors, correcting the ASCII table alignment in the documentation example, and avoiding manual edits to the auto-generated "scalar_functions.md" file.

description = "Expression to evaluate. The expression can be a constant, column, or function, and any combination of operators."
)
)]
#[derive(Debug, PartialEq, Eq, Hash)]

high

The Hash trait cannot be derived for IsNullableFunc because the signature field of type Signature does not implement Hash. Since ScalarUDFImpl does not require the implementation to be Hash, and ScalarUDF hashes based on the function name, you should remove Hash from the derive list to avoid a compilation error.
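As general Rust background for this comment: `#[derive(Hash)]` compiles only when every field implements `Hash`. Whether `Signature` implements `Hash` depends on the DataFusion version and is not verified here. The sketch below uses hypothetical types (`NoHashConfig`, `NamedFunc`, `hash_of`) to show the manual alternative of hashing by name only.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical field type that does not implement `Hash` (it holds an `f64`).
#[derive(Debug, PartialEq)]
pub struct NoHashConfig {
    pub weight: f64,
}

/// Deriving `Hash` here would be rejected because `NoHashConfig` is not
/// `Hash`, so `Hash` is written by hand below.
#[derive(Debug, PartialEq)]
pub struct NamedFunc {
    pub name: &'static str,
    pub config: NoHashConfig,
}

impl Hash for NamedFunc {
    // Hash only the name; values that compare equal share a name, so they
    // also share a hash.
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
    }
}

/// Convenience helper that hashes any `Hash` value with the default hasher.
pub fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}
```

With this approach two values that share a name hash identically regardless of their non-hashable fields.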

Suggested change
- #[derive(Debug, PartialEq, Eq, Hash)]
+ #[derive(Debug, PartialEq, Eq)]

Comment on lines +30 to +34
+----------------------------+------------------------+
| is_nullable(table_with_metadata.name) | is_nullable(table_with_metadata.ts) |
+----------------------------+------------------------+
| true | false |
+----------------------------+------------------------+

medium

The ASCII table in the SQL example is misaligned. The column headers are significantly longer than the separator lines and the data rows, which makes the documentation harder to read.

Suggested change
+----------------------------+------------------------+
| is_nullable(table_with_metadata.name) | is_nullable(table_with_metadata.ts) |
+----------------------------+------------------------+
| true | false |
+----------------------------+------------------------+
+---------------------------------------+-------------------------------------+
| is_nullable(table_with_metadata.name) | is_nullable(table_with_metadata.ts) |
+---------------------------------------+-------------------------------------+
| true                                  | false                               |
+---------------------------------------+-------------------------------------+

- [arrow_try_cast](#arrow_try_cast)
- [arrow_typeof](#arrow_typeof)
- [get_field](#get_field)
- [is_nullable](#is_nullable)

medium

This file is automatically generated by the dev/update_function_docs.sh script, as noted in the header of the file. You should avoid editing it manually. Instead, ensure the documentation in the ScalarUDFImpl implementation is correct and run the update script to regenerate this file. This ensures consistency and prevents your changes from being overwritten in the future.

@augmentcode

augmentcode bot commented Apr 7, 2026

🤖 Augment PR Summary

Summary: This PR adds a new core scalar function is_nullable(expr) that reports whether an expression’s output field is nullable based on schema/type inference (not whether a particular row’s value is NULL).

Changes:

  • Introduced IsNullableFunc as a ScalarUDFImpl and exposed it via the core function registry.
  • Added the function to the Rust expr_fn helpers for ergonomic expression construction.
  • Extended SQLLogicTest context setup so the new test file can reuse the existing metadata tables.
  • Added a new SQLLogicTest file covering literals, NULL literals, nullable vs non-nullable metadata columns, and nullability propagation through expressions.
  • Documented is_nullable in the user guide scalar function reference, including arguments and an example query.

Technical Notes: The implementation reads nullability from ScalarFunctionArgs.arg_fields and returns a constant boolean scalar per batch (volatility: immutable).



@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


&self.signature
}

fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {

datafusion/functions/src/core/is_nullable.rs:70: ScalarUDFImpl’s default return_field_from_args marks outputs as nullable, but is_nullable always returns a non-NULL boolean (ScalarValue::Boolean(Some(...))), so the planned output field metadata may be incorrect.

Severity: medium
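The shape of the suggested fix is to override the default that marks outputs as nullable. The sketch below models that with a toy trait; `ToyUdf`, `ToyReturnField`, `DefaultUdf`, and `IsNullableUdf` are hypothetical names, not the ScalarUDFImpl API, and the real override would go through return_field_from_args.

```rust
/// Toy description of a function's output field (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyReturnField {
    pub name: String,
    pub nullable: bool,
}

/// Toy UDF trait whose default conservatively reports a nullable output.
pub trait ToyUdf {
    fn name(&self) -> &str;

    fn return_field(&self) -> ToyReturnField {
        ToyReturnField {
            name: self.name().to_string(),
            nullable: true,
        }
    }
}

/// A function that keeps the default, nullable output.
pub struct DefaultUdf;

impl ToyUdf for DefaultUdf {
    fn name(&self) -> &str {
        "default_udf"
    }
}

/// A function that never produces NULL, so it overrides the default.
pub struct IsNullableUdf;

impl ToyUdf for IsNullableUdf {
    fn name(&self) -> &str {
        "is_nullable"
    }

    fn return_field(&self) -> ToyReturnField {
        ToyReturnField {
            name: self.name().to_string(),
            nullable: false,
        }
    }
}
```

With the override in place, downstream planning sees a non-nullable Boolean output, which matches a function that always returns `Some(true)` or `Some(false)`.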


arg1
),(
is_nullable,
"Returns whether the input expression is nullable",

datafusion/functions/src/core/mod.rs:126: The expr_fn docs say “Returns whether the input expression is nullable”, which could be read as runtime NULL-ness; the function actually reports schema/field nullability as documented elsewhere.

Severity: low

