Skip to content

Conversation

@robinskil
Copy link
Collaborator

This pull request introduces two new utility scalar user-defined functions (UDFs) for DataFusion: mask_if_null and mask_if_not_null. These functions are designed to conditionally mask values based on their nullability, and are now registered for use throughout the codebase.

New conditional masking UDFs:

  • Added the mask_if_null UDF in mask_if_null.rs, which returns the second argument if the first argument is null, otherwise returns null. This function includes logic for simplification and type checking.
  • Added the mask_if_not_null UDF in mask_if_not_null.rs, which returns the second argument if the first argument is not null, otherwise returns null. This function also supports simplification and type checking.

Integration with utility UDF registry:

  • Registered both new UDFs in the util_udfs function in mod.rs, making them available for use wherever utility UDFs are loaded.

@robinskil robinskil self-assigned this Sep 30, 2025
Copilot AI review requested due to automatic review settings September 30, 2025 20:42
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request introduces two new utility scalar user-defined functions (UDFs) for DataFusion that provide conditional masking based on null values: mask_if_null and mask_if_not_null. These functions are designed to return the second argument conditionally based on the nullability of the first argument.

  • Added mask_if_null UDF that returns the second argument when the first is null, otherwise returns null
  • Added mask_if_not_null UDF that returns the second argument when the first is not null, otherwise returns null
  • Registered both UDFs in the utility function registry for availability throughout the codebase

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
beacon-functions/src/util/mod.rs Added module declarations and registered the new UDFs in the util_udfs function
beacon-functions/src/util/mask_if_null.rs Implemented mask_if_null UDF with conditional logic, type checking, and simplification
beacon-functions/src/util/mask_if_not_null.rs Implemented mask_if_not_null UDF with conditional logic, type checking, and simplification
Comments suppressed due to low confidence (4)

beacon-functions/src/util/mask_if_null.rs:1

  • The CaseExpr import is unused in this file. Consider removing it to keep imports clean.
use std::sync::Arc;

beacon-functions/src/util/mask_if_null.rs:1

  • The is_null import is unused in this file. The code uses left.is_null() method instead. Consider removing the unused import.
use std::sync::Arc;

beacon-functions/src/util/mask_if_not_null.rs:1

  • The CaseExpr import is unused in this file. Consider removing it to keep imports clean.
use std::sync::Arc;

beacon-functions/src/util/mask_if_not_null.rs:1

  • The is_null import is unused in this file. The code uses left.is_not_null() method instead. Consider removing the unused import.
use std::sync::Arc;

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@@ -0,0 +1,99 @@
use std::sync::Arc;
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::sync::Arc import is unused in this file. Consider removing it to keep imports clean.

Suggested change
use std::sync::Arc;

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,99 @@
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef};
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Field and FieldRef imports from arrow::datatypes are unused in this file. Consider removing them to keep imports clean.

Suggested change
use arrow::datatypes::{DataType, Field, FieldRef};
use arrow::datatypes::DataType;

Copilot uses AI. Check for mistakes.
logical_expr::{
conditional_expressions::CaseBuilder,
simplify::{ExprSimplifyResult, SimplifyInfo},
ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature,
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ExprSchemable and ReturnFieldArgs imports are unused in this file. Consider removing them to keep imports clean.

Suggested change
ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature,
ColumnarValue, ScalarUDF, ScalarUDFImpl, Signature,

Copilot uses AI. Check for mistakes.
Comment on lines +74 to +76
// If the first argument is known to be non-null, we can simplify to the second argument
if let Ok(false) = info.nullable(&left) {
return Ok(ExprSimplifyResult::Simplified(right));
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simplification logic is incorrect for mask_if_null. When the first argument is known to be non-null, the function should return null, not the second argument. The correct simplification should be Ok(ExprSimplifyResult::Simplified(Expr::Literal(datafusion::scalar::ScalarValue::Null))).

Suggested change
// If the first argument is known to be non-null, we can simplify to the second argument
if let Ok(false) = info.nullable(&left) {
return Ok(ExprSimplifyResult::Simplified(right));
// If the first argument is known to be non-null, we can simplify to null
if let Ok(false) = info.nullable(&left) {
return Ok(ExprSimplifyResult::Simplified(Expr::Literal(
datafusion::scalar::ScalarValue::Null,
None,
)));

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,99 @@
use std::sync::Arc;
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The std::sync::Arc import is unused in this file. Consider removing it to keep imports clean.

Suggested change
use std::sync::Arc;

Copilot uses AI. Check for mistakes.
@@ -0,0 +1,99 @@
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, FieldRef};
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Field and FieldRef imports from arrow::datatypes are unused in this file. Consider removing them to keep imports clean.

Suggested change
use arrow::datatypes::{DataType, Field, FieldRef};
use arrow::datatypes::DataType;

Copilot uses AI. Check for mistakes.
logical_expr::{
conditional_expressions::CaseBuilder,
simplify::{ExprSimplifyResult, SimplifyInfo},
ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature,
Copy link

Copilot AI Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ExprSchemable and ReturnFieldArgs imports are unused in this file. Consider removing them to keep imports clean.

Suggested change
ColumnarValue, ExprSchemable, ReturnFieldArgs, ScalarUDF, ScalarUDFImpl, Signature,
ColumnarValue, ScalarUDF, ScalarUDFImpl, Signature,

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants