21389: Add arrow_field(expr) scalar UDF
#302
@@ -0,0 +1,5 @@
The AI agents should never link to any issue or a pull request
in any GitHub repository in the code reviews!

The AI agents should not review AI agents' config files like CLAUDE.md or AGENTS.md!

@@ -0,0 +1,162 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

use arrow::array::{
    Array, BooleanArray, MapBuilder, StringArray, StringBuilder, StructArray,
};
use arrow::datatypes::{DataType, Field, Fields};
use datafusion_common::{Result, ScalarValue, utils::take_function_args};
use datafusion_expr::{
    ColumnarValue, Documentation, ScalarFunctionArgs, ScalarUDFImpl, Signature,
    Volatility,
};
use datafusion_macros::user_doc;
use std::sync::Arc;

#[user_doc(
    doc_section(label = "Other Functions"),
    description = "Returns a struct containing the Arrow field information of the expression, including name, data type, nullability, and metadata.",
    syntax_example = "arrow_field(expression)",
    sql_example = r#"```sql
> select arrow_field(1);
+----------------------------------------------+
| arrow_field(Int64(1))                        |
+----------------------------------------------+
| {name: Int64(1), data_type: Int64, ...}      |
Severity: low
Owner
Author
value:useful; category:documentation; feedback: The Augment AI reviewer is correct! The .slt file confirms that the name of literal columns is `lit`. The documentation has to be updated to match reality.
+----------------------------------------------+
Comment on lines +35 to +40
Fix the documented example output.
Suggested doc fix:
-| {name: Int64(1), data_type: Int64, ...} |
+| {name: lit, data_type: Int64, ...} |
Owner
Author
value:useful; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The .slt file confirms that the name of literal columns is `lit`. The documentation has to be updated to match reality.

> select arrow_field(1)['data_type'];
+-----------------------------------+
| arrow_field(Int64(1))[data_type]  |
+-----------------------------------+
| Int64                             |
+-----------------------------------+
```"#,
    argument(
        name = "expression",
        description = "Expression to evaluate. The expression can be a constant, column, or function, and any combination of operators."
    )
)]
#[derive(Debug, PartialEq, Eq, Hash)]
pub struct ArrowFieldFunc {
    signature: Signature,
}

impl Default for ArrowFieldFunc {
    fn default() -> Self {
        Self::new()
    }
}

impl ArrowFieldFunc {
    pub fn new() -> Self {
        Self {
            signature: Signature::any(1, Volatility::Immutable),
        }
Comment on lines +67 to +69
The signature should be
Owner
Author
value:annoying; category:bug; feedback: The Gemini AI reviewer is not correct!
    }

    fn return_struct_type() -> DataType {
        DataType::Struct(Fields::from(vec![
            Field::new("name", DataType::Utf8, false),
            Field::new("data_type", DataType::Utf8, false),
            Field::new("nullable", DataType::Boolean, false),
            Field::new(
                "metadata",
                DataType::Map(
                    Arc::new(Field::new(
                        "entries",
                        DataType::Struct(Fields::from(vec![
                            Field::new("keys", DataType::Utf8, false),
                            Field::new("values", DataType::Utf8, true),
                        ])),
                        false,
                    )),
                    false,
                ),
                false,
            ),
        ]))
    }
}

impl ScalarUDFImpl for ArrowFieldFunc {
    fn name(&self) -> &str {
        "arrow_field"
    }

    fn signature(&self) -> &Signature {
        &self.signature
    }

    fn return_type(&self, _arg_types: &[DataType]) -> Result<DataType> {
        Ok(Self::return_struct_type())
    }

    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        let [_arg] = take_function_args(self.name(), args.args)?;
        let field = &args.arg_fields[0];
Severity: medium
Owner
Author
value:annoying; category:bug; feedback: The Augment AI reviewer is correct! The function won't be executed at all if the arguments do not match the expected signature, i.e. this method won't be called unless exactly one argument is passed to the function.
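The reply above relies on the signature being enforced before the function body runs. A minimal, dependency-free sketch of that idea follows. `FixedArity`, `ArityError`, `call_checked`, and `first_upper` are hypothetical names invented for this illustration and are not DataFusion APIs; the sketch only shows why indexing the first argument is safe once the caller has validated the argument count.

```rust
/// Hypothetical stand-in for a signature that accepts exactly `n` arguments.
struct FixedArity(usize);

/// Error produced by the dispatcher when the argument count is wrong.
#[derive(Debug, PartialEq)]
struct ArityError {
    expected: usize,
    got: usize,
}

/// Hypothetical dispatcher: checks the arity first and only then runs `body`.
fn call_checked(
    sig: &FixedArity,
    args: &[&str],
    body: fn(&[&str]) -> String,
) -> Result<String, ArityError> {
    if args.len() != sig.0 {
        return Err(ArityError { expected: sig.0, got: args.len() });
    }
    Ok(body(args))
}

/// Function body that indexes `args[0]` without its own length check.
fn first_upper(args: &[&str]) -> String {
    args[0].to_uppercase()
}

fn main() {
    let sig = FixedArity(1);
    // One argument: the body runs.
    println!("{:?}", call_checked(&sig, &["lit"], first_upper));
    // No arguments: rejected before the body is reached, so no panic.
    println!("{:?}", call_checked(&sig, &[], first_upper));
}
```

Whether a real engine performs this check at planning time or at call time is outside what this sketch models.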

        // Build the name array
        let name_array =
            Arc::new(StringArray::from(vec![field.name().as_str()])) as Arc<dyn Array>;

        // Build the data_type array
        let data_type_str = format!("{}", field.data_type());
        let data_type_array =
            Arc::new(StringArray::from(vec![data_type_str.as_str()])) as Arc<dyn Array>;

        // Build the nullable array
        let nullable_array =
            Arc::new(BooleanArray::from(vec![field.is_nullable()])) as Arc<dyn Array>;

        // Build the metadata map array (same pattern as arrow_metadata.rs)
        let metadata = field.metadata();
        let mut map_builder =
            MapBuilder::new(None, StringBuilder::new(), StringBuilder::new());

        let mut entries: Vec<_> = metadata.iter().collect();
        entries.sort_by_key(|(k, _)| *k);

        for (k, v) in entries {
            map_builder.keys().append_value(k);
            map_builder.values().append_value(v);
        }
        map_builder.append(true)?;

        let metadata_array = Arc::new(map_builder.finish()) as Arc<dyn Array>;

        // Build the struct
        let DataType::Struct(fields) = Self::return_struct_type() else {
            unreachable!()
        };

        let struct_array = StructArray::new(
            fields,
            vec![name_array, data_type_array, nullable_array, metadata_array],
            None,
        );

        Ok(ColumnarValue::Scalar(ScalarValue::try_from_array(
            &struct_array,
            0,
        )?))
    }
Comment on lines +109 to +157
This implementation can be simplified and made more robust: instead of building single-element arrays, the scalar values can be constructed directly. Additionally, the argument handling can check the argument count explicitly:

    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        if args.args.len() != 1 {
            return internal_err!(
                "{} expected 1 argument but got {}",
                self.name(),
                args.args.len()
            );
        }
        let field = &args.arg_fields[0];
        let name_scalar = ScalarValue::Utf8(Some(field.name().clone()));
        let data_type_scalar =
            ScalarValue::Utf8(Some(format!("{}", field.data_type())));
        let nullable_scalar = ScalarValue::Boolean(Some(field.is_nullable()));
        // Build the metadata map scalar
        let metadata = field.metadata();
        let mut map_builder =
            MapBuilder::new(None, StringBuilder::new(), StringBuilder::new());
        let mut entries: Vec<_> = metadata.iter().collect();
        entries.sort_by_key(|(k, _)| *k);
        for (k, v) in entries {
            map_builder.keys().append_value(k);
            map_builder.values().append_value(v);
        }
        map_builder.append(true)?;
        let metadata_array = Arc::new(map_builder.finish());
        let metadata_scalar = ScalarValue::try_from_array(&metadata_array, 0)?;
        let struct_scalar = ScalarValue::Struct(Arc::new([
            name_scalar,
            data_type_scalar,
            nullable_scalar,
            metadata_scalar,
        ]));
        Ok(ColumnarValue::Scalar(struct_scalar))
    }
Owner
Author
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The result could be constructed without using intermediate arrays. This would reduce the complexity and improve performance by allocating and deallocating less memory.

    fn documentation(&self) -> Option<&Documentation> {
        self.doc()
    }
}
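The implementation above collects the metadata entries and sorts them by key before appending them to the map, which keeps the output stable even though the source is an unordered hash map. The following dependency-free sketch models that step with plain standard-library types. `FieldInfo` and `render` are hypothetical names for this illustration; the rendering of a non-empty metadata map is an assumption made here and is not taken from the PR, while the empty-metadata form follows the expected output in the PR's tests.

```rust
use std::collections::HashMap;

/// Hypothetical plain-Rust model of the four values reported by `arrow_field`.
struct FieldInfo {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// Render the field description with metadata entries sorted by key,
/// so the result does not depend on hash map iteration order.
fn render(info: &FieldInfo) -> String {
    let mut entries: Vec<_> = info.metadata.iter().collect();
    entries.sort_by_key(|(k, _)| *k);
    let meta = entries
        .iter()
        .map(|(k, v)| format!("{k}: {v}"))
        .collect::<Vec<_>>()
        .join(", ");
    format!(
        "{{name: {}, data_type: {}, nullable: {}, metadata: {{{}}}}}",
        info.name, info.data_type, info.nullable, meta
    )
}

fn main() {
    let mut metadata = HashMap::new();
    metadata.insert("unit".to_string(), "ms".to_string());
    metadata.insert("origin".to_string(), "sensor".to_string());
    let info = FieldInfo {
        name: "x".to_string(),
        data_type: "Int32".to_string(),
        nullable: false,
        metadata,
    };
    // Keys appear in sorted order regardless of insertion order.
    println!("{}", render(&info));
}
```

Sorting a collected `Vec` of references avoids cloning the keys and values; a `BTreeMap` would give the same ordering at the cost of changing the container type.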
@@ -0,0 +1,104 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at

#   http://www.apache.org/licenses/LICENSE-2.0

# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# arrow_field on integer literal
query ?
SELECT arrow_field(1)
----
{name: lit, data_type: Int64, nullable: false, metadata: {}}

# arrow_field on null literal
query ?
SELECT arrow_field(null)
----
{name: lit, data_type: Null, nullable: true, metadata: {}}

# arrow_field on boolean literal
query ?
SELECT arrow_field(true)
----
{name: lit, data_type: Boolean, nullable: false, metadata: {}}

# arrow_field on string literal
query ?
SELECT arrow_field('foo')
----
{name: lit, data_type: Utf8, nullable: false, metadata: {}}

# arrow_field on float literal
query ?
SELECT arrow_field(1.0)
----
{name: lit, data_type: Float64, nullable: false, metadata: {}}

# arrow_field on list
query ?
SELECT arrow_field(ARRAY[1,2,3])
----
{name: lit, data_type: List(Int64), nullable: false, metadata: {}}

# arrow_field struct field access - data_type
query T
SELECT arrow_field(1)['data_type']
----
Int64

# arrow_field struct field access - nullable
query B
SELECT arrow_field(1)['nullable']
----
false

# arrow_field struct field access - name
query T
SELECT arrow_field(1)['name']
----
lit

# arrow_field with table columns
statement ok
CREATE TABLE arrow_field_test(x INT NOT NULL, y TEXT) AS VALUES (1, 'a');

query ?
SELECT arrow_field(x) FROM arrow_field_test
----
{name: x, data_type: Int32, nullable: false, metadata: {}}

query ?
SELECT arrow_field(y) FROM arrow_field_test
----
{name: y, data_type: Utf8View, nullable: true, metadata: {}}

# arrow_field column access - name reflects column name
query T
SELECT arrow_field(x)['name'] FROM arrow_field_test
----
x

# arrow_field column access - nullability
query B
SELECT arrow_field(x)['nullable'] FROM arrow_field_test
----
false

query B
SELECT arrow_field(y)['nullable'] FROM arrow_field_test
----
true

statement ok
DROP TABLE arrow_field_test;
These lines are a duplicate of lines 42-45 and can be removed to avoid redundancy.