This patch passes the current session to UDTF calls. This makes it possible to implement session-dependent functions, for example a function that returns the list of registered tables.
Walkthrough: This change introduces a new table function invocation API that passes execution context to table functions.
PR Review:
Code Review
This pull request introduces session-aware User Defined Table Functions (UDTFs) by replacing the TableFunctionImpl::call method with call_with_args. This new method receives a TableFunctionArgs struct containing both the function arguments and the session context, enabling UDTFs to access session-specific state. The PR includes updates to existing table functions, FFI support for the new signature, and a new example demonstrating a table_list UDTF. Feedback was provided regarding the table_list example, specifically suggesting the use of concurrent lookups instead of blocking the thread within a loop to improve performance.
```rust
for catalog_name in catalog_list.catalog_names() {
    let Some(catalog) = catalog_list.catalog(&catalog_name) else {
        continue;
    };
    for schema_name in catalog.schema_names() {
        let Some(schema) = catalog.schema(&schema_name) else {
            continue;
        };
        for table_name in schema.table_names() {
            let Some(provider) = block_in_place(|| {
                Handle::current().block_on(schema.table(&table_name))
            })?
            else {
                continue;
            };
            catalogs.append_value(catalog_name.clone());
            schemas.append_value(schema_name.clone());
            tables.append_value(table_name.clone());
            types.append_value(provider.table_type().to_string())
        }
    }
}
```
The current implementation iterates through catalogs, schemas, and tables, and for each table, it blocks to get the table provider. Using block_in_place inside a loop can be inefficient as it blocks the current thread from running other tasks.
To improve performance, you could gather all the asynchronous schema.table() calls and execute them concurrently using futures::future::try_join_all. This would perform all the lookups in parallel.
```rust
let mut table_futures = vec![];
for catalog_name in catalog_list.catalog_names() {
    if let Some(catalog) = catalog_list.catalog(&catalog_name) {
        for schema_name in catalog.schema_names() {
            if let Some(schema) = catalog.schema(&schema_name) {
                for table_name in schema.table_names() {
                    let schema_clone = Arc::clone(&schema);
                    let catalog_name_clone = catalog_name.clone();
                    let schema_name_clone = schema_name.clone();
                    let table_name_clone = table_name.clone();
                    table_futures.push(async move {
                        schema_clone.table(&table_name_clone).await.map(|provider| {
                            (catalog_name_clone, schema_name_clone, table_name_clone, provider)
                        })
                    });
                }
            }
        }
    }
}
let results = block_in_place(|| {
    Handle::current().block_on(futures::future::try_join_all(table_futures))
})?;
for (catalog_name, schema_name, table_name, provider) in results {
    if let Some(provider) = provider {
        catalogs.append_value(catalog_name);
        schemas.append_value(schema_name);
        tables.append_value(table_name);
        types.append_value(provider.table_type().to_string())
    }
}
```
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The current solution uses Tokio's block_in_place() method, which works only in the multi-threaded runtime. Using futures::future::try_join_all would make the code work in both the multi-threaded and the current-thread runtime. In addition, it would be more concurrent.
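A std-only sketch of the gather-then-resolve shape from the suggestion above, runnable without Tokio or the futures crate. Everything in it is a hypothetical stand-in: lookup_table plays the role of the async SchemaProvider::table call, and try_join_all_blocking is a deliberately naive busy-polling substitute for futures::future::try_join_all (it keeps input order and stops at the first error). One caveat worth checking before relying on the current-thread claim: the suggested snippet still wraps the join in block_in_place, and Tokio documents that block_in_place panics when called from a current_thread runtime, so that restriction seems to remain unless the calling code itself becomes async.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// (catalog, schema, table, table type if the table exists)
type Row = (String, String, String, Option<String>);
type Lookup = Pin<Box<dyn Future<Output = Result<Row, String>>>>;

/// Hypothetical stand-in for the async `SchemaProvider::table` call:
/// names starting with "missing" resolve to `None`, "broken" to an error.
async fn lookup_table(table: String) -> Result<Option<String>, String> {
    if table.starts_with("broken") {
        Err(format!("failed to load {table}"))
    } else if table.starts_with("missing") {
        Ok(None)
    } else {
        Ok(Some("BASE TABLE".to_string()))
    }
}

/// Waker that does nothing; the loop below simply re-polls.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Naive stand-in for `futures::future::try_join_all`: polls every unfinished
/// future in turn until all are ready, keeping input order and returning the
/// first error. It busy-loops on pending futures, so it is demo-only.
fn try_join_all_blocking<T, E>(
    mut futs: Vec<Pin<Box<dyn Future<Output = Result<T, E>>>>>,
) -> Result<Vec<T>, E> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut done: Vec<Option<T>> = futs.iter().map(|_| None).collect();
    loop {
        let mut pending = false;
        for (i, fut) in futs.iter_mut().enumerate() {
            if done[i].is_some() {
                continue; // never poll a completed future again
            }
            match fut.as_mut().poll(&mut cx) {
                Poll::Ready(Ok(value)) => done[i] = Some(value),
                Poll::Ready(Err(err)) => return Err(err),
                Poll::Pending => pending = true,
            }
        }
        if !pending {
            return Ok(done.into_iter().map(|v| v.expect("all ready")).collect());
        }
    }
}

/// Gather one lookup per (catalog, schema, table), resolve them together,
/// then keep only the tables that actually exist.
fn list_tables(layout: &[(&str, &str, Vec<&str>)]) -> Result<Vec<[String; 4]>, String> {
    let mut lookups: Vec<Lookup> = Vec::new();
    for (catalog, schema, tables) in layout {
        for table in tables {
            let (c, s, t) = (catalog.to_string(), schema.to_string(), table.to_string());
            lookups.push(Box::pin(async move {
                lookup_table(t.clone()).await.map(|ty| (c, s, t, ty))
            }));
        }
    }
    let rows = try_join_all_blocking(lookups)?
        .into_iter()
        .filter_map(|(c, s, t, ty)| ty.map(|ty| [c, s, t, ty]))
        .collect();
    Ok(rows)
}
```

The design point is that all lookups are created before any is awaited, so a real executor can overlap them; appending to the Arrow builders happens afterwards, in the original order.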
🤖 Augment PR Summary
Summary: This PR updates DataFusion’s UDTF (table function) API to pass the calling session into table function implementations.
Technical Notes: The new API enables UDTFs to inspect catalogs/schemas via the provided session, while maintaining backward compatibility via deprecated entry points.
```rust
#[deprecated(
    since = "53.0.0",
    note = "Implement `TableFunctionImpl::call_with_args` instead"
)]
fn call(&self, _args: &[Expr]) -> Result<Arc<dyn TableProvider>> {
```
datafusion/catalog/src/table.rs:525: Because both TableFunctionImpl::call and call_with_args now have default implementations, it’s possible to impl TableFunctionImpl for X {} and compile successfully but only fail at runtime with internal_err!("unimplemented"). Is that loss of compile-time enforcement intentional for this API transition?
Severity: medium
value:valid-but-wont-fix; category:bug; feedback: The Augment AI reviewer is not correct! The default implementation of call_with_args is provided so that existing users do not need to fix their builds when they upgrade to the next version of DataFusion. Only new users could hit the internal error, but they will quickly notice it and fix their implementation.
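To make the trade-off in this thread concrete, here is a std-only sketch of the transition pattern as described: both methods have default bodies, the new one presumably forwards to the deprecated one, and an empty impl still compiles. Expr, Session, Provider and LocalSession are hypothetical stand-ins (DataFusion's real call returns Result<Arc<dyn TableProvider>> and its default error comes from internal_err!); only the method names call and call_with_args and the TableFunctionArgs field names are taken from the PR.

```rust
/// Hypothetical stand-in for DataFusion's `Expr` (an integer literal only).
#[derive(Debug, Clone, PartialEq)]
pub struct Expr(pub i64);

/// Hypothetical stand-in for DataFusion's `Session`.
pub trait Session {
    fn session_id(&self) -> &str;
}

/// Stand-in for `Arc<dyn TableProvider>`: just the rows of a one-column table.
pub type Provider = Vec<i64>;

pub struct TableFunctionArgs<'a> {
    pub args: &'a [Expr],
    pub session: &'a dyn Session,
}

pub trait TableFunctionImpl {
    /// Old entry point, kept with a default body so new implementors can skip it.
    #[deprecated(note = "Implement `call_with_args` instead")]
    fn call(&self, _args: &[Expr]) -> Result<Provider, String> {
        Err("unimplemented".to_string())
    }

    /// New entry point: by default forwards to `call`, so existing
    /// implementations keep compiling and working unchanged.
    #[allow(deprecated)]
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Provider, String> {
        self.call(args.args)
    }
}

/// A pre-existing UDTF that only overrides the old method.
pub struct Legacy;
impl TableFunctionImpl for Legacy {
    #[allow(deprecated)]
    fn call(&self, args: &[Expr]) -> Result<Provider, String> {
        Ok(args.iter().map(|e| e.0).collect())
    }
}

/// A session-aware UDTF written against the new method.
pub struct SessionIdLength;
impl TableFunctionImpl for SessionIdLength {
    fn call_with_args(&self, args: TableFunctionArgs<'_>) -> Result<Provider, String> {
        Ok(vec![args.session.session_id().len() as i64])
    }
}

/// Overrides neither method: compiles, but fails only when called.
pub struct Forgotten;
impl TableFunctionImpl for Forgotten {}

/// A trivial session for the demo.
pub struct LocalSession(pub String);
impl Session for LocalSession {
    fn session_id(&self) -> &str {
        &self.0
    }
}
```

Forgotten is exactly the case the Augment comment flags: the compiler accepts it, and the error surfaces only on the first call.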
```rust
unsafe extern "C" fn(udtf: &Self, args: RVec<u8>) -> FFIResult<FFI_TableProvider>,

/// Equivalent to the [`TableFunctionImpl::call_with_args`].
call_with_args: unsafe extern "C" fn(
```
datafusion/ffi/src/udtf.rs:59: FFI_TableFunction is #[repr(C)] and StableAbi, so adding the new call_with_args field changes the struct layout and likely breaks ABI for any external libraries built against the previous definition. Can you confirm there’s explicit versioning/prefix-type handling here so older plugins won’t be misinterpreted at runtime?
Severity: high
value:good-but-wont-fix; category:bug; feedback: The Augment AI reviewer is correct! Adding the new function breaks the FFI ABI, but this is acceptable for a new major release of DataFusion.
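A std-only illustration of why the reviewer flags the ABI: in a #[repr(C)] table of function pointers, inserting a field shifts the offset of every field after it, so a plugin compiled against the old layout would read the wrong slot. VTableV1, VTableV2, the signatures, and the presence of a field after call_with_args are all hypothetical; the real FFI_TableFunction uses abi_stable types, and its full field list is not shown in this thread.

```rust
use std::mem::size_of;

// Hypothetical function-pointer signatures; the real struct uses
// abi_stable types such as RVec<u8> and FFIResult<...>.
type CallFn = unsafe extern "C" fn(args: *const u8, len: usize) -> i32;
type ReleaseFn = unsafe extern "C" fn();

/// Layout a plugin compiled against the older definition would assume.
#[repr(C)]
pub struct VTableV1 {
    pub call: CallFn,
    pub release: ReleaseFn,
}

/// Layout after a new function pointer is inserted right after `call`.
#[repr(C)]
pub struct VTableV2 {
    pub call: CallFn,
    pub call_with_args: CallFn,
    pub release: ReleaseFn,
}

unsafe extern "C" fn dummy_call(_args: *const u8, _len: usize) -> i32 {
    0
}
unsafe extern "C" fn dummy_release() {}

/// Total size of each layout in bytes.
pub fn sizes() -> (usize, usize) {
    (size_of::<VTableV1>(), size_of::<VTableV2>())
}

/// Byte offset of `release` within each layout, computed from real instances.
pub fn release_offsets() -> (usize, usize) {
    let v1 = VTableV1 { call: dummy_call, release: dummy_release };
    let v2 = VTableV2 { call: dummy_call, call_with_args: dummy_call, release: dummy_release };
    let off1 = (&v1.release as *const ReleaseFn as usize) - (&v1 as *const VTableV1 as usize);
    let off2 = (&v2.release as *const ReleaseFn as usize) - (&v2 as *const VTableV2 as usize);
    (off1, off2)
}
```

An old plugin reading `release` at the V1 offset from a V2 table would actually call `call_with_args`. That kind of mismatch is what the prefix-type handling asked about in the comment is meant to guard against.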
```rust
};
for table_name in schema.table_names() {
    let Some(provider) = block_in_place(|| {
        Handle::current().block_on(schema.table(&table_name))
```
datafusion-examples/examples/udf/table_list_udtf.rs:100: Using block_in_place + Handle::current().block_on(...) will panic if table_list() is invoked without a Tokio runtime, and block_in_place also requires the multi-thread runtime. Since UDTFs can be called from various execution contexts, it may be worth documenting/guarding these runtime assumptions for the example.
Severity: medium
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The example works only in a multi-threaded Tokio runtime. It would be good to document this (e.g. with a comment) so that users copying this example into their application are aware of the limitation.
```diff
 The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
 argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
-single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
```
docs/source/library-user-guide/functions/adding-udfs.md:1380: This says TableFunctionImpl has a “single method” and that TableFunctionArgs only carries the Expr slice, but the trait still includes the deprecated call and TableFunctionArgs also includes the calling session. That mismatch may confuse readers trying to write session-aware UDTFs.
Severity: low
value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The documentation would be misleading until the old call() method is fully removed. It should be improved to mention both methods for the time being.
Actionable comments posted: 3
🧹 Nitpick comments (3)
datafusion/core/tests/user_defined/user_defined_table_functions.rs (1)
203-205: Please add one regression test that actually reads args.session. Right now the test migrations only prove the new signature compiles; they never assert that the passed session is usable. Since this PR’s value is the new session plumbing, a small test that inspects args.session (for example by downcasting to SessionState and checking that a registered table is visible) would make regressions much harder to miss.
Also applies to: 235-235
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/core/tests/user_defined/user_defined_table_functions.rs` around lines 203 - 205, Add a regression test that uses the table function's call_with_args(TableFunctionArgs) path and actually inspects args.session: in the test create a SessionState, register a temporary table or table provider into that session, call the table function (triggering call_with_args) and inside its implementation downcast args.session to SessionState and assert the registered table is visible via SessionState::catalog or catalog list_tables; reference the function call_with_args, the TableFunctionArgs struct (args.session), and SessionState downcast to locate where to add the assertion so the test validates the session plumbing rather than only compilation.
datafusion-examples/examples/udf/table_list_udtf.rs (1)
76-82: Use a user-facing error for non-local sessions. This downcast can fail legitimately once the new FFI path supplies a ForeignSession, so surfacing it as DataFusionError::Internal makes an unsupported calling mode look like engine corruption. Please return a clear "table_list currently requires a local SessionState" error here, or document that limitation inline so users do not copy a local-only pattern as the general session-aware UDTF recipe.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion-examples/examples/udf/table_list_udtf.rs` around lines 76 - 82, Replace the internal-only error when downcasting args.session to SessionState in table_list_udtf (the block using args.session.as_any().downcast_ref::<SessionState>()) with a user-facing error that clearly states that table_list requires a local SessionState (or documents the limitation), e.g. returning an Err that mentions "table_list currently requires a local SessionState" (and mention ForeignSession as the possible alternate) instead of DataFusionError::Internal so callers with ForeignSession get a clear, actionable message.
datafusion/ffi/src/udtf.rs (1)
427-430: Add coverage for the non-local session branch. This still passes a local &ctx.state(), so call_with_args_wrapper() takes the as_local() path at Lines 153-157 and never exercises the new ForeignSession::try_from bridge at Lines 158-160. One extra round-trip case that forces a non-local FFI_SessionRef would make this change much safer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@datafusion/ffi/src/udtf.rs` around lines 427 - 430, The test currently passes a local &ctx.state() so call_with_args_wrapper() follows the as_local() path and never hits ForeignSession::try_from; add a second round-trip case that constructs and passes a non-local FFI_SessionRef (so TableFunctionArgs.session is not a local ctx.state()) to table.call_with_args()/call_with_args_wrapper() to force the ForeignSession::try_from branch to run; reference the existing TableFunctionArgs, call_with_args, call_with_args_wrapper, ForeignSession::try_from, ctx.state(), and FFI_SessionRef when implementing this additional case.
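The first two nitpicks can be sketched together with std::any::Any: downcast the session through an as_any hook, return a user-facing message when the session is not local, and test both paths. Session, SessionState, ForeignSession and table_list here are simplified hypothetical stand-ins that only mirror the downcast_ref::<SessionState>() pattern quoted above; in DataFusion the error would be a DataFusionError, and which non-Internal variant to use is a judgment call for the maintainers.

```rust
use std::any::Any;
use std::collections::BTreeSet;

/// Hypothetical stand-in for DataFusion's `Session` trait, reduced to the
/// `as_any` hook that makes downcasting possible.
pub trait Session {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for a local `SessionState` holding registered table names.
pub struct SessionState {
    pub tables: BTreeSet<String>,
}
impl Session for SessionState {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Stand-in for a session handed over FFI (`ForeignSession`).
pub struct ForeignSession;
impl Session for ForeignSession {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Sketch of the `table_list` body: downcast to the local state and return a
/// user-facing error (not an internal one) when that is not possible.
pub fn table_list(session: &dyn Session) -> Result<Vec<String>, String> {
    let state = session
        .as_any()
        .downcast_ref::<SessionState>()
        .ok_or_else(|| "table_list currently requires a local SessionState".to_string())?;
    Ok(state.tables.iter().cloned().collect())
}
```

A regression test along the lines of the first nitpick then only needs to register a table in the local state and check that it shows up, plus one call with a non-local session that checks the message.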
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@datafusion/catalog/src/table.rs`:
- Around line 510-516: Make TableFunctionArgs future-proof by marking the struct
#[non_exhaustive], making its fields private, and adding a public constructor
and accessors: add TableFunctionArgs::new(args: &'a [Expr], session: &'a dyn
Session) -> Self and getters like args(&self) and session(&self); update all
external construction sites (e.g., where session_state.rs builds the struct) to
call TableFunctionArgs::new(...) instead of constructing fields directly so
adding new fields later won’t be breaking.
In `@datafusion/ffi/src/udtf.rs`:
- Around line 277-279: The FFI session is being constructed without the UDTF
runtime; update the FFI_SessionRef::new call to pass the session runtime instead
of None by extracting self.0.runtime() (the same runtime used in
call_with_args_wrapper) as the second argument so
FFI_SessionRef::new(args.session, self.0.runtime(),
self.0.logical_codec.clone()) and enable session-aware async/catalog operations
for foreign UDTFs.
In `@docs/source/library-user-guide/functions/adding-udfs.md`:
- Around line 1378-1384: The docs describe TableFunctionArgs only as carrying a
slice of Exprs and omit the new session capability; update the UDTF overview to
mention that TableFunctionArgs (used in TableFunctionImpl::call_with_args) also
exposes args.session so implementers can access the execution/session context
when parsing Exprs and returning an Arc<dyn TableProvider>, and add a short note
to validate/use args.session when implementing session-aware UDTFs (reference
TableFunctionArgs, call_with_args, args.session, TableFunctionImpl, Expr,
TableProvider).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: bafea0ab-08c9-4ce5-8a2e-03b29291acec
📒 Files selected for processing (11)
- datafusion-cli/src/functions.rs
- datafusion-examples/README.md
- datafusion-examples/examples/udf/main.rs
- datafusion-examples/examples/udf/simple_udtf.rs
- datafusion-examples/examples/udf/table_list_udtf.rs
- datafusion/catalog/src/table.rs
- datafusion/core/src/execution/session_state.rs
- datafusion/core/tests/user_defined/user_defined_table_functions.rs
- datafusion/ffi/src/udtf.rs
- datafusion/functions-table/src/generate_series.rs
- docs/source/library-user-guide/functions/adding-udfs.md
```rust
/// Describes arguments provided to the table function call.
pub struct TableFunctionArgs<'a> {
    /// Call arguments.
    pub args: &'a [Expr],
    /// Session within which the function is called.
    pub session: &'a dyn Session,
}
```
🛠️ Refactor suggestion | 🟠 Major
Future-proof TableFunctionArgs before this API ships.
This type is already constructed cross-crate from datafusion/core/src/execution/session_state.rs, so the current public-field shape freezes it immediately. If you ever need to carry more call metadata, adding a field becomes a breaking change for downstream UDTFs. A constructor plus #[non_exhaustive] would keep the new API extensible.
Suggested API hardening
```diff
+#[non_exhaustive]
 /// Describes arguments provided to the table function call.
 pub struct TableFunctionArgs<'a> {
     /// Call arguments.
     pub args: &'a [Expr],
     /// Session within which the function is called.
     pub session: &'a dyn Session,
 }
+
+impl<'a> TableFunctionArgs<'a> {
+    pub fn new(args: &'a [Expr], session: &'a dyn Session) -> Self {
+        Self { args, session }
+    }
+}
```
Then switch external construction sites to TableFunctionArgs::new(...).
value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new struct and all of its fields are public, which leaves it too open to external users and makes it hard to extend: adding a field later would break them. It would be better to hide the fields and add a constructor and getters, so that more fields can be added later without breaking existing users.
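A std-only sketch of the shape both reviewers converge on: private fields, a constructor, and getters. Expr, Session and LocalSession are hypothetical stand-ins, and the getter names args() and session() follow the CodeRabbit prompt rather than any merged API. #[non_exhaustive] only restricts other crates, so inside this single-crate demo it is the module boundary around the private fields that forces callers through new.

```rust
/// Hypothetical stand-ins for DataFusion's `Expr` and `Session`.
#[derive(Debug, PartialEq)]
pub struct Expr(pub i64);
pub trait Session {
    fn session_id(&self) -> &str;
}

mod catalog {
    use super::{Expr, Session};

    /// Describes arguments provided to the table function call.
    /// Fields are private, so code outside this module must use `new`,
    /// and a field can be added later without breaking callers.
    #[non_exhaustive]
    pub struct TableFunctionArgs<'a> {
        args: &'a [Expr],
        session: &'a dyn Session,
    }

    impl<'a> TableFunctionArgs<'a> {
        pub fn new(args: &'a [Expr], session: &'a dyn Session) -> Self {
            Self { args, session }
        }

        /// Call arguments.
        pub fn args(&self) -> &'a [Expr] {
            self.args
        }

        /// Session within which the function is called.
        pub fn session(&self) -> &'a dyn Session {
            self.session
        }
    }
}

pub use catalog::TableFunctionArgs;

/// A trivial session for the demo.
pub struct LocalSession;
impl Session for LocalSession {
    fn session_id(&self) -> &str {
        "local"
    }
}
```

Once the fields are private, #[non_exhaustive] adds little on its own, since other crates can no longer build the struct with a literal or name its fields in a pattern; it mostly documents intent.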
```rust
let session =
    FFI_SessionRef::new(args.session, None, self.0.logical_codec.clone());
let codec: Arc<dyn LogicalExtensionCodec> = (&self.0.logical_codec).into();
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
# Inspect the session bridge implementation and constructor signature.
fd -i session.rs datafusion/ffi/src --exec sed -n '1,260p' {}
# Show where runtime is stored/used across the FFI session bridge.
rg -n -C4 'struct FFI_SessionRef|impl FFI_SessionRef|FFI_SessionRef::new\(|ForeignSession|runtime|block_on' datafusion/ffi/src
```
Repository: martin-augment/datafusion
Length of output: 50382
🏁 Script executed:
```bash
rg -n 'impl FFI_SessionRef|FFI_SessionRef::new|struct ForeignSession|impl ForeignSession|impl.*ForeignSession.*try_from' datafusion/ffi/src/session.rs -A 10
```
Repository: martin-augment/datafusion
Length of output: 138
🏁 Script executed:
```bash
find datafusion/ffi/src -name '*.rs' -type f | head -20
```
Repository: martin-augment/datafusion
Length of output: 853
🏁 Script executed:
```bash
rg -l 'struct FFI_SessionRef|struct ForeignSession' datafusion/ffi/src/
```
Repository: martin-augment/datafusion
Length of output: 102
🏁 Script executed:
```bash
cat -n datafusion/ffi/src/session/mod.rs | head -150
```
Repository: martin-augment/datafusion
Length of output: 6853
🏁 Script executed:
```bash
sed -n '150,250p' datafusion/ffi/src/session/mod.rs
```
Repository: martin-augment/datafusion
Length of output: 3411
🏁 Script executed:
```bash
sed -n '250,400p' datafusion/ffi/src/session/mod.rs | grep -A 20 'fn new'
```
Repository: martin-augment/datafusion
Length of output: 987
🏁 Script executed:
```bash
sed -n '250,450p' datafusion/ffi/src/session/mod.rs | grep -A 40 'struct ForeignSession'
```
Repository: martin-augment/datafusion
Length of output: 1486
🏁 Script executed:
```bash
sed -n '400,500p' datafusion/ffi/src/session/mod.rs | head -60
```
Repository: martin-augment/datafusion
Length of output: 2116
🏁 Script executed:
```bash
sed -n '400,550p' datafusion/ffi/src/session/mod.rs | grep -B 5 -A 15 'runtime_env: Default'
```
Repository: martin-augment/datafusion
Length of output: 625
Pass the UDTF runtime to FFI_SessionRef::new() to support session-aware operations.
At line 278, FFI_SessionRef::new() receives None for the runtime parameter. Extract self.0.runtime() (as done in call_with_args_wrapper at line 138) and pass it as the second argument to enable proper bridging of async session/catalog operations in foreign UDTFs.
```diff
 let session =
-    FFI_SessionRef::new(args.session, None, self.0.logical_codec.clone());
+    FFI_SessionRef::new(args.session, self.0.runtime(), self.0.logical_codec.clone());
```
value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The UDTF runtime could be passed to FFI_SessionRef so that the foreign side can use it for async operations too.
```diff
 The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
 argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
-single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
+Passed struct includes function arguments as a slice of `Expr`s.
 
-In the `call` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
+In the `call_with_args` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
 validation of the input `Expr`s, e.g. checking that the number of arguments is correct.
```
Document args.session in the UDTF overview.
This paragraph currently describes TableFunctionArgs as if it only carries the Expr slice, which hides the main capability added by this PR. Readers will not realize session-aware UDTFs are supported unless they inspect the type definition or examples.
Suggested wording
```diff
-single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
-Passed struct includes function arguments as a slice of `Expr`s.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
+`TableFunctionArgs` carries both the function arguments (`args.args`) and the calling session (`args.session`).
```
value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The main benefit of the new call_with_args method is the access to the session state. The updated documentation should mention it to make the user/developer aware of it.
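Following the suggested wording, here is a std-only sketch of what a call_with_args body for the guide's simple UDTF might look like: it validates that args.args holds exactly one Int64, then builds the provider and reads something from args.session. Expr, Session, OneColumnTable and LocalSession are hypothetical stand-ins (real code would inspect a literal Expr and return an Arc<dyn TableProvider>), and call_with_args is written as a free function rather than a trait method to keep it short.

```rust
/// Hypothetical, heavily reduced stand-ins for DataFusion types.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Int64(i64),
    Utf8(String),
}

pub trait Session {
    fn session_id(&self) -> &str;
}

pub struct TableFunctionArgs<'a> {
    pub args: &'a [Expr],
    pub session: &'a dyn Session,
}

/// Stand-in for a one-column in-memory table provider.
#[derive(Debug, PartialEq)]
pub struct OneColumnTable {
    pub values: Vec<i64>,
    /// Records which session built the table, just to show `args.session` is usable.
    pub created_by: String,
}

/// Body of `call_with_args` for the simple UDTF: validate the arguments,
/// then build the provider, optionally consulting the calling session.
pub fn call_with_args(args: TableFunctionArgs<'_>) -> Result<OneColumnTable, String> {
    let [Expr::Int64(value)] = args.args else {
        return Err(format!("expected exactly one Int64 argument, got {:?}", args.args));
    };
    Ok(OneColumnTable {
        values: vec![*value],
        created_by: args.session.session_id().to_string(),
    })
}

/// A trivial session for the demo.
pub struct LocalSession;
impl Session for LocalSession {
    fn session_id(&self) -> &str {
        "s1"
    }
}
```

The slice pattern handles both checks at once: a wrong argument count and a wrong argument type fall through to the same user-facing error.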
20222: To review by AI