Skip to content

20222: Provide session to the udtf call#296

Open
martin-augment wants to merge 1 commit intomainfrom
pr-20222-2026-03-29-11-26-05
Open

20222: Provide session to the udtf call#296
martin-augment wants to merge 1 commit intomainfrom
pr-20222-2026-03-29-11-26-05

Conversation

@martin-augment
Copy link
Copy Markdown
Owner

20222: To review by AI

This patch adds the passing of the current session to the UDTF call.
This helps implement session-dependent functions, for example, a function
that returns the list of registered tables.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 29, 2026

Walkthrough

This change introduces a new table function invocation API that passes execution context to table functions. A new TableFunctionArgs struct was created to carry both argument expressions and the calling session state. The TableFunctionImpl trait was updated with a new call_with_args method that accepts TableFunctionArgs, while the existing call method was deprecated. Multiple implementations of table functions across the codebase were updated to use the new method signature. The session state provider was modified to construct and pass the execution context when invoking table functions. FFI layer support was added for the new calling convention. A new example demonstrating a session-aware UDTF was added, and documentation was updated to reflect the API changes.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch pr-20222-2026-03-29-11-26-05

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude bot commented Mar 29, 2026

PR Review: 20222: Provide session to the udtf call

This PR adds a TableFunctionArgs struct that bundles the call arguments with the Session reference, allowing UDTFs to access session state at call time. The approach is well thought-out and follows DataFusion's deprecation conventions. Here is my detailed feedback:


Design

args.args redundancy (datafusion/catalog/src/table.rs)

The field is named args inside a struct named TableFunctionArgs, leading to args.args at every call site. Every existing implementor uses let exprs = args.args; as the first line just to escape this awkwardness. Consider renaming the field to exprs or expressions:

pub struct TableFunctionArgs<'a> {
    pub exprs: &'a [Expr],
    pub session: &'a dyn Session,
}

Single lifetime for two independent borrows

args and session share the same lifetime 'a. In practice, the session lives at least as long as the query planning phase, but this couples two semantically independent lifetimes and may make it harder for callers to construct the struct in some scenarios. Using two independent lifetimes ('a for args, 'b for session) would be more flexible:

pub struct TableFunctionArgs<'a, 'b> {
    pub args: &'a [Expr],
    pub session: &'b dyn Session,
}

Backward compatibility

The default call() impl returns internal_err!("unimplemented"). This is correct when call_with_args is overridden, but if a third-party crate implements only the old call(), the chain is:

call_with_args (default) → call() (user impl) ✓

That path works. However, the internal_err! message ("unimplemented") is surprising since Rust normally uses unimplemented!() for that; consider "TableFunctionImpl::call is not implemented. Implement call_with_args instead." for clarity.


Example: table_list_udtf.rs

block_in_place limitation (line ~257)

let Some(provider) = block_in_place(|| {
    Handle::current().block_on(schema.table(&table_name))
})?

block_in_place panics on single-threaded Tokio runtimes (i.e., #[tokio::test] without flavor = "multi_thread", or any current_thread runtime). The example works because SessionContext defaults to multi-thread, but this is a footgun for anyone copying this pattern into tests or embedded contexts. A comment explaining the constraint would help:

// `call_with_args` is synchronous, but `schema.table()` is async.
// `block_in_place` requires a multi-threaded Tokio runtime; panics otherwise.

This is also a broader design note: because call_with_args is a sync method, any UDTF that needs async catalog lookups is forced to use block_in_place or similar hacks. This trade-off is probably intentional (matching TableFunctionImpl's existing sync nature), but documenting it in the trait's rustdoc would help users.

MemTable import path (line ~183)

use datafusion::{
    catalog::{MemTable, TableFunctionArgs, TableFunctionImpl, TableProvider},
    ...
};

MemTable is typically at datafusion::datasource::memory::MemTable. Importing it via datafusion::catalog::MemTable implies a re-export. Please verify this is an intentional stable re-export path, not an accidental internal one; if it's only re-exported for convenience it should be documented.


FFI (datafusion/ffi/src/udtf.rs)

#[deprecated] on a repr(C) struct field (line ~449)

Putting #[deprecated] on a field in a #[repr(C)] struct is unusual and may not be well-supported across toolchains. The deprecation note here is informational only (it won't prevent FFI callers in other languages from calling it), so this might be better handled via a doc comment rather than the attribute.

Lifetime of foreign_session in call_with_args_wrapper (lines ~498-511)

let mut foreign_session = None;
let session = rresult_return!(
    session
        .as_local()
        .map(Ok::<&(dyn Session + Send + Sync), DataFusionError>)
        .unwrap_or_else(|| {
            foreign_session = Some(ForeignSession::try_from(&session)?);
            Ok(foreign_session.as_ref().unwrap())
        })
);

The foreign_session variable keeps the allocation alive until end of scope, which is correct. But foreign_session.as_ref().unwrap() inside the closure returns a reference that's only valid while foreign_session lives in the outer scope — this looks safe here, but is fragile and non-obvious. A comment explaining why foreign_session must outlive session would prevent future regressions.


Minor / Nit

  • exprs.get(0) in the docs example (docs/source/library-user-guide/functions/adding-udfs.md line ~689): Clippy prefers exprs.first(). Since this is documentation code that will be checked, it should match Clippy's expectations.

  • let exprs = args.args; boilerplate: Every single implementor has this as their first line. If the field is not renamed to exprs, consider at minimum adding a convenience method:

    impl TableFunctionArgs<'_> {
        pub fn exprs(&self) -> &[Expr] { self.args }
    }
  • Documentation gap: TableFunctionArgs has a doc comment for the struct and its fields, but the struct-level comment could note that the session is only available during planning (not execution), so UDTFs should capture needed state into the TableProvider they return.


Overall

The approach is sound: it's backward-compatible, properly deprecated, and the new example clearly motivates the change. Addressing the args.args naming, documenting the block_in_place constraint, and clarifying the FFI deprecation approach would strengthen the PR before merge.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces session-aware User Defined Table Functions (UDTFs) by replacing the TableFunctionImpl::call method with call_with_args. This new method receives a TableFunctionArgs struct containing both the function arguments and the session context, enabling UDTFs to access session-specific state. The PR includes updates to existing table functions, FFI support for the new signature, and a new example demonstrating a table_list UDTF. Feedback was provided regarding the table_list example, specifically suggesting the use of concurrent lookups instead of blocking the thread within a loop to improve performance.

Comment on lines +90 to +111
for catalog_name in catalog_list.catalog_names() {
let Some(catalog) = catalog_list.catalog(&catalog_name) else {
continue;
};
for schema_name in catalog.schema_names() {
let Some(schema) = catalog.schema(&schema_name) else {
continue;
};
for table_name in schema.table_names() {
let Some(provider) = block_in_place(|| {
Handle::current().block_on(schema.table(&table_name))
})?
else {
continue;
};
catalogs.append_value(catalog_name.clone());
schemas.append_value(schema_name.clone());
tables.append_value(table_name.clone());
types.append_value(provider.table_type().to_string())
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation iterates through catalogs, schemas, and tables, and for each table, it blocks to get the table provider. Using block_in_place inside a loop can be inefficient as it blocks the current thread from running other tasks.

To improve performance, you could gather all the asynchronous schema.table() calls and execute them concurrently using futures::future::try_join_all. This would perform all the lookups in parallel.

        let mut table_futures = vec![];
        for catalog_name in catalog_list.catalog_names() {
            if let Some(catalog) = catalog_list.catalog(&catalog_name) {
                for schema_name in catalog.schema_names() {
                    if let Some(schema) = catalog.schema(&schema_name) {
                        for table_name in schema.table_names() {
                            let schema_clone = Arc::clone(&schema);
                            let catalog_name_clone = catalog_name.clone();
                            let schema_name_clone = schema_name.clone();
                            let table_name_clone = table_name.clone();
                            table_futures.push(async move {
                                schema_clone.table(&table_name_clone).await.map(|provider| {
                                    (catalog_name_clone, schema_name_clone, table_name_clone, provider)
                                })
                            });
                        }
                    }
                }
            }
        }

        let results = block_in_place(|| {
            Handle::current().block_on(futures::future::try_join_all(table_futures))
        })?;

        for (catalog_name, schema_name, table_name, provider) in results {
            if let Some(provider) = provider {
                catalogs.append_value(catalog_name);
                schemas.append_value(schema_name);
                tables.append_value(table_name);
                types.append_value(provider.table_type().to_string())
            }
        }

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The current solution uses Tokio's block_in_place() method that works only in multi-threaded runtime. By using futures::try_join_all the code will work both in multi-threaded runtime and current-thread runtime. In addition it will be more concurrent.

@augmentcode
Copy link
Copy Markdown

augmentcode bot commented Mar 29, 2026

🤖 Augment PR Summary

Summary: This PR updates DataFusion’s UDTF (table function) API to pass the calling session into table function implementations.

Changes:

  • Introduces TableFunctionArgs (arguments + session) and adds TableFunctionImpl::call_with_args while deprecating call
  • Updates table-function call sites in the planner to invoke table functions with session context
  • Migrates existing UDTFs (CLI functions, examples, core tests, generate_series/range) to implement call_with_args
  • Extends the FFI UDTF bridge to support session-aware calls
  • Adds a new table_list_udtf example demonstrating a session-aware UDTF and updates README/docs accordingly

Technical Notes: The new API enables UDTFs to inspect catalogs/schemas via the provided session, while maintaining backward compatibility via deprecated entry points.

🤖 Was this summary useful? React with 👍 or 👎

Copy link
Copy Markdown

@augmentcode augmentcode bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review completed. 4 suggestions posted.

Fix All in Augment

Comment augment review to trigger a new review at any time.

since = "53.0.0",
note = "Implement `TableFunctionImpl::call_with_args` instead"
)]
fn call(&self, _args: &[Expr]) -> Result<Arc<dyn TableProvider>> {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/catalog/src/table.rs:525: Because both TableFunctionImpl::call and call_with_args now have default implementations, it’s possible to impl TableFunctionImpl for X {} and compile successfully but only fail at runtime with internal_err!("unimplemented"). Is that loss of compile-time enforcement intentional for this API transition?

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:valid-but-wont-fix; category:bug; feedback: The Augment AI reviewer is not correct! The default implementation for call_with_args is provided, so that the existing users do not need to fix their builds when they upgrade to the next version of DataFusion. Only the new users could break with an internal error but they will quickly realize this and fix their implementation.

unsafe extern "C" fn(udtf: &Self, args: RVec<u8>) -> FFIResult<FFI_TableProvider>,

/// Equivalent to the [`TableFunctionImpl::call_with_args`].
call_with_args: unsafe extern "C" fn(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion/ffi/src/udtf.rs:59: FFI_TableFunction is #[repr(C)] and StableAbi, so adding the new call_with_args field changes the struct layout and likely breaks ABI for any external libraries built against the previous definition. Can you confirm there’s explicit versioning/prefix-type handling here so older plugins won’t be misinterpreted at runtime?

Severity: high

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-but-wont-fix; category:bug; feedback: The Augment AI reviewer is correct! The FFI API is broken by adding the new function but this is OK for a new major release of DataFusion.

};
for table_name in schema.table_names() {
let Some(provider) = block_in_place(|| {
Handle::current().block_on(schema.table(&table_name))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datafusion-examples/examples/udf/table_list_udtf.rs:100: Using block_in_place + Handle::current().block_on(...) will panic if table_list() is invoked without a Tokio runtime, and block_in_place also requires the multi-thread runtime. Since UDTFs can be called from various execution contexts, it may be worth documenting/guarding these runtime assumptions for the example.

Severity: medium

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The example would work only in multi-threaded Tokio runtime. It would be good to document this (e.g. with a comment), so the users copying this example into their application are aware of this limitation.

The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docs/source/library-user-guide/functions/adding-udfs.md:1380: This says TableFunctionImpl has a “single method” and that TableFunctionArgs only carries the Expr slice, but the trait still includes the deprecated call and TableFunctionArgs also includes the calling session. That mismatch may confuse readers trying to write session-aware UDTFs.

Severity: low

Fix This in Augment

🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The Augment AI reviewer is correct! The documentation would be misleading until the old call() method is fully removed. It should be improved to mention both methods for the time being.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
datafusion/core/tests/user_defined/user_defined_table_functions.rs (1)

203-205: Please add one regression test that actually reads args.session.

Right now the test migrations only prove the new signature compiles; they never assert that the passed session is usable. Since this PR’s value is the new session plumbing, a small test that inspects args.session (for example by downcasting to SessionState and checking a registered table is visible) would make regressions much harder to miss.

Also applies to: 235-235

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/core/tests/user_defined/user_defined_table_functions.rs` around
lines 203 - 205, Add a regression test that uses the table function's
call_with_args(TableFunctionArgs) path and actually inspects args.session: in
the test create a SessionState, register a temporary table or table provider
into that session, call the table function (triggering call_with_args) and
inside its implementation downcast args.session to SessionState and assert the
registered table is visible via SessionState::catalog or catalog list_tables;
reference the function call_with_args, the TableFunctionArgs struct
(args.session), and SessionState downcast to locate where to add the assertion
so the test validates the session plumbing rather than only compilation.
datafusion-examples/examples/udf/table_list_udtf.rs (1)

76-82: Use a user-facing error for non-local sessions.

This downcast can fail legitimately once the new FFI path supplies a ForeignSession, so surfacing it as DataFusionError::Internal makes an unsupported calling mode look like engine corruption. Please return a clear "table_list currently requires a local SessionState" error here, or document that limitation inline so users do not copy a local-only pattern as the general session-aware UDTF recipe.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion-examples/examples/udf/table_list_udtf.rs` around lines 76 - 82,
Replace the internal-only error when downcasting args.session to SessionState in
table_list_udtf (the block using
args.session.as_any().downcast_ref::<SessionState>()) with a user-facing error
that clearly states that table_list requires a local SessionState (or documents
the limitation), e.g. returning an Err that mentions "table_list currently
requires a local SessionState" (and mention ForeignSession as the possible
alternate) instead of DataFusionError::Internal so callers with ForeignSession
get a clear, actionable message.
datafusion/ffi/src/udtf.rs (1)

427-430: Add coverage for the non-local session branch.

This still passes a local &ctx.state(), so call_with_args_wrapper() takes the as_local() path at Lines 153-157 and never exercises the new ForeignSession::try_from bridge at Lines 158-160. One extra round-trip case that forces a non-local FFI_SessionRef would make this change much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/ffi/src/udtf.rs` around lines 427 - 430, The test currently passes
a local &ctx.state() so call_with_args_wrapper() follows the as_local() path and
never hits ForeignSession::try_from; add a second round-trip case that
constructs and passes a non-local FFI_SessionRef (so TableFunctionArgs.session
is not a local ctx.state()) to table.call_with_args()/call_with_args_wrapper()
to force the ForeignSession::try_from branch to run; reference the existing
TableFunctionArgs, call_with_args, call_with_args_wrapper,
ForeignSession::try_from, ctx.state(), and FFI_SessionRef when implementing this
additional case.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@datafusion/catalog/src/table.rs`:
- Around line 510-516: Make TableFunctionArgs future-proof by marking the struct
#[non_exhaustive], making its fields private, and adding a public constructor
and accessors: add TableFunctionArgs::new(args: &'a [Expr], session: &'a dyn
Session) -> Self and getters like args(&self) and session(&self); update all
external construction sites (e.g., where session_state.rs builds the struct) to
call TableFunctionArgs::new(...) instead of constructing fields directly so
adding new fields later won’t be breaking.

In `@datafusion/ffi/src/udtf.rs`:
- Around line 277-279: The FFI session is being constructed without the UDTF
runtime; update the FFI_SessionRef::new call to pass the session runtime instead
of None by extracting self.0.runtime() (the same runtime used in
call_with_args_wrapper) as the second argument so
FFI_SessionRef::new(args.session, self.0.runtime(),
self.0.logical_codec.clone()) and enable session-aware async/catalog operations
for foreign UDTFs.

In `@docs/source/library-user-guide/functions/adding-udfs.md`:
- Around line 1378-1384: The docs describe TableFunctionArgs only as carrying a
slice of Exprs and omit the new session capability; update the UDTF overview to
mention that TableFunctionArgs (used in TableFunctionImpl::call_with_args) also
exposes args.session so implementers can access the execution/session context
when parsing Exprs and returning an Arc<dyn TableProvider>, and add a short note
to validate/use args.session when implementing session-aware UDTFs (reference
TableFunctionArgs, call_with_args, args.session, TableFunctionImpl, Expr,
TableProvider).

---

Nitpick comments:
In `@datafusion-examples/examples/udf/table_list_udtf.rs`:
- Around line 76-82: Replace the internal-only error when downcasting
args.session to SessionState in table_list_udtf (the block using
args.session.as_any().downcast_ref::<SessionState>()) with a user-facing error
that clearly states that table_list requires a local SessionState (or documents
the limitation), e.g. returning an Err that mentions "table_list currently
requires a local SessionState" (and mention ForeignSession as the possible
alternate) instead of DataFusionError::Internal so callers with ForeignSession
get a clear, actionable message.

In `@datafusion/core/tests/user_defined/user_defined_table_functions.rs`:
- Around line 203-205: Add a regression test that uses the table function's
call_with_args(TableFunctionArgs) path and actually inspects args.session: in
the test create a SessionState, register a temporary table or table provider
into that session, call the table function (triggering call_with_args) and
inside its implementation downcast args.session to SessionState and assert the
registered table is visible via SessionState::catalog or catalog list_tables;
reference the function call_with_args, the TableFunctionArgs struct
(args.session), and SessionState downcast to locate where to add the assertion
so the test validates the session plumbing rather than only compilation.

In `@datafusion/ffi/src/udtf.rs`:
- Around line 427-430: The test currently passes a local &ctx.state() so
call_with_args_wrapper() follows the as_local() path and never hits
ForeignSession::try_from; add a second round-trip case that constructs and
passes a non-local FFI_SessionRef (so TableFunctionArgs.session is not a local
ctx.state()) to table.call_with_args()/call_with_args_wrapper() to force the
ForeignSession::try_from branch to run; reference the existing
TableFunctionArgs, call_with_args, call_with_args_wrapper,
ForeignSession::try_from, ctx.state(), and FFI_SessionRef when implementing this
additional case.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: bafea0ab-08c9-4ce5-8a2e-03b29291acec

📥 Commits

Reviewing files that changed from the base of the PR and between 91ea71d and f96b566.

📒 Files selected for processing (11)
  • datafusion-cli/src/functions.rs
  • datafusion-examples/README.md
  • datafusion-examples/examples/udf/main.rs
  • datafusion-examples/examples/udf/simple_udtf.rs
  • datafusion-examples/examples/udf/table_list_udtf.rs
  • datafusion/catalog/src/table.rs
  • datafusion/core/src/execution/session_state.rs
  • datafusion/core/tests/user_defined/user_defined_table_functions.rs
  • datafusion/ffi/src/udtf.rs
  • datafusion/functions-table/src/generate_series.rs
  • docs/source/library-user-guide/functions/adding-udfs.md

Comment on lines +510 to +516
/// Describes arguments provided to the table function call.
pub struct TableFunctionArgs<'a> {
/// Call arguments.
pub args: &'a [Expr],
/// Session within which the function is called.
pub session: &'a dyn Session,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Future-proof TableFunctionArgs before this API ships.

This type is already constructed cross-crate from datafusion/core/src/execution/session_state.rs, so the current public-field shape freezes it immediately. If you ever need to carry more call metadata, adding a field becomes a breaking change for downstream UDTFs. A constructor plus #[non_exhaustive] would keep the new API extensible.

Suggested API hardening
+#[non_exhaustive]
 /// Describes arguments provided to the table function call.
 pub struct TableFunctionArgs<'a> {
     /// Call arguments.
     pub args: &'a [Expr],
     /// Session within which the function is called.
     pub session: &'a dyn Session,
 }
+
+impl<'a> TableFunctionArgs<'a> {
+    pub fn new(args: &'a [Expr], session: &'a dyn Session) -> Self {
+        Self { args, session }
+    }
+}

Then switch external construction sites to TableFunctionArgs::new(...).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/// Describes arguments provided to the table function call.
pub struct TableFunctionArgs<'a> {
/// Call arguments.
pub args: &'a [Expr],
/// Session within which the function is called.
pub session: &'a dyn Session,
}
#[non_exhaustive]
/// Describes arguments provided to the table function call.
pub struct TableFunctionArgs<'a> {
/// Call arguments.
pub args: &'a [Expr],
/// Session within which the function is called.
pub session: &'a dyn Session,
}
impl<'a> TableFunctionArgs<'a> {
pub fn new(args: &'a [Expr], session: &'a dyn Session) -> Self {
Self { args, session }
}
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/catalog/src/table.rs` around lines 510 - 516, Make
TableFunctionArgs future-proof by marking the struct #[non_exhaustive], making
its fields private, and adding a public constructor and accessors: add
TableFunctionArgs::new(args: &'a [Expr], session: &'a dyn Session) -> Self and
getters like args(&self) and session(&self); update all external construction
sites (e.g., where session_state.rs builds the struct) to call
TableFunctionArgs::new(...) instead of constructing fields directly so adding
new fields later won’t be breaking.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! The new struct is fully public (it is public itself and all its fields are public too). This makes it too open for the external users and thus less extendable in the future. It would be better to hide the fields, add a constructor method and getters. This way more fields could be added later without breaking the existing users.

Comment on lines +277 to +279
let session =
FFI_SessionRef::new(args.session, None, self.0.logical_codec.clone());
let codec: Arc<dyn LogicalExtensionCodec> = (&self.0.logical_codec).into();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect the session bridge implementation and constructor signature.
fd -i session.rs datafusion/ffi/src --exec sed -n '1,260p' {}

# Show where runtime is stored/used across the FFI session bridge.
rg -n -C4 'struct FFI_SessionRef|impl FFI_SessionRef|FFI_SessionRef::new\(|ForeignSession|runtime|block_on' datafusion/ffi/src

Repository: martin-augment/datafusion

Length of output: 50382


🏁 Script executed:

rg -n 'impl FFI_SessionRef|FFI_SessionRef::new|struct ForeignSession|impl ForeignSession|impl.*ForeignSession.*try_from' datafusion/ffi/src/session.rs -A 10

Repository: martin-augment/datafusion

Length of output: 138


🏁 Script executed:

find datafusion/ffi/src -name '*.rs' -type f | head -20

Repository: martin-augment/datafusion

Length of output: 853


🏁 Script executed:

rg -l 'struct FFI_SessionRef|struct ForeignSession' datafusion/ffi/src/

Repository: martin-augment/datafusion

Length of output: 102


🏁 Script executed:

cat -n datafusion/ffi/src/session/mod.rs | head -150

Repository: martin-augment/datafusion

Length of output: 6853


🏁 Script executed:

sed -n '150,250p' datafusion/ffi/src/session/mod.rs

Repository: martin-augment/datafusion

Length of output: 3411


🏁 Script executed:

sed -n '250,400p' datafusion/ffi/src/session/mod.rs | grep -A 20 'fn new'

Repository: martin-augment/datafusion

Length of output: 987


🏁 Script executed:

sed -n '250,450p' datafusion/ffi/src/session/mod.rs | grep -A 40 'struct ForeignSession'

Repository: martin-augment/datafusion

Length of output: 1486


🏁 Script executed:

sed -n '400,500p' datafusion/ffi/src/session/mod.rs | head -60

Repository: martin-augment/datafusion

Length of output: 2116


🏁 Script executed:

sed -n '400,550p' datafusion/ffi/src/session/mod.rs | grep -B 5 -A 15 'runtime_env: Default'

Repository: martin-augment/datafusion

Length of output: 625


Pass the UDTF runtime to FFI_SessionRef::new() to support session-aware operations.

At line 278, FFI_SessionRef::new() receives None for the runtime parameter. Extract self.0.runtime() (as done in call_with_args_wrapper at line 138) and pass it as the second argument to enable proper bridging of async session/catalog operations in foreign UDTFs.

        let session =
-           FFI_SessionRef::new(args.session, None, self.0.logical_codec.clone());
+           FFI_SessionRef::new(args.session, self.0.runtime(), self.0.logical_codec.clone());
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let session =
FFI_SessionRef::new(args.session, None, self.0.logical_codec.clone());
let codec: Arc<dyn LogicalExtensionCodec> = (&self.0.logical_codec).into();
let session =
FFI_SessionRef::new(args.session, self.0.runtime(), self.0.logical_codec.clone());
let codec: Arc<dyn LogicalExtensionCodec> = (&self.0.logical_codec).into();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/ffi/src/udtf.rs` around lines 277 - 279, The FFI session is being
constructed without the UDTF runtime; update the FFI_SessionRef::new call to
pass the session runtime instead of None by extracting self.0.runtime() (the
same runtime used in call_with_args_wrapper) as the second argument so
FFI_SessionRef::new(args.session, self.0.runtime(),
self.0.logical_codec.clone()) and enable session-aware async/catalog operations
for foreign UDTFs.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! The UDF runtime could be made available to the FFI SessionRef, so that it could be used by the foreign caller to make async operations too.

Comment on lines 1378 to 1384
The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
Passed struct includes function arguments as a slice of `Expr`s.

In the `call` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
In the `call_with_args` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
validation of the input `Expr`s, e.g. checking that the number of arguments is correct.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Document args.session in the UDTF overview.

This paragraph currently describes TableFunctionArgs as if it only carries the Expr slice, which hides the main capability added by this PR. Readers will not realize session-aware UDTFs are supported unless they inspect the type definition or examples.

Suggested wording
-single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
-Passed struct includes function arguments as a slice of `Expr`s.
+single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
+`TableFunctionArgs` carries both the function arguments (`args.args`) and the calling session (`args.session`).
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
single method, `call`, that takes a slice of `Expr`s and returns a `Result<Arc<dyn TableProvider>>`.
single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
Passed struct includes function arguments as a slice of `Expr`s.
In the `call` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
In the `call_with_args` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
validation of the input `Expr`s, e.g. checking that the number of arguments is correct.
The simple UDTF used here takes a single `Int64` argument and returns a table with a single column with the value of the
argument. To create a function in DataFusion, you need to implement the `TableFunctionImpl` trait. This trait has a
single method, `call_with_args`, that takes a `TableFunctionArgs` struct and returns a `Result<Arc<dyn TableProvider>>`.
`TableFunctionArgs` carries both the function arguments (`args.args`) and the calling session (`args.session`).
In the `call_with_args` method, you parse the input `Expr`s and return a `TableProvider`. You might also want to do some
validation of the input `Expr`s, e.g. checking that the number of arguments is correct.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/library-user-guide/functions/adding-udfs.md` around lines 1378 -
1384, The docs describe TableFunctionArgs only as carrying a slice of Exprs and
omit the new session capability; update the UDTF overview to mention that
TableFunctionArgs (used in TableFunctionImpl::call_with_args) also exposes
args.session so implementers can access the execution/session context when
parsing Exprs and returning an Arc<dyn TableProvider>, and add a short note to
validate/use args.session when implementing session-aware UDTFs (reference
TableFunctionArgs, call_with_args, args.session, TableFunctionImpl, Expr,
TableProvider).

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:documentation; feedback: The CodeRabbit AI reviewer is correct! The main benefit of the new call_with_args method is the access to the session state. The updated documentation should mention it to make the user/developer aware of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants