18657: Implement CatalogProviderList in FFI#17
Conversation
WalkthroughThis change introduces FFI support for DataFusion's CatalogProviderList functionality. A new stable C-ABI wrapper struct ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
PR Review: Implement CatalogProviderList in FFISummaryThis PR implements FFI support for Code Quality and Structure ✅Strengths:
Minor Observations:
Potential Issues and Bugs1. Memory Leak Risk in
|
| catalog: Arc<dyn CatalogProvider>, | ||
| ) -> Option<Arc<dyn CatalogProvider>> { | ||
| if !["blue", "red", "green", "yellow"].contains(&name.as_str()) { | ||
| log::warn!("FixedCatalogProviderList only provides four schemas: blue, red, green, yellow"); |
There was a problem hiding this comment.
The warning message mentions "schemas" but this type manages catalogs; consider using "catalogs" for accuracy (also applies to other locations in the PR).
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-to-have; category:typo; feedback:The Augment AI reviewer is correct that there is a copy/paste error in the warning message. It may lead to confusion when a developer investigates the logs.
| module | ||
| .create_catalog_list() | ||
| .ok_or(DataFusionError::NotImplemented( | ||
| "External catalog provider failed to implement create_catalog" |
There was a problem hiding this comment.
This error message references create_catalog, but this branch calls create_catalog_list; consider updating the text to match the function.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-to-have; category:typo; feedback:The Augment AI reviewer is correct that there is a copy/paste error in the error message. It may lead to confusion when a developer investigates the logs.
| /// This wrapper struct exists on the receiver side of the FFI interface, so it has | ||
| /// no guarantees about being able to access the data in `private_data`. Any functions | ||
| /// defined on this struct must only use the stable functions provided in | ||
| /// FFI_CatalogProviderList to interact with the foreign table provider. |
There was a problem hiding this comment.
Doc comment says "foreign table provider"; here it should refer to the foreign catalog provider list for consistency.
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:good-to-have; category:typo; feedback:The Augment AI reviewer is correct that there is a copy/paste error in docstring. The finding prevents confusion when a developer reads the method documentation.
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
datafusion/ffi/src/catalog_provider.rs(1 hunks)datafusion/ffi/src/catalog_provider_list.rs(1 hunks)datafusion/ffi/src/lib.rs(1 hunks)datafusion/ffi/src/tests/catalog.rs(2 hunks)datafusion/ffi/src/tests/mod.rs(3 hunks)datafusion/ffi/tests/ffi_catalog.rs(1 hunks)datafusion/ffi/tests/ffi_integration.rs(0 hunks)
💤 Files with no reviewable changes (1)
- datafusion/ffi/tests/ffi_integration.rs
🧰 Additional context used
🧬 Code graph analysis (4)
datafusion/ffi/src/tests/mod.rs (2)
datafusion/ffi/src/catalog_provider_list.rs (2)
catalog(203-203)catalog(224-232)datafusion/ffi/src/tests/catalog.rs (2)
catalog(216-218)create_catalog_provider_list(234-237)
datafusion/ffi/src/tests/catalog.rs (1)
datafusion/ffi/src/catalog_provider_list.rs (7)
catalog(203-203)catalog(224-232)inner(72-75)new(152-167)as_any(193-195)catalog_names(215-222)register_catalog(197-213)
datafusion/ffi/tests/ffi_catalog.rs (4)
datafusion/ffi/src/tests/utils.rs (1)
get_module(56-87)datafusion/ffi/src/tests/catalog.rs (3)
default(78-88)default(139-145)default(195-204)datafusion/ffi/src/catalog_provider.rs (1)
new(183-199)datafusion/ffi/src/catalog_provider_list.rs (1)
new(152-167)
datafusion/ffi/src/catalog_provider_list.rs (2)
datafusion/ffi/src/catalog_provider.rs (9)
runtime(89-92)inner(84-87)new(183-199)from(213-215)release_fn_wrapper(147-150)drop(176-178)clone_fn_wrapper(152-173)clone(219-221)as_any(225-227)datafusion/ffi/src/lib.rs (1)
version(52-56)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Cursor Bugbot
- GitHub Check: claude-review
- GitHub Check: Analyze (rust)
🔇 Additional comments (19)
datafusion/ffi/src/lib.rs (1)
29-29: LGTM!The module addition follows the existing pattern and is appropriately placed in the module hierarchy.
datafusion/ffi/src/catalog_provider.rs (1)
207-207: LGTM!The visibility change to
pub(crate)is necessary for the new catalog_provider_list module to access the innerFFI_CatalogProvider(used at line 205 in catalog_provider_list.rs). The crate-private visibility appropriately limits exposure while enabling internal FFI bridging.datafusion/ffi/src/tests/mod.rs (3)
37-38: LGTM!The imports for the new catalog provider list functionality are correctly added.
67-68: LGTM!The new
create_catalog_listfield follows the same pattern as the existingcreate_catalogfield, providing a consistent API for test infrastructure.
131-131: LGTM!The wiring correctly connects to the
create_catalog_provider_listfactory function from the catalog module.datafusion/ffi/tests/ffi_catalog.rs (1)
29-54: LGTM!The integration test correctly validates the FFI catalog provider by:
- Creating a ForeignCatalogProvider from the FFI module
- Registering it in a SessionContext
- Querying a fully-qualified table path
- Verifying the result count
datafusion/ffi/src/tests/catalog.rs (5)
31-31: LGTM!The import for FFI_CatalogProviderList is correctly added to support the new test infrastructure.
34-34: LGTM!The imports for CatalogProviderList and MemoryCatalogProviderList are correctly added.
187-205: LGTM!The FixedCatalogProviderList test implementation correctly:
- Wraps MemoryCatalogProviderList for delegation
- Provides a default initialization with a "blue" catalog
- Mirrors the pattern established by FixedCatalogProvider
207-232: LGTM!The CatalogProviderList implementation correctly:
- Delegates all methods to the inner MemoryCatalogProviderList
- Validates catalog names against a whitelist in register_catalog
- Uses log::warn! and returns None for invalid names (appropriate for an Option-returning method)
234-237: LGTM!The factory function correctly creates an FFI_CatalogProviderList from the test implementation, following the same pattern as create_catalog_provider.
datafusion/ffi/src/catalog_provider_list.rs (8)
29-61: LGTM!The FFI_CatalogProviderList struct is well-designed:
- Uses
#[repr(C)]andStableAbifor ABI stability- Function pointers follow the established pattern from FFI_CatalogProvider
- Includes all necessary operations: register_catalog, catalog_names, catalog, clone, release, version
- private_data pointer enables safe state management across FFI boundary
63-81: LGTM!The Send/Sync implementations and helper methods correctly:
- Mark the struct as thread-safe (necessary for FFI but inherently unsafe)
- Provide safe access to the inner provider and runtime through private helper methods
- Follow the exact pattern established in catalog_provider.rs
83-115: LGTM!The catalog operation wrappers correctly:
- Bridge between FFI types (RString, RVec, ROption) and native Rust types
- Handle conversion between FFI_CatalogProvider and ForeignCatalogProvider
- Propagate the runtime Handle for async operation support
- Follow the established pattern from catalog_provider.rs
117-142: LGTM!The memory management functions correctly:
- release_fn_wrapper properly frees the Box-allocated private_data
- clone_fn_wrapper creates a new instance with Arc::clone (cheap reference counting)
- Follow the established pattern from catalog_provider.rs
144-168: LGTM!The Drop implementation and constructor correctly:
- Drop calls release to ensure cleanup
- new() properly boxes private_data and leaks it as a raw pointer for FFI
- All function pointers are initialized consistently
- Follow the established pattern from catalog_provider.rs
170-190: LGTM!The ForeignCatalogProviderList wrapper correctly:
- Provides a safe interface on the receiver side of the FFI boundary
- Implements Send/Sync (required for DataFusion's threading model)
- Provides From conversion for ergonomic usage
- Implements Clone via the function pointer
- Follows the established pattern from ForeignCatalogProvider
192-233: LGTM!The CatalogProviderList trait implementation correctly:
- Bridges all required methods through FFI function pointers
- Handles bidirectional conversion between ForeignCatalogProvider and FFI_CatalogProvider (lines 203-207)
- Properly wraps non-FFI CatalogProviders when needed
- Converts results back to Arc trait objects
- Follows the same pattern as the CatalogProvider impl in catalog_provider.rs
235-283: LGTM!The test provides comprehensive coverage:
- Round-trip FFI conversion
- Registering and listing catalogs
- Replacing existing catalogs (returns Some)
- Adding new catalogs (returns None)
- Retrieving both existing and non-existent catalogs
This validates the FFI bridge works correctly across the boundary.
| #[tokio::test] | ||
| async fn test_catalog_list() -> datafusion_common::Result<()> { | ||
| let module = get_module()?; | ||
|
|
||
| let ffi_catalog_list = | ||
| module | ||
| .create_catalog_list() | ||
| .ok_or(DataFusionError::NotImplemented( | ||
| "External catalog provider failed to implement create_catalog" | ||
| .to_string(), | ||
| ))?(); | ||
| let foreign_catalog_list: ForeignCatalogProviderList = (&ffi_catalog_list).into(); | ||
|
|
||
| let ctx = SessionContext::default(); | ||
| ctx.register_catalog_list(Arc::new(foreign_catalog_list)); | ||
|
|
||
| let df = ctx.table("blue.apple.purchases").await?; | ||
|
|
||
| let results = df.collect().await?; | ||
|
|
||
| assert_eq!(results.len(), 2); | ||
| let num_rows: usize = results.into_iter().map(|rb| rb.num_rows()).sum(); | ||
| assert_eq!(num_rows, 5); | ||
|
|
||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
Fix the error message for catalog list creation.
Line 64 references "create_catalog" in the error message, but it should reference "create_catalog_list" for consistency and clarity.
Apply this diff:
.create_catalog_list()
.ok_or(DataFusionError::NotImplemented(
- "External catalog provider failed to implement create_catalog"
+ "External catalog provider failed to implement create_catalog_list"
.to_string(),📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #[tokio::test] | |
| async fn test_catalog_list() -> datafusion_common::Result<()> { | |
| let module = get_module()?; | |
| let ffi_catalog_list = | |
| module | |
| .create_catalog_list() | |
| .ok_or(DataFusionError::NotImplemented( | |
| "External catalog provider failed to implement create_catalog" | |
| .to_string(), | |
| ))?(); | |
| let foreign_catalog_list: ForeignCatalogProviderList = (&ffi_catalog_list).into(); | |
| let ctx = SessionContext::default(); | |
| ctx.register_catalog_list(Arc::new(foreign_catalog_list)); | |
| let df = ctx.table("blue.apple.purchases").await?; | |
| let results = df.collect().await?; | |
| assert_eq!(results.len(), 2); | |
| let num_rows: usize = results.into_iter().map(|rb| rb.num_rows()).sum(); | |
| assert_eq!(num_rows, 5); | |
| Ok(()) | |
| } | |
| #[tokio::test] | |
| async fn test_catalog_list() -> datafusion_common::Result<()> { | |
| let module = get_module()?; | |
| let ffi_catalog_list = | |
| module | |
| .create_catalog_list() | |
| .ok_or(DataFusionError::NotImplemented( | |
| "External catalog provider failed to implement create_catalog_list" | |
| .to_string(), | |
| ))?(); | |
| let foreign_catalog_list: ForeignCatalogProviderList = (&ffi_catalog_list).into(); | |
| let ctx = SessionContext::default(); | |
| ctx.register_catalog_list(Arc::new(foreign_catalog_list)); | |
| let df = ctx.table("blue.apple.purchases").await?; | |
| let results = df.collect().await?; | |
| assert_eq!(results.len(), 2); | |
| let num_rows: usize = results.into_iter().map(|rb| rb.num_rows()).sum(); | |
| assert_eq!(num_rows, 5); | |
| Ok(()) | |
| } |
🤖 Prompt for AI Agents
In datafusion/ffi/tests/ffi_catalog.rs around lines 56 to 81, the error message
raised when create_catalog_list() returns None incorrectly references
"create_catalog" — update the message to reference "create_catalog_list"
instead; modify the DataFusionError::NotImplemented string to say "External
catalog provider failed to implement create_catalog_list" so the log matches the
failing function name and improves clarity.
There was a problem hiding this comment.
value:good-to-have; category:typo; feedback:The CodeRabbit AI reviewer is correct that there is a copy/paste error in the error message. It may lead to confusion when a developer investigates the logs.
18657: To review by AI