Description
Hi @paleolimbot,
I'm exploring the possibility of building a dplyr-compatible interface on top of sedonadb for R, similar to what duckplyr provides for DuckDB.
Looking at the current R bindings, I noticed that sedonadb exposes a limited subset of DataFrame operations:
- `select_indices()` for column selection
- `limit()` for row limiting
- `collect()` / `to_view()` for materialization
In contrast, duckdb-r exposes a full relational algebra API that duckplyr uses:
- Expression builders: `expr_reference()`, `expr_constant()`, `expr_function()`, `expr_comparison()`
- Relation operations: `rel_filter()`, `rel_project()`, `rel_aggregate()`, `rel_order()`, `rel_join()`
This allows duckplyr to translate dplyr verbs directly into relational operations without going through SQL string generation.
Questions:
- Are there any plans to expose more of DataFusion's DataFrame API (like `filter()`, `aggregate()`, `sort()`) through the R bindings?
- Would there be interest in accepting contributions that add an expression/relational API similar to duckdb-r?
For now, I'm working on a SQL-based approach using sd_sql(), which works but requires R-to-SQL expression translation. A native relational API would be more elegant and potentially more performant (avoiding SQL parsing overhead).
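To make the translation step concrete, here is a minimal sketch of the kind of R-to-SQL expression translation I mean. It uses base R only; `translate_expr()` is a hypothetical helper of my own (not part of sedonadb), it covers just a few binary operators, and the table name `cities` is made up for illustration.

```r
# Hypothetical helper (not part of sedonadb): translate a small subset of
# quoted R expressions into SQL text.
translate_expr <- function(expr) {
  if (is.name(expr)) {
    # Bare symbol: treat it as a column reference and quote it as an identifier.
    return(paste0('"', gsub('"', '""', as.character(expr), fixed = TRUE), '"'))
  }
  if (is.character(expr)) {
    # String constant: single-quote it and double any embedded single quotes.
    return(paste0("'", gsub("'", "''", expr, fixed = TRUE), "'"))
  }
  if (is.numeric(expr)) {
    # Numeric constant: print without scientific notation.
    return(format(expr, scientific = FALSE))
  }
  if (is.call(expr)) {
    # Map of supported R binary operators to their SQL spelling.
    ops <- c("==" = "=", "!=" = "<>", ">" = ">", ">=" = ">=",
             "<" = "<", "<=" = "<=", "&" = "AND", "|" = "OR",
             "+" = "+", "-" = "-", "*" = "*", "/" = "/")
    fn <- if (is.name(expr[[1]])) as.character(expr[[1]]) else ""
    if (fn == "(") {
      # Explicit parentheses: translate the inner expression and keep them.
      return(paste0("(", translate_expr(expr[[2]]), ")"))
    }
    if (fn %in% names(ops) && length(expr) == 3) {
      # Binary operator: translate both sides and parenthesise the result
      # so that R's precedence is preserved in the SQL string.
      return(paste0("(", translate_expr(expr[[2]]), " ", ops[[fn]], " ",
                    translate_expr(expr[[3]]), ")"))
    }
  }
  stop("Unsupported expression: ", deparse(expr))
}

# Build a WHERE clause from a dplyr-style predicate.
where_sql <- translate_expr(quote(pop > 1000 & name == "Oslo"))
query <- paste0("SELECT * FROM cities WHERE ", where_sql)
# The resulting string is what would then be handed to sd_sql().
```

Every verb ends up as a string that has to be parsed again on the engine side, and every supported function needs an entry in a table like `ops`. With a native expression API, the same tree walk could build expression objects directly instead of text.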