extended relational API for R bindings (similar to duckdb-r) #474

@e-kotov

Hi @paleolimbot,

I'm exploring the possibility of building a dplyr-compatible interface on top of sedonadb for R, similar to what duckplyr provides for DuckDB.

Looking at the current R bindings, I noticed that sedonadb exposes a limited subset of DataFrame operations:

  • select_indices() for column selection
  • limit() for row limiting
  • collect() / to_view() for materialization

In contrast, duckdb-r exposes a full relational algebra API that duckplyr uses:

  • Expression builders: expr_reference(), expr_constant(), expr_function(), expr_comparison()
  • Relation operations: rel_filter(), rel_project(), rel_aggregate(), rel_order(), rel_join()

This allows duckplyr to translate dplyr verbs directly into relational operations without going through SQL string generation.
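
To make the idea concrete, here is a minimal sketch of what "expression builders plus relation operations" means. It is a toy written in Python purely so it is self-contained and runnable. The class and function names (`Reference`, `Constant`, `Comparison`, `rel_filter`, `rel_project`) only echo the duckdb-r names listed above; they are hypothetical and are not the duckdb-r or sedonadb API. The toy also evaluates eagerly on in-memory rows, whereas a real engine would most likely build a lazy plan.

```python
import operator
from dataclasses import dataclass
from typing import Any

# Hypothetical expression nodes. A verb translator would build these
# objects instead of concatenating SQL text.
@dataclass(frozen=True)
class Reference:
    name: str          # refers to a column by name

@dataclass(frozen=True)
class Constant:
    value: Any         # a literal value

@dataclass(frozen=True)
class Comparison:
    op: str            # one of the keys of _OPS below
    left: Any
    right: Any

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

def evaluate(expr, row):
    """Evaluate an expression tree against one row (a dict)."""
    if isinstance(expr, Reference):
        return row[expr.name]
    if isinstance(expr, Constant):
        return expr.value
    if isinstance(expr, Comparison):
        return _OPS[expr.op](evaluate(expr.left, row), evaluate(expr.right, row))
    raise TypeError(f"unsupported expression: {expr!r}")

def rel_filter(rows, predicate):
    """Keep the rows for which the predicate expression is true."""
    return [row for row in rows if evaluate(predicate, row)]

def rel_project(rows, exprs):
    """Compute named output columns; exprs maps output name -> expression."""
    return [{name: evaluate(e, row) for name, e in exprs.items()} for row in rows]

# Made-up sample data, for illustration only.
rows = [
    {"city": "A", "pop": 700_000},
    {"city": "B", "pop": 290_000},
]

# Roughly what filter(pop > 500000) followed by select(city) could turn into.
predicate = Comparison(">", Reference("pop"), Constant(500_000))
result = rel_project(rel_filter(rows, predicate), {"city": Reference("city")})
print(result)
```

The point of the sketch is that the predicate stays a structured object from start to finish, so no SQL string is generated or parsed along the way.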

Questions:

  1. Are there any plans to expose more of DataFusion's DataFrame API (like filter(), aggregate(), sort()) through the R bindings?

  2. Would there be interest in accepting contributions that add an expression/relational API similar to duckdb-r?

For now, I'm working on a SQL-based approach using sd_sql(), which works but requires R-to-SQL expression translation. A native relational API would be more elegant and potentially more performant (avoiding SQL parsing overhead).
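
For comparison, the expression-to-SQL translation step that the SQL-based approach needs could look roughly like the sketch below. It is again written in Python only to stay self-contained, and every name in it (`Column`, `Literal`, `BinaryOp`, `to_sql`, `filter_query`) is hypothetical; the real translation would start from R expressions, and the resulting string is the kind of text that would then be handed to `sd_sql()`.

```python
from dataclasses import dataclass
from typing import Any, Union

# Hypothetical, minimal expression tree standing in for a captured R expression.
@dataclass(frozen=True)
class Column:
    name: str

@dataclass(frozen=True)
class Literal:
    value: Union[int, float, str]

@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: Any
    right: Any

# Map source-language operators to SQL operators (a deliberately small set).
_SQL_OPS = {
    "==": "=", "!=": "<>",
    "<": "<", "<=": "<=", ">": ">", ">=": ">=",
    "&": "AND", "|": "OR",
}

def to_sql(expr) -> str:
    """Render an expression tree as a SQL fragment."""
    if isinstance(expr, Column):
        # Double-quote identifiers and escape embedded double quotes.
        return '"' + expr.name.replace('"', '""') + '"'
    if isinstance(expr, Literal):
        if isinstance(expr.value, str):
            # Single-quote strings and escape embedded single quotes.
            return "'" + expr.value.replace("'", "''") + "'"
        return repr(expr.value)
    if isinstance(expr, BinaryOp):
        # Parenthesize every binary operation to avoid precedence surprises.
        return f"({to_sql(expr.left)} {_SQL_OPS[expr.op]} {to_sql(expr.right)})"
    raise TypeError(f"unsupported expression: {expr!r}")

def filter_query(table: str, predicate) -> str:
    """Build the SELECT statement a filter verb would translate to."""
    return f"SELECT * FROM {to_sql(Column(table))} WHERE {to_sql(predicate)}"

# Roughly: cities |> filter(pop > 500000 & country == "NO")
predicate = BinaryOp(
    "&",
    BinaryOp(">", Column("pop"), Literal(500000)),
    BinaryOp("==", Column("country"), Literal("NO")),
)
query = filter_query("cities", predicate)
print(query)
```

Every operator, function, and quoting rule needs a mapping like this, and the engine then has to parse the generated string back into a plan. That round trip is the overhead a native relational API would avoid.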
