

@paleolimbot (Member) commented Jan 8, 2026

This PR implements sd_select(), sd_transmute(), and sd_filter(), which wrap the expression translation implemented in #468. The set of supported expressions is still minimal, but this establishes the first API we can expose this way.

I chose to do this instead of implementing dplyr::transmute() and dplyr::filter() directly because those functions have additional arguments and may carry an expectation of exact compatibility. The sd_...() versions have the added benefit of converting their input to a SedonaDB data frame for you, and it's usually a good idea for that conversion to be explicit (particularly for now).

This doesn't support aggregate expressions in the arguments, which do work in SQL and in dplyr. I took a stab at translating DataFusion's assembler for SELECT statements, and it does work, but it is more complicated and needs more testing than I have time to put together right now (https://gist.github.com/paleolimbot/de220c55c96e721a50a4752397f1cbf9).

The next step is to add blanket support for all functions in the sedona-specific function registry so that we can do geo stuff.

Also, an sd_join() would be particularly useful for exposing the (arguably) most useful part of SedonaDB as an engine.

library(sedonadb)

nc <- sf::read_sf(system.file("shape/nc.shp", package = "sf"))

nc |> 
  sd_select(AREA:CNTY_ID)
#> ┌─────────┬───────────┬─────────┬─────────┐
#> │   AREA  ┆ PERIMETER ┆  CNTY_  ┆ CNTY_ID │
#> │ float64 ┆  float64  ┆ float64 ┆ float64 │
#> ╞═════════╪═══════════╪═════════╪═════════╡
#> │   0.114 ┆     1.442 ┆  1825.0 ┆  1825.0 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
#> │   0.061 ┆     1.231 ┆  1827.0 ┆  1827.0 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
#> │   0.143 ┆      1.63 ┆  1828.0 ┆  1828.0 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
#> │    0.07 ┆     2.968 ┆  1831.0 ┆  1831.0 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
#> │   0.153 ┆     2.206 ┆  1832.0 ┆  1832.0 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
#> │   0.097 ┆      1.67 ┆  1833.0 ┆  1833.0 │
#> └─────────┴───────────┴─────────┴─────────┘
#> Preview of up to 6 row(s)

nc |> 
  sd_transmute(AREA, PERIMETER)
#> ┌─────────┬───────────┐
#> │   AREA  ┆ PERIMETER │
#> │ float64 ┆  float64  │
#> ╞═════════╪═══════════╡
#> │   0.114 ┆     1.442 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
#> │   0.061 ┆     1.231 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
#> │   0.143 ┆      1.63 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
#> │    0.07 ┆     2.968 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
#> │   0.153 ┆     2.206 │
#> ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┤
#> │   0.097 ┆      1.67 │
#> └─────────┴───────────┘
#> Preview of up to 6 row(s)

nc |> 
  sd_filter(CNTY_ID == 1825)
#> ┌─────────┬───────────┬───┬─────────┬──────────────────────────────────────────┐
#> │   AREA  ┆ PERIMETER ┆ … ┆ NWBIR79 ┆                 geometry                 │
#> │ float64 ┆  float64  ┆   ┆ float64 ┆                 geometry                 │
#> ╞═════════╪═══════════╪═══╪═════════╪══════════════════════════════════════════╡
#> │   0.114 ┆     1.442 ┆ … ┆    19.0 ┆ MULTIPOLYGON(((-81.4727554321289 36.234… │
#> └─────────┴───────────┴───┴─────────┴──────────────────────────────────────────┘
#> Preview of up to 6 row(s)

Created on 2026-01-23 with reprex v2.1.1

@paleolimbot paleolimbot changed the title feat(r): Add transmute() implementation feat(r): Add basic DataFrame API with sd_select(), sd_transmute(), and sd_filter() Jan 23, 2026
@paleolimbot paleolimbot changed the title feat(r): Add basic DataFrame API with sd_select(), sd_transmute(), and sd_filter() feat(r/sedonadb): Add basic DataFrame API with sd_select(), sd_transmute(), and sd_filter() Jan 23, 2026
@jiayuasu (Member)

Why do we need the sd_ prefix if we already have sedonadb:::?

@paleolimbot (Member, Author) commented Jan 23, 2026

Why do we need the sd_ prefix if we already have sedonadb:::?

Autocomplete!
[Screenshot of autocomplete suggestions, 2026-01-23]

...a better example would probably be typing just sd_, which lists all of the package's functions for you. It's common for R packages to do this because R has no equivalent of Python-style from xxx import xxx as xxx, so it's reasonably easy to end up with name conflicts.

@paleolimbot paleolimbot marked this pull request as ready for review January 23, 2026 17:34
@paleolimbot paleolimbot requested a review from Copilot January 23, 2026 17:34
Copilot AI (Contributor) left a comment


Pull request overview

This PR implements basic DataFrame manipulation functions (sd_select(), sd_transmute(), and sd_filter()) for the SedonaDB R package, wrapping the expression translation system introduced in a previous PR. These functions provide a familiar dplyr-like API for column selection, transformation, and row filtering.

Changes:

  • Added three new exported functions (sd_select(), sd_transmute(), sd_filter()) in R/dataframe.R with corresponding Rust implementations
  • Updated documentation to consistently describe .data parameter as "A sedonadb_dataframe or an object that can be coerced to one"
  • Added comprehensive test coverage for the new DataFrame API functions

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.

File: Description
r/sedonadb/R/dataframe.R: Implements sd_select(), sd_transmute(), and sd_filter() functions with expression translation support
r/sedonadb/src/rust/src/dataframe.rs: Adds Rust methods select() and filter() to InternalDataFrame for expression-based operations
r/sedonadb/src/rust/src/expression.rs: Makes exprs() method public to support DataFrame operations
r/sedonadb/tests/testthat/test-dataframe.R: Adds test cases for the three new DataFrame functions
r/sedonadb/R/000-wrappers.R: Auto-generated wrapper functions for new Rust methods
r/sedonadb/src/rust/api.h: Auto-generated C API declarations
r/sedonadb/src/init.c: Auto-generated C initialization code
r/sedonadb/NAMESPACE: Exports new functions
r/sedonadb/man/*.Rd: Documentation files for new and updated functions


@paleolimbot paleolimbot merged commit c38dbc6 into apache:main Jan 26, 2026
9 checks passed
@paleolimbot paleolimbot deleted the r-expr-eval-actually branch January 26, 2026 19:56
@paleolimbot paleolimbot added this to the 0.3.0 milestone Jan 26, 2026

2 participants