fix: replace docs, upcasting ints, sf geo encoding#7
Conversation
There was a problem hiding this comment.
Pull request overview
This PR extends the transaction/cursor workflow to support row-by-row replace/edit/insert patterns by adding a rows() materializer on cursors and allowing OdpTransaction$insert() to accept named lists (single-row “row dictionaries”).
Changes:
- Add
OdpCursor$rows()returning a list of per-row named lists. - Allow
odp_to_arrow_table()(and thustx$insert()) to accept named list input. - Update docs, examples, and tests around replace + row iteration; bump package version.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
R/cursor.R |
Adds rows() helper that materializes cursor results as row dictionaries. |
R/utils.R |
Extends Arrow table coercion to accept named list input. |
tests/testthat/test-cursor.R |
Adds tests for cursor$rows() behavior. |
tests/testthat/test-table_tx.R |
Updates list input validation tests; adds replace + row-iteration workflow test. |
scripts/example_quick_start.R |
Demonstrates replace + row iteration in a transaction. |
man/OdpCursor.Rd |
Documents the new rows() method (but currently malformed Rd output). |
man/OdpTransaction.Rd |
Updates insert docs and adds replace example. |
R/r6-docs.R |
Roxygen source updates for cursor/transaction docs. |
DESCRIPTION |
Version bump to 0.1.3. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/testthat/test-table_tx.R
Outdated
|
|
||
| expect_true(TRUE) |
There was a problem hiding this comment.
This test doesn't assert any observable outcome beyond expect_true(TRUE), so it won't catch regressions (e.g., that tx$insert(row) + tx$commit() successfully flushes, or that the mock client saw the expected insert/commit calls). Consider adding an assertion that the loop produces no errors and/or verifying the fake client's call_log contains the expected request(s) after committing.
| expect_true(TRUE) | |
| tx$commit() | |
| expect_true(length(client$call_log) >= 1L) |
man/OdpCursor.Rd
Outdated
| \item{$next_batch()$}{Return the next \code{RecordBatch} or \code{NULL} when finished.} | ||
| \item{$collect()\code{/}$arrow()\verb{\}\{Materialise unread batches as an Arrow Table.\} \\item\{$dataframe()}}{Materialise unread batches as a base \code{data.frame}.} | ||
| \item{$tibble()`}{Materialise unread batches as a tibble (optional | ||
| dependency).} | ||
| \item{$tibble()\verb{\}\{Materialise unread batches as a tibble (optional dependency).\} \\item\{$rows()}}{Materialise unread batches as a list of named lists (row dictionaries).} |
There was a problem hiding this comment.
In this generated Rd, the \describe{} / \item{} markup for method docs is malformed (e.g., the $collect()/$arrow() and $tibble()/$rows() entries appear to be merged and contain stray \verb{} / \\item sequences). This is likely to break R CMD check Rd parsing. Please fix the roxygen source in R/r6-docs.R for the OdpCursor Methods section (avoid backticks around $method() and ensure each method has its own \item{label}{description}), then re-run roxygen to regenerate this file.
man/OdpTransaction.Rd
Outdated
|
|
||
| \describe{ | ||
| \item{$insert(data)$}{Validate and buffer a data frame. Auto-flushes on size thresholds.} | ||
| \item{$insert(data)$}{Validate and buffer a data frame, named list (single row), or Arrow Table. Auto-flushes on size thresholds.} |
There was a problem hiding this comment.
The docs say named list input is a "single row", but the current conversion path accepts any named list and as.data.frame() will happily treat vector elements as full columns (potentially inserting multiple rows). Either enforce single-row named lists (e.g., all elements length 1) or adjust the wording here to reflect the actual accepted shape (named list-as-columns).
| \item{$insert(data)$}{Validate and buffer a data frame, named list (single row), or Arrow Table. Auto-flushes on size thresholds.} | |
| \item{$insert(data)$}{Validate and buffer a data frame, named list of columns (treated like a data frame), or Arrow Table. Auto-flushes on size thresholds.} |
R/utils.R
Outdated
| arg | ||
| } else if (is.data.frame(arg)) { | ||
| arrow::Table$create(arg) | ||
| } else if (is.list(arg) && !is.null(names(arg))) { |
There was a problem hiding this comment.
odp_to_arrow_table() currently treats any named list as valid input as long as names(arg) is not NULL. That will also accept partially unnamed lists (e.g., list(a = 1, 2)) or empty-string names, which then become ambiguous/invalid column names after as.data.frame(). Consider tightening validation to require all names are present and non-empty (and, if the intent is a single-row "row dictionary", also validate that each element is length-1) before converting to an Arrow Table.
| } else if (is.list(arg) && !is.null(names(arg))) { | |
| } else if (is.list(arg) && !is.null(names(arg))) { | |
| nms <- names(arg) | |
| if (length(nms) != length(arg) || any(is.na(nms)) || any(!nzchar(nms))) { | |
| cli::cli_abort("`arg` must be a fully named list with non-empty names to be converted to an Arrow Table") | |
| } |
| if (request$operation == "replace") { | ||
| buf <- arrow::BufferOutputStream$create() | ||
| writer <- arrow::RecordBatchStreamWriter$create(buf, schema = mock_batch$schema) | ||
| writer$write(mock_batch) |
There was a problem hiding this comment.
The Arrow IPC writer is invoked with writer$write(mock_batch), but elsewhere in the repo Arrow stream writers use write_batch() / write_table(). Using a different method name here risks test failures across Arrow versions. Consider switching to the same writer API used in production code (write_batch(mock_batch) or equivalent) for consistency and compatibility.
| writer$write(mock_batch) | |
| writer$write_batch(mock_batch) |
replace