WIP: Refactor `List` to use `Rep<Obj>` internally #172

sebffischer · 2024-08-05T08:47:06Z

TODOs:

(maybe) properly implement .len()

sebffischer · 2024-08-05T13:50:35Z

@dgkf To allow for this refactor I think the Rep<T> should then also include the names.
We would thereby also get named vectors.

I am wondering though whether there was a specific reason why the Names subset was implemented via NamedSubsets (R/src/object/vector/subsets.rs, line 9 in ee9b4a4: `pub struct NamedSubsets {`) and not using Subset::Names (R/src/object/vector/subset.rs, line 19 in ee9b4a4: `Names(CowObj<Vec<Character>>),`).

dgkf · 2024-08-05T23:40:13Z

> I am wondering though whether there was a specific reason why the Names subset was implemented via NamedSubsets

Yeah, this part isn't as clean as I'd like. It could definitely benefit from a rework.

If a subset only operates on indices, then we can collapse multiple subsets into a set of indices pretty easily - x[1:10][c(2, 4, 6)][c(1, 3)] can be determined to be equivalent to x[c(2, 6)]. We don't need any other information about x. But when a vector is named we need to know its names if we want to subset using them.
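A minimal sketch of that collapsing step, assuming 1-based positive indices only. The function names `compose` and `collapse` are hypothetical and not taken from the codebase, and out-of-range or zero indices are simply reported as `None` here rather than following R's NA semantics.

```rust
/// Compose two 1-based index subsets: applying `outer` and then `inner`
/// selects the same elements as applying the returned indices directly.
/// Returns `None` if `inner` refers to a position that `outer` does not provide.
fn compose(outer: &[usize], inner: &[usize]) -> Option<Vec<usize>> {
    inner
        .iter()
        .map(|&i| i.checked_sub(1).and_then(|pos| outer.get(pos).copied()))
        .collect()
}

/// Collapse a chain of index subsets, applied left to right, into a single one.
/// Returns `None` for an empty chain or if any step is out of range.
fn collapse(chain: &[Vec<usize>]) -> Option<Vec<usize>> {
    let (first, rest) = chain.split_first()?;
    rest.iter()
        .try_fold(first.clone(), |acc, next| compose(&acc, next))
}
```

Note that nothing about the underlying vector is needed: the chain 1:10, c(2, 4, 6), c(1, 3) collapses to c(2, 6) from the indices alone.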

These two structures serve different purposes.

Names as a subset just lists which names have been selected. When calling x[c("a", "b")], we'd have a Names subset of vec!["a", "b"].
NamedSubsets is entirely different - it provides a mechanism for handing off a vector's names to the Subset object so that the names of each element can be used to apply a Names subset. Without it, we'd have no idea which position corresponds to each name being subset.
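To illustrate why the names have to be handed over, here is a rough sketch of resolving a selection of names to positions. `resolve_names` is a hypothetical helper, not the actual NamedSubsets implementation, and it uses plain string slices instead of the project's types.

```rust
/// Resolve selected names to 0-based positions, given the vector's own names.
/// As in R, the first match wins when a name occurs more than once.
/// Names that do not occur in the vector resolve to `None`.
fn resolve_names(names: &[&str], selected: &[&str]) -> Vec<Option<usize>> {
    selected
        .iter()
        .map(|s| names.iter().position(|n| n == s))
        .collect()
}
```

Without the `names` argument there is no way to compute these positions, which is the information NamedSubsets carries to the subset.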

These are poorly named and conceptually I'm not crazy about how I modeled the problem. It exists this way because I built the subset altrep while completely neglecting named subsets and only realized after the rest of the proof-of-concept was put together 😬. If the feature looks tacked on, that's because it totally was.

sebffischer · 2024-09-09T08:16:07Z

Just fyi: My vacation is starting on Friday and I am planning to resume working on this!

sebffischer · 2024-09-21T11:23:06Z

Uff, so I think I started something here before properly understanding it.

The changes here basically allow Rep<Obj> to represent an unnamed list.
Because I think we eventually want to distinguish between a list and an unnamed list for efficiency, I think it is worth keeping those minor changes.

One idea I had is whether we maybe want to add a NamedSubsets variant to the RepType enum.
This NamedSubsets type would basically be what is now the List, with the difference that it is generic in the type that it holds.

This would:

allow us to unify the API of the list representation and the vector representation
give us named vectors (which I assume we want?)
allow a more efficient representation for lists without names

I am not sure whether this is fully necessary now and maybe there is a better way to approach this. What do you think?

dgkf · 2024-09-21T16:03:06Z

> One idea I had is whether we maybe want to add a NamedSubsets variant to the RepType enum.

Let me make sure I'm understanding you correctly here. The proposal is to store names in the representation instead of in the Obj type?

Just from the perspective of how this abstracts the problem, I feel like this sort of conflates a couple ideas. If we still want a named object to be able to behave like an unnamed object, then having two separate variants here that have largely overlapping behaviors feels like it might not be modelling the problem as effectively as it could be.

What about something like:

RepType<T> {
  Subset(CowObj<Vec<T>>, Option<..>, Subsets)
}

Where Option<..> is optional name information. This does introduce some overhead for unnamed vectors, but I feel like it models what we want a bit more directly.
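A simplified sketch of that idea, under loud assumptions: `NamedRep` is a hypothetical stand-in, plain `Vec`s replace `CowObj` and `Subsets`, and the name information is assumed to be one `String` per value (the actual contents of `Option<..>` are left open above).

```rust
/// Hypothetical, simplified stand-in for a representation with optional names.
#[derive(Debug, Clone, PartialEq)]
struct NamedRep<T> {
    values: Vec<T>,
    /// `None` for unnamed vectors; otherwise one name per value.
    names: Option<Vec<String>>,
}

impl<T> NamedRep<T> {
    fn unnamed(values: Vec<T>) -> Self {
        Self { values, names: None }
    }

    /// Returns `None` if the number of names differs from the number of values.
    fn named(values: Vec<T>, names: Vec<String>) -> Option<Self> {
        (values.len() == names.len()).then(|| Self { values, names: Some(names) })
    }

    /// Positional access (0-based) works the same with or without names.
    fn get(&self, index: usize) -> Option<&T> {
        self.values.get(index)
    }

    /// Name-based access only succeeds if names are present.
    fn get_by_name(&self, name: &str) -> Option<&T> {
        let names = self.names.as_ref()?;
        let pos = names.iter().position(|n| n == name)?;
        self.values.get(pos)
    }
}
```

With a single type, a named object behaves like an unnamed one for positional access, and the only extra cost for unnamed vectors is the `Option` field.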

sebffischer · 2024-09-22T03:52:05Z

Yeah, I think this is better, I will try it out.

sebffischer · 2024-09-22T11:53:01Z

Another positive side effect of refactoring this is that it forces us to implement the behavior of lists and vectors consistently.

Assigning NULL to list-subsets behaves rather weirdly in R I think:

l = list(1, 2, 3, 4)
l[1:2] = NULL
l
#> [[1]]
#> [1] 3
#> 
#> [[2]]
#> [1] 4

l[[1]] = NULL
l
#> [[1]]
#> [1] 4

While I find it somewhat reasonable that assigning NULL to a slice removes the elements (l[1:2] = NULL), I find the behavior of the second assignment (l[[1]] = NULL), where we set an individual element to NULL, quite weird. Instead of removing the element, I think the assignment should replace the value 3 with NULL.

Furthermore, I would even argue against the semantics of the slice-assignment to NULL (l[1:2] = NULL), as it is inconsistent with how (atomic) vectors behave:

v = 1:3
v[1:2] = NULL
#> Error in v[1:2] = NULL: replacement has length zero

For the scenario where one wants to remove specific elements from a list/vector, we can (in the future, see #169) achieve this with negative indices.

However, for named lists / vectors there is an open question on how we want to handle removing a named element:

ln = list(a = 1, b = 2)
ln$a = NULL
ln
#> $b
#> [1] 2

One way to support this operation would be to somehow allow replacing line 2 of the example above with ln = ln[-"a"].
It is not totally clear to me how to achieve this in a streamlined way.
The second (and my preferred) solution would be to implement a remove function, so that line 2 could be replaced with ln = remove(ln, "a").
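To make the two semantics concrete, here is a rough sketch on the Rust side. Everything in it is hypothetical: `NamedList`, `set`, and `remove` are illustration-only names, and `None` plays the role of NULL. Assigning NULL keeps the slot, while removal is a separate, explicit operation.

```rust
/// Hypothetical list element: `None` plays the role of NULL.
type Elem = Option<f64>;

#[derive(Debug, Clone, PartialEq)]
struct NamedList {
    names: Vec<String>,
    values: Vec<Elem>,
}

impl NamedList {
    /// Replace the value stored under `name`. Assigning `None` keeps the slot.
    /// Returns false if no element has that name.
    fn set(&mut self, name: &str, value: Elem) -> bool {
        match self.names.iter().position(|n| n == name) {
            Some(pos) => {
                self.values[pos] = value;
                true
            }
            None => false,
        }
    }

    /// Return a copy without the first element called `name`.
    /// The list is returned unchanged if the name does not occur.
    fn remove(&self, name: &str) -> NamedList {
        let mut out = self.clone();
        if let Some(pos) = out.names.iter().position(|n| n == name) {
            out.names.remove(pos);
            out.values.remove(pos);
        }
        out
    }
}
```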

dgkf · 2024-09-22T16:18:35Z

Assigning NULL as a way of removing elements is definitely odd.

One way to address this would be to implement a version of [.list that operates on functions. Then you have a lot of freedom to implement selectors such as:

l <- list(a = 1, b = 2, c = 3)
l[not("b")]

where

not <- function(not_names) {
  # not_names is a character vector of names to exclude
  function(x) x[!names(x) %in% not_names]
}

Then

`[.list` <- function(x, subset) {
  UseMethod("[.list", subset)
}

`[.list.function` <- function(x, subset) {
  subset(x)
}

I generally like this style, where things that are generic over higher-order functions are used to provide functionality as opposed to bespoke syntax. They open up a lot of opportunities for more composable and expressive selectors.
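If the internals were to support this directly, one possible shape is a subset variant that holds a function. The sketch below is an assumption for illustration only: `Subset::Function`, `Selector`, `not`, and `apply` are hypothetical names and do not reflect the existing Subset enum.

```rust
/// A selector maps the names of a list to the 0-based positions to keep.
type Selector = Box<dyn Fn(&[String]) -> Vec<usize>>;

/// Hypothetical subset argument: plain positions or a selector function.
enum Subset {
    Indices(Vec<usize>),
    Function(Selector),
}

/// Selector that keeps every element whose name is not in `excluded`.
fn not(excluded: &[&str]) -> Subset {
    let excluded: Vec<String> = excluded.iter().map(|s| s.to_string()).collect();
    Subset::Function(Box::new(move |names: &[String]| -> Vec<usize> {
        names
            .iter()
            .enumerate()
            .filter(|(_, n)| !excluded.contains(*n))
            .map(|(i, _)| i)
            .collect()
    }))
}

/// Apply a subset to values with the given names; out-of-range positions are skipped.
fn apply<T: Clone>(values: &[T], names: &[String], subset: &Subset) -> Vec<T> {
    let positions = match subset {
        Subset::Indices(idx) => idx.clone(),
        Subset::Function(f) => f(names),
    };
    positions
        .iter()
        .filter_map(|&i| values.get(i).cloned())
        .collect()
}
```

Other selectors can then be added as ordinary functions returning `Subset::Function`, without any new syntax.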

So maybe just to tie it back to this work, I think it's important to first and foremost implement what is most reasonable to the underlying data representation in the internals. It's the role of the standard library or other packages to add convenience syntax.

sebffischer · 2024-10-06T15:45:46Z

superseded by: #180

sebffischer added 2 commits August 4, 2024 12:25

wip

f1fefbc

it compiles again, yeah

4011ab1

sebffischer marked this pull request as draft August 5, 2024 08:47

sebffischer added 2 commits August 5, 2024 11:01

two down

fca82a6

another 3 down

0f97435

...

10304f4

sebffischer added 2 commits September 21, 2024 16:40

cleanup

1df8607

gst

cbe46c1

silence clippy

65e4a5a

sebffischer mentioned this pull request Sep 27, 2024

Bug when accessing nested list #179

Closed

sebffischer mentioned this pull request Oct 6, 2024

Allow to pass functions as a subset #195

Open

sebffischer closed this Oct 6, 2024
