Description
This issue is a follow-up to this email on the bioc-devel mailing list.
When merging DFrame instances, the *List types are lost:
The following two instances have NumericList columns (y and z):
> d1 <- DataFrame(x = letters[1:3], y = List(1, 1:2, 1:3))
> d2 <- DataFrame(x = letters[1:3], z = List(1:3, 1:2, 1))
However, these columns are converted to list when the two instances are merged:
> merge(d1, d2, by = "x")
## DataFrame with 3 rows and 3 columns
##             x      y      z
##   <character> <list> <list>
## 1           a      1  1,2,3
## 2           b    1,2    1,2
## 3           c  1,2,3      1
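As a possible interim workaround, the rows can be matched on the key first and each DataFrame subset by row index, so that every column keeps its class. This is only a minimal sketch, not a proposed implementation of `merge()`: it assumes a single key column `x`, performs an inner join, and uses `NumericList()` from IRanges rather than `List()` so that the column type is explicit.

```r
library(S4Vectors)
library(IRanges)  # provides NumericList()

d1 <- DataFrame(x = letters[1:3], y = NumericList(1, 1:2, 1:3))
d2 <- DataFrame(x = letters[1:3], z = NumericList(1:3, 1:2, 1))

# Match rows of d1 to rows of d2 on the key column (all matches are kept,
# so duplicated keys yield one row per matching pair).
hits <- findMatches(d1$x, d2$x)

# Subset each DataFrame by row index and bind the columns side by side.
# Row subsetting keeps each column's class, so y and z are not
# downgraded to plain lists.
merged <- cbind(
  d1[queryHits(hits), , drop = FALSE],
  d2[subjectHits(hits), "z", drop = FALSE]
)

merged
```

Rows without a partner are dropped here; supporting `all.x`/`all.y` would need extra handling of the unmatched indices.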
I would be happy to help out with some guidance. @lawremi already mentioned:
There's an opportunity to implement faster matching than
base::merge(), using stuff like matchIntegerQuads(), findMatches(), and grouping().
grouping() can be really fast for character vectors, since it takes advantage of string internalization. For example, let's say you're merging on three character vector keys. Concatenate the keys of 'y' onto the keys of 'x'. Then call grouping(k1, k2, k3) and you effectively have a matching. Should be way faster than the paste() approach used by base::merge(). Would be interesting to see.
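To make the `grouping()` idea concrete, here is a rough base R sketch of how I understand it. The helper name `match_keys_by_grouping()` is hypothetical and not part of S4Vectors or base R, and the sketch relies on the `"ends"` attribute of the `grouping()` result, which the R documentation describes as experimental. Keys are passed as lists of equal-length vectors, one list element per key column.

```r
# Hypothetical helper (not an existing API): returns all pairs of row
# indices (x, y) whose keys are identical across every key column.
match_keys_by_grouping <- function(x_keys, y_keys) {
  nx <- length(x_keys[[1L]])

  # Concatenate the keys of 'y' onto the keys of 'x', column by column.
  combined <- unname(Map(c, x_keys, y_keys))

  # grouping() returns a permutation that puts identical key tuples next
  # to each other; the "ends" attribute marks where each group ends.
  g <- do.call(grouping, combined)
  ends <- attr(g, "ends")
  perm <- as.integer(g)
  starts <- c(1L, head(ends, -1L) + 1L)

  pairs <- lapply(seq_along(ends), function(i) {
    idx <- perm[starts[i]:ends[i]]
    xi <- idx[idx <= nx]       # members of the group that come from 'x'
    yi <- idx[idx > nx] - nx   # members of the group that come from 'y'
    # All x/y combinations within the group; zero rows if either is empty.
    data.frame(x = rep(xi, times = length(yi)),
               y = rep(yi, each = length(xi)))
  })
  do.call(rbind, pairs)
}

# Two key columns; rows 1 and 4 of 'x' match row 1 of 'y',
# and row 3 of 'x' matches row 2 of 'y'.
x_keys <- list(c("a", "b", "c", "a"), c("u", "v", "w", "u"))
y_keys <- list(c("a", "c", "z"), c("u", "w", "q"))
res <- match_keys_by_grouping(x_keys, y_keys)
res[order(res$x, res$y), ]
```

The resulting index pairs could then be used to subset both inputs row-wise, which avoids building `paste()`d keys. Whether this is actually faster than `base::merge()` would need benchmarking.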
I'll have a look at these functions and report back here if I have any questions or leads.