merging DFrames #73

@lgatto

This issue is a follow-up to this email on the bioc-devel mailing list.

When merging DFrame instances, the *List types are lost:

The following two instances have NumericList columns (y and z):

> d1 <- DataFrame(x = letters[1:3], y = List(1, 1:2, 1:3))
> d2 <- DataFrame(x = letters[1:3], z = List(1:3, 1:2, 1))

However, these columns are converted to plain list columns when the two instances are merged:

> merge(d1, d2, by = "x")
## DataFrame with 3 rows and 3 columns
##             x      y      z
##   <character> <list> <list>
## 1           a      1  1,2,3
## 2           b    1,2    1,2
## 3           c  1,2,3      1

I would be happy to help out, given some guidance. @lawremi already mentioned:

There's an opportunity to implement faster matching than base::merge(), using stuff like matchIntegerQuads(), findMatches(), and grouping().

grouping() can be really fast for character vectors, since it takes advantage of string internalization. For example, let's say you're merging on three character vector keys. Concatenate the keys of 'y' onto the keys of 'x'. Then call grouping(k1, k2, k3) and you effectively have a matching. Should be way faster than the paste() approach used by base::merge(). Would be interesting to see.
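
To make the grouping() suggestion concrete, here is a minimal sketch in base R only. The helper name match_keys_by_grouping() and its interface (two lists of key vectors) are hypothetical and not part of S4Vectors; it only illustrates the "concatenate the keys, then group" idea and makes no claim about how a DataFrame merge method should be implemented or how fast it would be.

```r
# Hypothetical helper (an assumption for illustration, not an S4Vectors API):
# find all matching row pairs between two sets of keys using base::grouping()
# instead of paste()-ing the key columns together.
# 'x_keys' and 'y_keys' are lists of key vectors, one element per key column.
match_keys_by_grouping <- function(x_keys, y_keys) {
    stopifnot(length(x_keys) == length(y_keys), length(x_keys) > 0L)
    nx <- length(x_keys[[1L]])
    ny <- length(y_keys[[1L]])

    # Concatenate the keys of 'y' onto the keys of 'x', column by column.
    keys <- unname(Map(c, x_keys, y_keys))

    # grouping() returns a permutation in which rows with identical keys are
    # adjacent; the "ends" attribute marks where each group stops.
    g <- do.call(grouping, keys)
    ends <- attr(g, "ends")
    sizes <- diff(c(0L, ends))

    # Label every combined row with the id of its key group.
    group_id <- integer(nx + ny)
    group_id[as.integer(g)] <- rep.int(seq_along(ends), sizes)
    x_group <- group_id[seq_len(nx)]
    y_group <- group_id[nx + seq_len(ny)]

    # Collect the 'y' rows in each group, then look them up for every 'x' row.
    y_rows <- split(seq_len(ny), factor(y_group, levels = seq_along(ends)))
    hits <- y_rows[x_group]

    data.frame(x_row = rep.int(seq_len(nx), lengths(hits)),
               y_row = as.integer(unlist(hits, use.names = FALSE)))
}

# Two key columns per table; only (a, u) and (c, w) occur in both.
x_keys <- list(c("a", "b", "c"), c("u", "v", "w"))
y_keys <- list(c("c", "a", "z"), c("w", "u", "v"))
m <- match_keys_by_grouping(x_keys, y_keys)
print(m)  # x row 1 pairs with y row 2, and x row 3 pairs with y row 1
```

The resulting row indices could then be used to subset both inputs with `[`, which, unlike converting through base::merge(), would not need to coerce the *List columns.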

I'll have a look at these functions and report back here if I have any questions or leads.
