How cSplit() treats multiple splitCols when they contain different numbers of fields

Here is an example table:

``` r
library(data.table)       # provides fread()
library(splitstackshape)  # provides cSplit()

dt1 <- fread("V1 V2       V3
              x  xA;xB;xC x1;x2;x3
              y  yD       y1
              z  zF;zG    z1")
```

I want to split it on both the `V2` and `V3` columns. Note that the last record is irregular: `V2` has two values while `V3` has only one. Here is how `cSplit()` treats such cases:

``` r
# with default arguments:
cSplit(dt1, splitCols = c('V2', 'V3'), sep=';', direction = 'long')
#    V1 V2 V3
#1:  x xA x1
#2:  x xB x2
#3:  x xC x3
#4:  y yD y1
#5:  y NA NA
#6:  y NA NA
#7:  z zF z1
#8:  z zG NA
#9:  z NA NA

# with `makeEqual = TRUE`:
cSplit(dt1, splitCols = c('V2', 'V3'), sep=';', direction = 'long', makeEqual = TRUE)
#    V1 V2 V3
#1:  x xA x1
#2:  x xB x2
#3:  x xC x3
#4:  y yD y1
#5:  y NA NA
#6:  y NA NA
#7:  z zF z1
#8:  z zG NA
#9:  z NA NA
```

So by default it behaves as if `makeEqual = TRUE`, although the help page says the argument `Defaults to FALSE`. I then tried passing `FALSE` explicitly:

``` r
cSplit(dt1, splitCols = c('V2', 'V3'), sep=';', direction = 'long', makeEqual = FALSE)
# Warning in `[.data.table`(indt, , `:=`(eval(splitCols), lapply(X, function(x) { :
#     Supplied 5 items to be assigned to 6 items of column 'V3' (recycled leaving remainder of 1 items).
#      V1 V2 V3
#   1:  x xA x1
#   2:  x xB x2
#   3:  x xC x3
#   4:  y yD y1
#   5:  z zF z1
#   6:  z zG x1
```

It recycles `V3` values, but takes them from another group (the `x1` in row 6 comes from group `x`), which is unexpected. I think one of the following outputs would be more logical:

``` r
# without recycling, fill with NA:
#    V1 V2 V3
#1:  x xA x1
#2:  x xB x2
#3:  x xC x3
#4:  y yD y1
#5:  z zF z1
#6:  z zG NA

# with recycling:
#    V1 V2 V3
#1:  x xA x1
#2:  x xB x2
#3:  x xC x3
#4:  y yD y1
#5:  z zF z1
#6:  z zG z1
```
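
In the meantime, a possible workaround is to do the padding per row with plain `data.table`. The following is only a minimal sketch: `split_long()` and its `recycle` argument are hypothetical names made up for illustration, not part of splitstackshape, and it assumes the split columns are character and the input is a `data.table`.

``` r
library(data.table)

# Same data as the example table above, built without fread()
dt1 <- data.table(
  V1 = c("x", "y", "z"),
  V2 = c("xA;xB;xC", "yD", "zF;zG"),
  V3 = c("x1;x2;x3", "y1", "z1")
)

# Hypothetical helper (not part of splitstackshape): split several columns
# and pad each row's pieces to the longest field count within that row.
split_long <- function(dt, cols, sep = ";", recycle = FALSE) {
  # one list of character vectors per split column
  parts <- lapply(cols, function(col) {
    strsplit(as.character(dt[[col]]), sep, fixed = TRUE)
  })
  names(parts) <- cols

  # number of output rows needed for each input row
  n <- do.call(pmax, unname(lapply(parts, lengths)))

  # pad within each row: recycle that row's own values, or fill with NA
  padded <- lapply(parts, function(p) {
    unlist(Map(function(v, k) {
      if (recycle) rep_len(v, k) else v[seq_len(k)]
    }, p, n))
  })

  # repeat the remaining columns to match, then attach the split columns
  id_cols <- setdiff(names(dt), cols)
  out <- dt[rep(seq_len(nrow(dt)), n), id_cols, with = FALSE]
  out[, (cols) := padded]
  out[]
}

split_long(dt1, c("V2", "V3"))                  # fill with NA
split_long(dt1, c("V2", "V3"), recycle = TRUE)  # recycle within the row
```

On the example data, the two calls should give the two tables proposed above: the padding never crosses row boundaries, so the second `z` row gets either `NA` or `z1`, never a value from group `x`.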

