split {data.table} | R Documentation |

Split method for data.table. It is faster and more flexible than the data.frame method. Be aware that processing a list of data.tables will generally be much slower than manipulating a single data.table by group using the `by` argument; read more in `data.table`.
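The performance advice above can be illustrated with a small sketch. The table `DT` and its columns `g` and `v` are invented for illustration. The same per-group total is computed twice: once by splitting into a list and looping over it, and once with a single grouped query on the original data.table, which is the approach recommended here.

```r
library(data.table)

# Invented toy data: three groups of two rows each
DT = data.table(g = c("b", "a", "b", "c", "a", "c"),
                v = c(1, 2, 3, 4, 5, 6))

# Approach 1: split into a list of data.tables, then loop over the list
parts = split(DT, by = "g")
res_split = rbindlist(lapply(names(parts), function(nm)
  data.table(g = nm, total = sum(parts[[nm]]$v))))

# Approach 2: one grouped query on the single data.table
res_by = DT[, .(total = sum(v)), by = g]

# Both give one row per group, in order of first appearance: b, a, c
print(res_by)
```

Both produce the same table here; the grouped query avoids materialising one data.table per group, which is why it usually scales better.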

```
## S3 method for class 'data.table'
split(x, f, drop = FALSE,
      by, sorted = FALSE, keep.by = TRUE, flatten = TRUE,
      ..., verbose = getOption("datatable.verbose"))
```

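| Argument | Description |
|---|---|
| `x` | A data.table. |
| `f` | Factor or list of factors, same as in `split.data.frame`. Provided only for consistency with the data.frame method; prefer `by`. |
| `drop` | Logical, default `FALSE`. When `FALSE`, empty list elements for factor levels that do not occur in the data are kept; `TRUE` drops them. Works with both `f` and `by`. |
| `by` | Character vector of column names on which the split should be made. For `length(by) > 1L` with `flatten = FALSE`, the result is a nested list with data.tables as leaves. |
| `sorted` | When `FALSE` (default), the order of groups in the data is retained; when `TRUE`, sorted list(s) are returned. Has no effect when `f` is used. |
| `keep.by` | Logical, default `TRUE`. Keep the columns named in `by` in each resulting data.table. |
| `flatten` | Logical, default `TRUE`. Unlist nested lists of data.tables into a single list. When `f` is used, the result is always a flat list of data.tables. |
| `...` | When `f` is used, passed on to the data.frame way of processing. |
| `verbose` | Logical, default `getOption("datatable.verbose")`. When `TRUE`, prints the call used to split the data.table. |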
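A short sketch of how `sorted`, `keep.by` and `drop` change the result. The table `DT` and its columns `g`, `f` and `v` are invented for illustration; `f` is a factor with a level `"y"` that never occurs in the data.

```r
library(data.table)

# Invented data: character column `g`, and factor `f` with unused level "y"
DT = data.table(g = c("b", "a", "b"),
                f = factor(c("x", "x", "x"), levels = c("x", "y")),
                v = 1:3)

# sorted = FALSE (default): groups in order of first appearance
by_default = split(DT, by = "g")
names(by_default)     # "b" "a"

# sorted = TRUE: groups in sorted order
by_sorted = split(DT, by = "g", sorted = TRUE)
names(by_sorted)      # "a" "b"

# keep.by = FALSE: the grouping column is removed from each piece
no_key = split(DT, by = "g", keep.by = FALSE)
names(no_key[["b"]])  # "f" "v"

# drop = FALSE (default): an empty data.table is kept for unused level "y"
kept = split(DT, by = "f")
names(kept)           # "x" "y"
nrow(kept[["y"]])     # 0

# drop = TRUE: the unused level is dropped
dropped = split(DT, by = "f", drop = TRUE)
names(dropped)        # "x"
```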

Argument `f` exists only for consistency in usage with the data.frame method. Using the `by` argument instead is recommended: it is faster, more flexible, and by default preserves the order in which groups appear in the data.
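The difference in group order between `f` and `by` can be seen in a minimal sketch (the table `DT` and columns `g`, `v` are invented for illustration). With `f`, a character vector is turned into a factor, so groups follow its sorted levels; with `by`, groups follow their first appearance in the data. The pieces themselves are the same.

```r
library(data.table)

DT = data.table(g = c("b", "a", "b"), v = 1:3)

# data.frame style: `f` is an external vector, groups follow sorted levels
via_f = split(DT, f = DT$g)
# recommended: `by` names a column, groups follow order of appearance
via_by = split(DT, by = "g")

names(via_f)   # "a" "b"
names(via_by)  # "b" "a"

# Same pieces once they are put in the same order
all.equal(via_f, via_by[names(via_f)])
```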

A list of `data.table`s. If `flatten = FALSE` and `length(by) > 1L`, a recursively nested list is returned instead, with one level of nesting per column in `by` and `data.table`s as the leaves.
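A sketch contrasting the flat and nested return shapes, using an invented table `DT` with columns `x1`, `x2` and `y`. In the flat list, the names of the grouping values are joined with `"."`; in the nested list, each level of nesting is indexed by one column in `by`.

```r
library(data.table)

DT = data.table(x1 = c("a", "a", "b"), x2 = c("c", "d", "c"), y = 1:3)

flat   = split(DT, by = c("x1", "x2"))
nested = split(DT, by = c("x1", "x2"), flatten = FALSE)

names(flat)           # "a.c" "a.d" "b.c"
names(nested)         # "a" "b"
names(nested[["a"]])  # "c" "d"

# The same leaf reached both ways
all.equal(flat[["a.d"]], nested[["a"]][["d"]])
```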

```
library(data.table)
set.seed(123)
DT = data.table(x1 = rep(letters[1:2], 6),
                x2 = rep(letters[3:5], 4),
                x3 = rep(letters[5:8], 3),
                y = rnorm(12))
DT = DT[sample(.N)]
DF = as.data.frame(DT)
# split consistency with data.frame: `x, f, drop`
all.equal(
split(DT, list(DT$x1, DT$x2)),
lapply(split(DF, list(DF$x1, DF$x2)), setDT)
)
# nested list using the `flatten` argument
split(DT, by=c("x1", "x2"))
split(DT, by=c("x1", "x2"), flatten=FALSE)
# dealing with factors
fdt = DT[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)
# factors having unused levels: drop = FALSE, then drop = TRUE
fdt = DT[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L],
             x2 = as.factor(c("a", as.character(x2)))[-1L],
             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
             y = y)]
fdf = as.data.frame(fdt)
sdf = split(fdf, list(fdf$x1, fdf$x2))
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)
sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE)
all.equal(
split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE),
lapply(sdf[sort(names(sdf))], setDT)
)
```

[Package *data.table* version 1.14.7 Index]