Data table utilities
Utilities for data.table transformation.
within, transform and other similar functions in data.table are not just provided for users who expect them to work, but for non-data.table-aware packages to retain keys, for example. Hopefully the faster and more convenient data.table syntax will be used in time. See examples.
Usage
# S3 method for data.table
transform(`_data`, ...)
# S3 method for data.table
within(data, expr, ...)
Arguments
- data, _data
data.table to be transformed.
- ...
for transform, further arguments of the form tag=value. Ignored for within.
- expr
expression to be evaluated within the data.table.
Details
within is like with, but modifications (columns changed, added, or removed) are updated in the returned data.table.
Note that transform will keep the key of the data.table provided the targets of the transform (i.e. the columns that appear in ...) are not in the key of the data.table. within also retains the key provided the key columns are not touched.
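The key-retention rule described above can be checked with a small sketch. It assumes the data.table package is attached; the table DT and its columns id and x are hypothetical names chosen for illustration, and the comments state what the rule above predicts rather than verified output.

```r
library(data.table)

# Hypothetical keyed table; `id` is the key column.
DT <- data.table(id = c(2L, 1L, 3L), x = c(10, 20, 30), key = "id")

# Only the non-key column x is a target, so the key should carry over.
kept_transform <- key(transform(DT, x = x * 2))

# The key column itself is a target, so the key should not be retained.
lost_transform <- key(transform(DT, id = id + 1L))

# within: adding a new column leaves the key column untouched.
kept_within <- key(within(DT, { y <- x + 1 }))

# within: changing the key column means the key should not be retained.
lost_within <- key(within(DT, { id <- rev(id) }))
```

Comparing each result with key(DT) shows whether the key survived; a NULL result means the returned data.table is unkeyed.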
Examples
DT <- data.table(a=rep(1:3, each=2), b=1:6)
DT2 <- transform(DT, c = a^2)
DT[, c:=a^2]
#> a b c
#> <int> <int> <num>
#> 1: 1 1 1
#> 2: 1 2 1
#> 3: 2 3 4
#> 4: 2 4 4
#> 5: 3 5 9
#> 6: 3 6 9
identical(DT,DT2)
#> [1] TRUE
DT2 <- within(DT, {
  b <- rev(b)
  c <- a*2
  rm(a)
})
DT[,`:=`(b = rev(b),
         c = a*2,
         a = NULL)]
#> b c
#> <int> <num>
#> 1: 6 2
#> 2: 5 2
#> 3: 4 4
#> 4: 3 4
#> 5: 2 6
#> 6: 1 6
identical(DT,DT2)
#> [1] TRUE
DT$d = ave(DT$b, DT$c, FUN=max) # copies entire DT, even if it is 10GB in RAM
DT = DT[, transform(.SD, d=max(b)), by="c"] # same, but even worse as .SD is copied for each group
DT[, d:=max(b), by="c"] # same result, but much faster, shorter and scales
#> c b d
#> <num> <int> <int>
#> 1: 2 6 6
#> 2: 2 5 6
#> 3: 4 4 4
#> 4: 4 3 4
#> 5: 6 2 2
#> 6: 6 1 2
# Multiple update by group. Convenient, fast, scales and easy to read.
DT[, `:=`(minb = min(b),
          meanb = mean(b),
          bplusd = sum(b+d)), by=c%/%5]
#> c b d minb meanb bplusd
#> <num> <int> <int> <int> <num> <int>
#> 1: 2 6 6 3 4.5 38
#> 2: 2 5 6 3 4.5 38
#> 3: 4 4 4 3 4.5 38
#> 4: 4 3 4 3 4.5 38
#> 5: 6 2 2 1 1.5 7
#> 6: 6 1 2 1 1.5 7
DT
#> c b d minb meanb bplusd
#> <num> <int> <int> <int> <num> <int>
#> 1: 2 6 6 3 4.5 38
#> 2: 2 5 6 3 4.5 38
#> 3: 4 4 4 3 4.5 38
#> 4: 4 3 4 3 4.5 38
#> 5: 6 2 2 1 1.5 7
#> 6: 6 1 2 1 1.5 7