data.table.Rd
data.table
inherits from data.frame
. It offers fast and memory efficient: file reader and writer, aggregations, updates, equi, nonequi, rolling, range and interval joins, in a short and flexible syntax, for faster development.
It is inspired by A[B]
syntax in R where A
is a matrix and B
is a 2column matrix. Since a data.table
is a data.frame
, it is compatible with R functions and packages that accept only data.frame
s.
Type vignette(package="data.table")
to get started. The Introduction to data.table vignette introduces data.table
's x[i, j, by]
syntax and is a good place to start. If you have read the vignettes and the help page below, please read the data.table support guide.
Please check the homepage for up to the minute live NEWS.
Tip: one of the quickest ways to learn the features is to type example(data.table)
and study the output at the prompt.
data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFactors=FALSE) # S3 method for data.table [(x, i, j, by, keyby, with = TRUE, nomatch = NA, mult = "all", roll = FALSE, rollends = if (roll=="nearest") c(TRUE,TRUE) else if (roll>=0) c(FALSE,TRUE) else c(TRUE,FALSE), which = FALSE, .SDcols, verbose = getOption("datatable.verbose"), # default: FALSE allow.cartesian = getOption("datatable.allow.cartesian"), # default: FALSE drop = NULL, on = NULL, env = NULL)
...  Just as 

keep.rownames  If 
check.names  Just as 
key  Character vector of one or more column names which is passed to 
stringsAsFactors  Logical (default is 
x  A 
i  Integer, logical or character vector, single column numeric
If
Using When the binary operator Support for nonequi join was recently implemented, which allows for other binary operators See Advanced: When 
j  When As long as The expression `.()` is a shorthand alias to When Advanced: Advanced: When Advanced: Columns of 
by  Column names are seen as if they are variables (as in
Advanced: When Advanced: In the 
keyby  Same as 
with  By default When 
nomatch  When a row in 
mult  When 
roll  When
Rolling joins apply to the last join column, generally a date but can be any variable. It is particularly fast using a modified binary search. A common idiom is to select a contemporaneous regular time series ( 
rollends  A logical vector length 2 (a single logical is recycled) indicating whether values falling before the first value or after the last value for a group should be rolled as well.
When 
which 

.SDcols  Specifies the columns of For convenient interactive use, the form Inversion (column dropping instead of keeping) can be accomplished be prepending the argument with Finally, you can filter columns to include in 
verbose 

allow.cartesian 

drop  Never used by 
on  Indicate which columns in
See examples as well as 
env  List or an environment, passed to 
data.table
builds on base R functionality to reduce 2 types of time:
programming time (easier to write, read, debug and maintain), and
compute time (fast and memory efficient).
The general form of data.table syntax is:
DT[ i, j, by ] # + extra arguments      > grouped by what?  > what to do? > on which rows?
The way to read this out loud is: "Take DT
, subset rows by i
, then compute j
grouped by by
. Here are some basic usage examples expanding on this definition. See the vignette (and examples) for working examples.
X[, a] # return col 'a' from X as vector. If not found, search in parent frame. X[, .(a)] # same as above, but return as a data.table. X[, sum(a)] # return sum(a) as a vector (with same scoping rules as above) X[, .(sum(a)), by=c] # get sum(a) grouped by 'c'. X[, sum(a), by=c] # same as above, .() can be omitted in j and by on single expression for convenience X[, sum(a), by=c:f] # get sum(a) grouped by all columns in between 'c' and 'f' (both inclusive) X[, sum(a), keyby=b] # get sum(a) grouped by 'b', and sort that result by the grouping column 'b' X[, sum(a), by=b, keyby=TRUE] # same order as above, but using sorting flag X[, sum(a), by=b][order(b)] # same order as above, but by chaining compound expressions X[c>1, sum(a), by=c] # get rows where c>1 is TRUE, and on those rows, get sum(a) grouped by 'c' X[Y, .(a, b), on="c"] # get rows where Y$c == X$c, and select columns 'X$a' and 'X$b' for those rows X[Y, .(a, i.a), on="c"] # get rows where Y$c == X$c, and then select 'X$a' and 'Y$a' (=i.a) X[Y, sum(a*i.a), on="c" by=.EACHI] # for *each* 'Y$c', get sum(a*i.a) on matching rows in 'X$c' X[, plot(a, b), by=c] # j accepts any expression, generates plot for each group and returns no data # see ?assign to add/update/delete columns by reference using the same consistent interface
A data.table
query may be invoked on a data.frame
using functional form DT(...)
, see examples. The class of the input is retained.
A data.table
is a list
of vectors, just like a data.frame
. However :
it never has or uses rownames. Rownames based indexing can be done by setting a key of one or more columns or done adhoc using the on
argument (now preferred).
it has enhanced functionality in [.data.table
for fast joins of keyed tables, fast aggregation, fast last observation carried forward (LOCF) and fast add/modify/delete of columns by reference with no copy at all.
See the see also
section for the several other methods that are available for operating on data.tables efficiently.
https://rdatatable.com (data.table
homepage)
https://en.wikipedia.org/wiki/Binary_search
If keep.rownames
or check.names
are supplied they must be written in full because R does not allow partial argument names after ...
. For example, data.table(DF, keep=TRUE)
will create a
column called keep
containing TRUE
and this is correct behaviour; data.table(DF, keep.rownames=TRUE)
was intended.
POSIXlt
is not supported as a column type because it uses 40 bytes to store a single datetime. They are implicitly converted to POSIXct
type with warning. You may also be interested in IDateTime
instead; it has methods to convert to and from POSIXlt
.
specialsymbols
, data.frame
, [.data.frame
, as.data.table
, setkey
, setorder
, setDT
, setDF
, J
, SJ
, CJ
, merge.data.table
, tables
, test.data.table
, IDateTime
, unique.data.table
, copy
, :=
, setalloccol
, truelength
, rbindlist
, setNumericRounding
, datatableoptimize
, fsetdiff
, funion
, fintersect
, fsetequal
, anyDuplicated
, uniqueN
, rowid
, rleid
, na.omit
, frank
if (FALSE) { example(data.table) # to run these examples yourself } DF = data.frame(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9) DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9) DF#> x y v #> 1 b 1 1 #> 2 b 3 2 #> 3 b 6 3 #> 4 a 1 4 #> 5 a 3 5 #> 6 a 6 6 #> 7 c 1 7 #> 8 c 3 8 #> 9 c 6 9DT#> x y v #> 1: b 1 1 #> 2: b 3 2 #> 3: b 6 3 #> 4: a 1 4 #> 5: a 3 5 #> 6: a 6 6 #> 7: c 1 7 #> 8: c 3 8 #> 9: c 6 9#> [1] TRUE#> [1] TRUE#> [1] TRUE#> [1] TRUE#> [1] TRUE#> NAME NROW NCOL MB COLS KEY #> 1: DT 9 3 0 x,y,v #> Total: 0MB# basic row subset operations DT[2] # 2nd row#> x y v #> 1: b 3 2DT[3:2] # 3rd and 2nd row#> x y v #> 1: b 6 3 #> 2: b 3 2#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6 #> 4: b 1 1 #> 5: b 3 2 #> 6: b 6 3 #> 7: c 1 7 #> 8: c 3 8 #> 9: c 6 9#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6 #> 4: b 1 1 #> 5: b 3 2 #> 6: b 6 3 #> 7: c 1 7 #> 8: c 3 8 #> 9: c 6 9DT[y>2] # all rows where DT$y > 2#> x y v #> 1: b 3 2 #> 2: b 6 3 #> 3: a 3 5 #> 4: a 6 6 #> 5: c 3 8 #> 6: c 6 9DT[y>2 & v>5] # compound logical expressions#> x y v #> 1: a 6 6 #> 2: c 3 8 #> 3: c 6 9DT[!2:4] # all rows other than 2:4#> x y v #> 1: b 1 1 #> 2: a 3 5 #> 3: a 6 6 #> 4: c 1 7 #> 5: c 3 8 #> 6: c 6 9DT[(2:4)] # same#> x y v #> 1: b 1 1 #> 2: a 3 5 #> 3: a 6 6 #> 4: c 1 7 #> 5: c 3 8 #> 6: c 6 9# selectcompute columns data.table way DT[, v] # v column (as vector)#> [1] 1 2 3 4 5 6 7 8 9#> v #> 1: 1 #> 2: 2 #> 3: 3 #> 4: 4 #> 5: 5 #> 6: 6 #> 7: 7 #> 8: 8 #> 9: 9DT[, .(v)] # same as above, .() is a shorthand alias to list()#> v #> 1: 1 #> 2: 2 #> 3: 3 #> 4: 4 #> 5: 5 #> 6: 6 #> 7: 7 #> 8: 8 #> 9: 9#> [1] 45#> V1 #> 1: 45#> sv #> 1: 45DT[, .(v, v*2)] # return two column data.table, v and v*2#> v V2 #> 1: 1 2 #> 2: 2 4 #> 3: 3 6 #> 4: 4 8 #> 5: 5 10 #> 6: 6 12 #> 7: 7 14 #> 8: 8 16 #> 9: 9 18# subset rows and selectcompute data.table way DT[2:3, sum(v)] # sum(v) over rows 2 and 3, return vector#> [1] 5#> V1 #> 1: 5#> sv #> 1: 5#> 2 3 4 5#> NULL# select columns the data.frame way DT[, 2] # 2nd column, returns a data.table always#> y #> 1: 1 #> 2: 3 #> 3: 6 #> 4: 1 #> 5: 3 #> 6: 6 #> 7: 1 #> 8: 3 #> 9: 6colNum = 2 DT[, ..colNum] # same, .. prefix conveys onelevelup in calling scope#> y #> 1: 1 #> 2: 3 #> 3: 6 #> 4: 1 #> 5: 3 #> 6: 6 #> 7: 1 #> 8: 3 #> 9: 6DT[["v"]] # same as DT[, v] but faster if called in a loop#> [1] 1 2 3 4 5 6 7 8 9# grouping operations  j and by DT[, sum(v), by=x] # ad hoc by, order of groups preserved in result#> x V1 #> 1: b 6 #> 2: a 15 #> 3: c 24#> x V1 #> 1: a 15 #> 2: b 6 #> 3: c 24#> x V1 #> 1: a 15 #> 2: b 6 #> 3: c 24#> x V1 #> 1: a 15 #> 2: b 6 #> 3: c 24# fast ad hoc row subsets (subsets as joins) DT["a", on="x"] # same as x == "a" but uses binary search (fast)#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6DT["a", on=.(x)] # same, for convenience, no need to quote every column#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6DT[.("a"), on="x"] # same#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6DT[x=="a"] # same, single "==" internally optimised to use binary search (fast)#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6DT[x!="b"  y!=3] # not yet optimized, currently vector scan subset#> x y v #> 1: b 1 1 #> 2: b 6 3 #> 3: a 1 4 #> 4: a 3 5 #> 5: a 6 6 #> 6: c 1 7 #> 7: c 3 8 #> 8: c 6 9#> x y v #> 1: b 3 2DT[.("b", 3), on=.(x, y)] # same, but using on=.()#> x y v #> 1: b 3 2#> x y v #> 1: b 1 1 #> 2: b 2 NADT[.("b", 1:2), on=.(x, y), nomatch=NULL] # no match row is not returned#> x y v #> 1: b 1 1#> x y v #> 1: b 1 1 #> 2: b 2 1DT[.("b", 1:2), on=.(x, y), roll=Inf] # nocb, nomatch row gets rolled by next row#> x y v #> 1: b 1 1 #> 2: b 2 2#> [1] 25#> x V1 #> 1: b 6 #> 2: c 24#> x V1 #> 1: b 6 #> 2: c 24#> x V1 #> 1: b 6 #> 2: c 24#> x V1 #> 1: b 6 #> 2: c 24#> x v foo #> 1: c 8 4 #> 2: b 7 2DT[X, on="x"] # right join#> x y v i.v foo #> 1: c 1 7 8 4 #> 2: c 3 8 8 4 #> 3: c 6 9 8 4 #> 4: b 1 1 7 2 #> 5: b 3 2 7 2 #> 6: b 6 3 7 2X[DT, on="x"] # left join#> x v foo y i.v #> 1: b 7 2 1 1 #> 2: b 7 2 3 2 #> 3: b 7 2 6 3 #> 4: a NA NA 1 4 #> 5: a NA NA 3 5 #> 6: a NA NA 6 6 #> 7: c 8 4 1 7 #> 8: c 8 4 3 8 #> 9: c 8 4 6 9DT[X, on="x", nomatch=NULL] # inner join#> x y v i.v foo #> 1: c 1 7 8 4 #> 2: c 3 8 8 4 #> 3: c 6 9 8 4 #> 4: b 1 1 7 2 #> 5: b 3 2 7 2 #> 6: b 6 3 7 2DT[!X, on="x"] # not join#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6#> x y v i.x foo #> 1: <NA> 8 NA c 4 #> 2: <NA> 7 NA b 2DT[X, on="y==v"] # same as above (v1.9.8+)#> x y v i.x foo #> 1: <NA> 8 NA c 4 #> 2: <NA> 7 NA b 2DT[X, on=.(y<=foo)] # NEW nonequi join (v1.9.8+)#> x y v i.x i.v #> 1: b 4 1 c 8 #> 2: b 4 2 c 8 #> 3: a 4 4 c 8 #> 4: a 4 5 c 8 #> 5: c 4 7 c 8 #> 6: c 4 8 c 8 #> 7: b 2 1 b 7 #> 8: a 2 4 b 7 #> 9: c 2 7 b 7DT[X, on="y<=foo"] # same as above#> x y v i.x i.v #> 1: b 4 1 c 8 #> 2: b 4 2 c 8 #> 3: a 4 4 c 8 #> 4: a 4 5 c 8 #> 5: c 4 7 c 8 #> 6: c 4 8 c 8 #> 7: b 2 1 b 7 #> 8: a 2 4 b 7 #> 9: c 2 7 b 7#> x y v i.x i.v #> 1: b 4 1 c 8 #> 2: b 4 2 c 8 #> 3: a 4 4 c 8 #> 4: a 4 5 c 8 #> 5: c 4 7 c 8 #> 6: c 4 8 c 8 #> 7: b 2 1 b 7 #> 8: a 2 4 b 7 #> 9: c 2 7 b 7DT[X, on=.(y>=foo)] # NEW nonequi join (v1.9.8+)#> x y v i.x i.v #> 1: b 4 3 c 8 #> 2: a 4 6 c 8 #> 3: c 4 9 c 8 #> 4: b 2 2 b 7 #> 5: b 2 3 b 7 #> 6: a 2 5 b 7 #> 7: a 2 6 b 7 #> 8: c 2 8 b 7 #> 9: c 2 9 b 7DT[X, on=.(x, y<=foo)] # NEW nonequi join (v1.9.8+)#> x y v i.v #> 1: c 4 7 8 #> 2: c 4 8 8 #> 3: b 2 1 7DT[X, .(x,y,x.y,v), on=.(x, y>=foo)] # Select x's join columns as well#> x y x.y v #> 1: c 4 6 9 #> 2: b 2 3 2 #> 3: b 2 6 3DT[X, on="x", mult="first"] # first row of each group#> x y v i.v foo #> 1: c 1 7 8 4 #> 2: b 1 1 7 2DT[X, on="x", mult="last"] # last row of each group#> x y v i.v foo #> 1: c 6 9 8 4 #> 2: b 6 3 7 2#> x V1 #> 1: c 24 #> 2: b 6#> x V1 #> 1: c 96 #> 2: b 12#> x V1 #> 1: c 192 #> 2: b 42#> x v V1 #> 1: c 8 36 #> 2: b 7 NA# setting keys kDT = copy(DT) # (deep) copy DT to kDT to work with it. setkey(kDT,x) # set a 1column key. No quotes, for convenience. setkeyv(kDT,"x") # same (v in setkeyv stands for vector) v="x" setkeyv(kDT,v) # same # key(kDT)<"x" # copies whole table, please use set* functions instead haskey(kDT) # TRUE#> [1] TRUE#> [1] "x"# fast *keyed* subsets kDT["a"] # subsetasjoin on *key* column 'x'#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6kDT["a", on="x"] # same, being explicit using 'on=' (preferred)#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6#> x V1 #> 1: b 6 #> 2: c 24# multicolumn key setkey(kDT,x,y) # 2column key setkeyv(kDT,c("x","y")) # same # fast *keyed* subsets on multicolumn key kDT["a"] # join to 1st column of key#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6kDT["a", on="x"] # on= is optional, but is preferred#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6kDT[.("a")] # same, .() is an alias for list()#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6#> x y v #> 1: a 1 4 #> 2: a 3 5 #> 3: a 6 6kDT[.("a", 3)] # join to 2 columns#> x y v #> 1: a 3 5kDT[.("a", 3:6)] # join 4 rows (2 missing)#> x y v #> 1: a 3 5 #> 2: a 4 NA #> 3: a 5 NA #> 4: a 6 6kDT[.("a", 3:6), nomatch=NULL] # remove missing#> x y v #> 1: a 3 5 #> 2: a 6 6kDT[.("a", 3:6), roll=TRUE] # locf rolling join#> x y v #> 1: a 3 5 #> 2: a 4 5 #> 3: a 5 5 #> 4: a 6 6kDT[.("a", 3:6), roll=Inf] # same as above#> x y v #> 1: a 3 5 #> 2: a 4 5 #> 3: a 5 5 #> 4: a 6 6kDT[.("a", 3:6), roll=Inf] # nocb rolling join#> x y v #> 1: a 3 5 #> 2: a 4 6 #> 3: a 5 6 #> 4: a 6 6kDT[!.("a")] # not join#> x y v #> 1: b 1 1 #> 2: b 3 2 #> 3: b 6 3 #> 4: c 1 7 #> 5: c 3 8 #> 6: c 6 9kDT[!"a"] # same#> x y v #> 1: b 1 1 #> 2: b 3 2 #> 3: b 6 3 #> 4: c 1 7 #> 5: c 3 8 #> 6: c 6 9# more on special symbols, see also ?"specialsymbols" DT[.N] # last row#> x y v #> 1: c 6 9DT[, .N] # total number of rows in DT#> [1] 9DT[, .N, by=x] # number of rows in each group#> x N #> 1: b 3 #> 2: a 3 #> 3: c 3DT[, .SD, .SDcols=x:y] # select columns 'x' through 'y'#> x y #> 1: b 1 #> 2: b 3 #> 3: b 6 #> 4: a 1 #> 5: a 3 #> 6: a 6 #> 7: c 1 #> 8: c 3 #> 9: c 6DT[ , .SD, .SDcols = !x:y] # drop columns 'x' through 'y'#> v #> 1: 1 #> 2: 2 #> 3: 3 #> 4: 4 #> 5: 5 #> 6: 6 #> 7: 7 #> 8: 8 #> 9: 9#> x v #> 1: b 1 #> 2: b 2 #> 3: b 3 #> 4: a 4 #> 5: a 5 #> 6: a 6 #> 7: c 7 #> 8: c 8 #> 9: c 9DT[, .SD[1]] # first row of all columns#> x y v #> 1: b 1 1DT[, .SD[1], by=x] # first row of 'y' and 'v' for each group in 'x'#> x y v #> 1: b 1 1 #> 2: a 1 4 #> 3: c 1 7#> x N y v #> 1: b 3 10 6 #> 2: a 3 10 15 #> 3: c 3 10 24DT[, .I[1], by=x] # row number in DT corresponding to each group#> x V1 #> 1: b 1 #> 2: a 4 #> 3: c 7DT[, grp := .GRP, by=x] # add a group counter column#> x y v grp #> 1: b 1 1 1 #> 2: b 3 2 1 #> 3: b 6 3 1 #> 4: a 1 4 2 #> 5: a 3 5 2 #> 6: a 6 6 2 #> 7: c 1 7 3 #> 8: c 3 8 3 #> 9: c 6 9 3#> list(x = "b", y = 1) #> list(x = "b", y = 3) #> list(x = "b", y = 6) #> list(x = "a", y = 1) #> list(x = "a", y = 3) #> list(x = "a", y = 6) #> list(x = "c", y = 1) #> list(x = "c", y = 3) #> list(x = "c", y = 6)#> x y x y #> 1: b 1 b 1 #> 2: b 3 b 3 #> 3: b 6 b 6 #> 4: a 1 a 1 #> 5: a 3 a 3 #> 6: a 6 a 6 #> 7: c 1 c 1 #> 8: c 3 c 3 #> 9: c 6 c 6X[, DT[.BY, y, on="x"], by=x] # join within each group#> x V1 #> 1: c 1 #> 2: c 3 #> 3: c 6 #> 4: b 1 #> 5: b 3 #> 6: b 6DT[, { # write each group to a different file fwrite(.SD, file.path(tempdir(), paste0('x=', .BY$x, '.csv'))) }, by=x]#> Empty data.table (0 rows and 1 cols): x#> [1] "downlit" "x=a.csv" "x=b.csv" "x=c.csv"#> x y v grp z #> 1: b 1 1 1 42 #> 2: b 3 2 1 42 #> 3: b 6 3 1 42 #> 4: a 1 4 2 42 #> 5: a 3 5 2 42 #> 6: a 6 6 2 42 #> 7: c 1 7 3 42 #> 8: c 3 8 3 42 #> 9: c 6 9 3 42#> x y v grp #> 1: b 1 1 1 #> 2: b 3 2 1 #> 3: b 6 3 1 #> 4: a 1 4 2 #> 5: a 3 5 2 #> 6: a 6 6 2 #> 7: c 1 7 3 #> 8: c 3 8 3 #> 9: c 6 9 3#> x y v grp #> 1: b 1 1 1 #> 2: b 3 2 1 #> 3: b 6 3 1 #> 4: a 1 42 2 #> 5: a 3 42 2 #> 6: a 6 42 2 #> 7: c 1 7 3 #> 8: c 3 8 3 #> 9: c 6 9 3#> x y v grp v2 #> 1: b 1 1 1 84 #> 2: b 3 2 1 84 #> 3: b 6 3 1 84 #> 4: a 1 42 2 NA #> 5: a 3 42 2 NA #> 6: a 6 42 2 NA #> 7: c 1 7 3 NA #> 8: c 3 8 3 NA #> 9: c 6 9 3 NA#> x y v grp v2 m #> 1: b 1 1 1 84 2 #> 2: b 3 2 1 84 2 #> 3: b 6 3 1 84 2 #> 4: a 1 42 2 NA 42 #> 5: a 3 42 2 NA 42 #> 6: a 6 42 2 NA 42 #> 7: c 1 7 3 NA 8 #> 8: c 3 8 3 NA 8 #> 9: c 6 9 3 NA 8# NB: postfix [] is shortcut to print() # advanced usage DT = data.table(x=rep(c("b","a","c"),each=3), v=c(1,1,1,2,2,1,1,2,2), y=c(1,3,6), a=1:9, b=9:1) DT[, sum(v), by=.(y%%2)] # expressions in by#> y V1 #> 1: 1 9 #> 2: 0 4#> bool V1 #> 1: 1 9 #> 2: 0 4DT[, .SD[2], by=x] # get 2nd row of each group#> x v y a b #> 1: b 1 3 2 8 #> 2: a 2 3 5 5 #> 3: c 2 3 8 2#> x v y a b #> 1: b 1 3 2 8 #> 2: b 1 6 3 7 #> 3: a 2 3 5 5 #> 4: a 1 6 6 4 #> 5: c 2 3 8 2 #> 6: c 2 6 9 1#> x v y a b #> 1: b 3 10 6 24 #> 2: a 5 10 15 15 #> 3: c 5 10 24 6#> x v y a b #> 1: b 1 1 1 9 #> 2: a 1 6 6 4 #> 3: c 1 1 7 3#> x y MySum MyMin MyMax #> 1: b 1 2 1 1 #> 2: b 0 1 1 1 #> 3: a 1 4 2 2 #> 4: a 0 1 1 1 #> 5: c 1 3 1 2 #> 6: c 0 2 2 2DT[, .(a = .(a), b = .(b)), by=x] # list columns#> x a b #> 1: b 1,2,3 9,8,7 #> 2: a 4,5,6 6,5,4 #> 3: c 7,8,9 3,2,1#> x seq #> 1: b 1 #> 2: b 2 #> 3: b 3 #> 4: b 4 #> 5: b 5 #> 6: b 6 #> 7: b 7 #> 8: b 8 #> 9: b 9 #> 10: a 4 #> 11: a 5 #> 12: a 6 #> 13: c 7 #> 14: c 6 #> 15: c 5 #> 16: c 4 #> 17: c 3#> x V1 #> 1: b 3 #> 2: a 5 #> 3: c 5#> x V1 #> 1: a 5 #> 2: c 5 #> 3: b 3#> x N v y a b #> 1: b 3 3 10 6 24 #> 2: a 3 5 10 15 15 #> 3: c 3 5 10 24 6DT[, {tmp < mean(y); .(a = atmp, b = btmp) }, by=x] # anonymous lambda in 'j', j accepts any valid#> x a b #> 1: b 2.3333333 5.6666667 #> 2: b 1.3333333 4.6666667 #> 3: b 0.3333333 3.6666667 #> 4: a 0.6666667 2.6666667 #> 5: a 1.6666667 1.6666667 #> 6: a 2.6666667 0.6666667 #> 7: c 3.6666667 0.3333333 #> 8: c 4.6666667 1.3333333 #> 9: c 5.6666667 2.3333333# expression. TO REMEMBER: every element of # the list becomes a column in result. pdf("new.pdf") DT[, plot(a,b), by=x] # can also plot in 'j'#> Empty data.table (0 rows and 1 cols): x#> agg_png #> 2#> [1] TRUE# using rleid, get max(y) and min of all cols in .SDcols for each consecutive run of 'v' DT[, c(.(y=max(y)), lapply(.SD, min)), by=rleid(v), .SDcols=v:b]#> rleid y v y a b #> 1: 1 6 1 1 1 7 #> 2: 2 3 2 1 4 5 #> 3: 3 6 1 1 6 3 #> 4: 4 6 2 3 8 1# functional query DT(...) #dontrun to pass R CMD check prior to R 4.1.0 when > was added # an if getRVersion()>"4.1.0" still has its code parsed if (FALSE) { mtcars > DT(mpg>20, .(mean_hp=mean(hp)), by=cyl) } # Support guide and links: # https://github.com/Rdatatable/data.table/wiki/Support if (FALSE) { if (interactive()) { vignette(package="data.table") # 9 vignettes test.data.table() # 6,000 tests # keep up to date with latest stable version on CRAN update.packages() # get the latest devel version update.dev.pkg() # read more at: # https://github.com/Rdatatable/data.table/wiki/Installation } }