Coerce lists and data.frames to data.table by reference
In data.table parlance, all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column. The only other data.table operator that modifies input by reference is :=. See the See also section below for the other set* functions that data.table provides.
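The following minimal sketch shows what "by reference" means in practice. It assumes the data.table package is installed, and the names DT, DT2 and DT3 are illustrative. Plain assignment does not copy a data.table. As a result, a set* call or := applied through one name is visible through every other name bound to the same object, and copy() is what gives an independent object.

```r
library(data.table)

DT  = data.table(a = 1:3, b = 4:6)
DT2 = DT          # plain assignment: DT and DT2 refer to the same object
DT3 = copy(DT)    # explicit deep copy: an independent object

setnames(DT, "a", "z")   # a set* function: renames the column in place
DT[, b := b * 10L]       # := also modifies the shared object in place

names(DT2)   # "z" "b": the rename is visible through DT2
DT2$b        # 40 50 60: so is the := update
DT3$b        # 4 5 6: the copy is unaffected
```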
setDT converts lists (both named and unnamed) and data.frames to data.tables by reference. This feature was requested on Stack Overflow.
Arguments
- x
  A named or unnamed list, data.frame or data.table.
- keep.rownames
  For data.frames, TRUE retains the data.frame's row names under a new column rn. keep.rownames = "id" names the column "id" instead.
- key
  Character vector of one or more column names, which is passed to setkeyv.
- check.names
  Just as check.names in data.frame.
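Here is a short sketch of keep.rownames. It assumes data.table is installed, and the data.frames df1 and df2 are made-up inputs. With TRUE, the row names end up in a leading column rn. A character value gives that column a different name.

```r
library(data.table)

df1 = data.frame(x = 1:3, row.names = c("r1", "r2", "r3"))
setDT(df1, keep.rownames = TRUE)   # row names become a column named "rn"
names(df1)                         # "rn" "x"

df2 = data.frame(x = 1:3, row.names = c("r1", "r2", "r3"))
setDT(df2, keep.rownames = "id")   # same, but the column is named "id"
names(df2)                         # "id" "x"
```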
Details
When working on large lists or data.frames, it might be both time- and memory-consuming to convert them to a data.table using as.data.table(.), which will make a complete copy of the input object before converting it to a data.table. setDT takes care of this issue by converting any list (named or unnamed, data.frame or not) by reference instead. That is, the input object is modified in place with no copy.
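One way to check the no-copy claim is to compare a column's memory address before and after setDT. This sketch uses data.table's address(), which returns the pointer address of its argument. The data.frame df and the names col_addr and dt_copy are illustrative.

```r
library(data.table)

df = data.frame(a = c(2.5, 1.5, 3.5), b = c("x", "y", "z"))
col_addr = address(df$a)      # address of column a before conversion

dt_copy = as.data.table(df)   # builds a separate data.table from df
setDT(df)                     # converts df itself, by reference

is.data.table(df)                    # TRUE
identical(address(df$a), col_addr)   # TRUE: column a was not copied
```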
This should come with low overhead, but note that setDT does check that the input is valid by looking for inconsistent input lengths and inadmissible column types (e.g. matrix).
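This small sketch exercises the length check. It assumes data.table is installed, and the list bad is a deliberately malformed, hypothetical input. setDT is expected to signal an error here rather than convert the list. The exact message text may differ between versions, so the sketch only captures it.

```r
library(data.table)

bad = list(a = 1:3, b = 1:2)   # element lengths 3 and 2: inconsistent
msg = tryCatch(setDT(bad), error = function(e) conditionMessage(e))

msg                  # the error message reporting the inconsistent lengths
is.data.table(bad)   # FALSE: bad was not converted
```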
Value
The input is modified by reference and returned (invisibly) so that it can be used in compound statements, e.g., setDT(X)[, sum(B), by=A]. If you require a copy, take a copy first (using DT2 = copy(DT)). See ?copy.
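This sketch combines the copy-first pattern with a compound statement. It assumes data.table is installed, and df, df2 and res are illustrative names. Converting the copy leaves the original data.frame untouched.

```r
library(data.table)

df  = data.frame(A = c(1L, 1L, 2L), B = c(5, 6, 7))
df2 = copy(df)                       # take a copy first...
res = setDT(df2)[, sum(B), by = A]   # ...then convert and query the copy

class(df)   # "data.frame": the original is unchanged
res         # sum of B per A: 11 for A = 1, 7 for A = 2 (column V1)
```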
See also
data.table, as.data.table, setDF, copy, setkey, setcolorder, setattr, setnames, set, :=, setorder
See the FAQ vignette: vignette("datatable-faq", package = "data.table").
Examples
set.seed(45L)
X = data.frame(
A=sample(3, 10, TRUE),
B=sample(letters[1:3], 10, TRUE),
C=sample(10))
# Convert X to data.table by reference and
# get the frequency of each "A,B" combination
setDT(X)[, .N, by=.(A,B)]
#> A B N
#> <int> <char> <int>
#> 1: 1 b 1
#> 2: 3 a 3
#> 3: 2 a 1
#> 4: 3 c 1
#> 5: 2 c 1
#> 6: 3 b 2
#> 7: 1 c 1
# convert list to data.table
# unnamed list: columns are auto-named V1 and V2
X = list(1:4, letters[1:4])
setDT(X)
# partially named list: the missing name is filled in (here V2)
X = list(a=1:4, letters[1:4])
setDT(X)
# setkey directly
X = list(a = 4:1, b=runif(4))
setDT(X, key="a")[]
#> Key: <a>
#> a b
#> <int> <num>
#> 1: 1 0.3396133
#> 2: 2 0.5762478
#> 3: 3 0.4272740
#> 4: 4 0.5409554
# check.names argument
X = list(a=1:5, a=6:10)
setDT(X, check.names=TRUE)[]
#> a a.1
#> <int> <int>
#> 1: 1 6
#> 2: 2 7
#> 3: 3 8
#> 4: 4 9
#> 5: 5 10
# Example demonstrating setDT after loading from RDS
rds_file = tempfile(fileext = ".rds")
X = data.table(a = 1:5, b = letters[1:5])
saveRDS(X, rds_file)
X_loaded = readRDS(rds_file)
setDT(X_loaded) # restore internal data.table attributes
print(X_loaded)
#> a b
#> <int> <char>
#> 1: 1 a
#> 2: 2 b
#> 3: 3 c
#> 4: 4 d
#> 5: 5 e
unlink(rds_file)