Creates a join data.table
Creates a data.table for use in i in a [.data.table join.
Usage
# DT[J(...)] # J() only for use inside DT[...]
# DT[.(...)] # .() only for use inside DT[...]
# DT[list(...)] # same; .(), list() and J() are identical
SJ(...) # DT[SJ(...)]
CJ(..., sorted=TRUE, unique=FALSE) # DT[CJ(...)]
Arguments
- ...
Each argument is a vector. Generally each vector is the same length, but if they are not then the usual silent recycling is applied.
- sorted
logical. Should setkey() be called on all the columns in the order they were passed to CJ?
- unique
logical. When TRUE, only the unique values of each vector are used (automatically).
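The recycling mentioned for ... can be sketched with SJ(). This is a minimal illustration that assumes only the length-1 case, which is the simplest form of recycling; the values are made up for the example.

```r
library(data.table)

# Second argument has length 1, so it is recycled to match the first
lookup = SJ(c("b", "a"), 1L)

nrow(lookup)   # one row per element of the longer vector
key(lookup)    # SJ() keys the result on all columns, in argument order
```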
Details
SJ and CJ are convenience functions to create a data.table to be used in i when performing a data.table 'query' on x.
x[data.table(id)] is the same as x[J(id)] but the latter is more readable. Identical alternatives are x[list(id)] and x[.(id)].
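A small sketch of the equivalence described above. The table x and the vector wanted are hypothetical; the lookup vector is deliberately given a name that is not a column of x, so there is no ambiguity about which object is meant.

```r
library(data.table)

x = data.table(id = c("a", "b", "c"), val = 1:3, key = "id")
wanted = c("c", "a")             # hypothetical lookup values

r_dt   = x[data.table(wanted)]   # explicit data.table in i
r_J    = x[J(wanted)]            # J() form
r_list = x[list(wanted)]         # list() form
r_dot  = x[.(wanted)]            # .() form

# All four select the same rows of x, in the order given by 'wanted'
all.equal(r_dt, r_J, check.attributes = FALSE)
```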
When using a join table in i, x must either be keyed or the on argument must be used to indicate the columns in x and i which should be joined. See [.data.table.
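The two ways of telling the join which columns to use might look as follows. This is a sketch on a made-up table x: the first query supplies on for an unkeyed table, the second relies on the key instead.

```r
library(data.table)

x = data.table(id = c("c", "a", "b"), val = 1:3)   # unkeyed, hypothetical data

# No key on x, so name the join column explicitly
by_on = x[.("b"), on = "id"]

# Alternatively, key x first; the join then uses the key column
setkey(x, id)
by_key = x[.("b")]
```

Both queries return the single row whose id is "b".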
Value
- J: the same result as calling list, for which J is a direct alias.
- SJ: Sorted Join. The same value as J() but additionally setkey() is called on all columns in the order they were passed to SJ. This is done for efficiency: it invokes a binary merge rather than a repeated binary full search for each row of i.
- CJ: Cross Join. A data.table is formed from the cross product of the vectors. For example, CJ on 10 ids and 100 dates returns a 1000 row table containing all dates for all ids. If sorted = TRUE (default), setkey() is called on all columns in the order they were passed in to CJ. If sorted = FALSE, the result is unkeyed and input order is retained.
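The 10 ids by 100 dates case can be sketched like this. The id values and the start date are assumptions chosen only for illustration.

```r
library(data.table)

ids   = 1:10                            # hypothetical ids
dates = as.Date("2024-01-01") + 0:99    # hypothetical run of 100 dates

grid = CJ(id = ids, date = dates)       # sorted = TRUE by default
nrow(grid)    # 10 * 100 rows: every date for every id
key(grid)     # keyed on the columns in argument order

raw = CJ(id = ids, date = dates, sorted = FALSE)
haskey(raw)   # unkeyed when sorted = FALSE
```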
Examples
DT = data.table(A=5:1, B=letters[5:1])
setkey(DT, B) # reorders table and marks it sorted
DT[J("b")] # returns the 2nd row
#> Key: <B>
#> A B
#> <int> <char>
#> 1: 2 b
DT[list("b")] # same
#> Key: <B>
#> A B
#> <int> <char>
#> 1: 2 b
DT[.("b")] # same using the dot alias for list
#> Key: <B>
#> A B
#> <int> <char>
#> 1: 2 b
# CJ usage examples
CJ(c(5, NA, 1), c(1, 3, 2)) # sorted and keyed data.table
#> Key: <V1, V2>
#> V1 V2
#> <num> <num>
#> 1: NA 1
#> 2: NA 2
#> 3: NA 3
#> 4: 1 1
#> 5: 1 2
#> 6: 1 3
#> 7: 5 1
#> 8: 5 2
#> 9: 5 3
do.call(CJ, list(c(5, NA, 1), c(1, 3, 2))) # same as above
#> Key: <V1, V2>
#> V1 V2
#> <num> <num>
#> 1: NA 1
#> 2: NA 2
#> 3: NA 3
#> 4: 1 1
#> 5: 1 2
#> 6: 1 3
#> 7: 5 1
#> 8: 5 2
#> 9: 5 3
CJ(c(5, NA, 1), c(1, 3, 2), sorted=FALSE) # same order as input, unkeyed
#> V1 V2
#> <num> <num>
#> 1: 5 1
#> 2: 5 3
#> 3: 5 2
#> 4: NA 1
#> 5: NA 3
#> 6: NA 2
#> 7: 1 1
#> 8: 1 3
#> 9: 1 2
# use for 'unique=' argument
x = c(1, 1, 2)
y = c(4, 6, 4)
CJ(x, y) # output columns are automatically named 'x' and 'y'
#> Key: <x, y>
#> x y
#> <num> <num>
#> 1: 1 4
#> 2: 1 4
#> 3: 1 4
#> 4: 1 4
#> 5: 1 6
#> 6: 1 6
#> 7: 2 4
#> 8: 2 4
#> 9: 2 6
CJ(x, y, unique=TRUE) # unique(x) and unique(y) are computed automatically
#> Key: <x, y>
#> x y
#> <num> <num>
#> 1: 1 4
#> 2: 1 6
#> 3: 2 4
#> 4: 2 6
CJ(x, y, sorted = FALSE) # retain input order for y
#> x y
#> <num> <num>
#> 1: 1 4
#> 2: 1 6
#> 3: 1 4
#> 4: 1 4
#> 5: 1 6
#> 6: 1 4
#> 7: 2 4
#> 8: 2 6
#> 9: 2 4