
Creates a data.table for use as i in a [.data.table join.

Usage

# DT[J(...)]                          # J() only for use inside DT[...]
# DT[.(...)]                          # .() only for use inside DT[...]
# DT[list(...)]                       # same; .(), list() and J() are identical
SJ(...)                             # DT[SJ(...)]
CJ(..., sorted=TRUE, unique=FALSE)  # DT[CJ(...)]

Arguments

...

Each argument is a vector. Generally each vector is the same length, but if they are not then the usual silent recycling is applied.

sorted

logical. Should setkey() be called on all the columns in the order they were passed to CJ?

unique

logical. When TRUE, only the unique values of each vector are used (computed automatically).

Details

SJ and CJ are convenience functions to create a data.table to be used in i when performing a data.table 'query' on x.
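As an illustration of passing such a table in i, the sketch below joins a small keyed table against every id/day combination produced by CJ. The table x and its columns (id, day, val) are hypothetical and chosen only for this example.

```r
library(data.table)

# Hypothetical data: the (id = 2, day = 2) combination is missing.
x <- data.table(id = c(1L, 1L, 2L), day = c(1L, 2L, 1L), val = c(10, 20, 30))
setkey(x, id, day)

# CJ() builds all id/day combinations; using it in i returns one row per
# combination, with NA in val where x has no matching row.
full <- x[CJ(id = 1:2, day = 1:2)]
full
```

The same pattern is one way to make implicit missing combinations explicit before further processing.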

x[data.table(id)] is the same as x[J(id)] but the latter is more readable. Identical alternatives are x[list(id)] and x[.(id)].

When using a join table in i, x must either be keyed or the on argument be used to indicate the columns in x and i which should be joined. See [.data.table.
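A minimal sketch of both routes follows: joining an unkeyed table with on, then the same lookup after setting a key. The table x and its columns (id, val) are assumptions made for illustration.

```r
library(data.table)

# Hypothetical table for illustration; it starts out unkeyed.
x <- data.table(id = c("c", "a", "b"), val = 1:3)

# x has no key, so `on` names the column of x to join on.
by_on <- x[.("b"), on = "id"]

# After setkey(), the same lookup needs no `on` argument,
# and the spellings of i described above can be used interchangeably.
setkey(x, id)
by_key <- x[.("b")]
by_J   <- x[J("b")]
by_dt  <- x[data.table("b")]
```

All four calls select the row whose id is "b"; they differ only in how the join column is identified.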

Value

  • J : the same result as calling list, for which J is a direct alias.

  • SJ : Sorted Join. The same value as J(), but setkey() is additionally called on all columns in the order they were passed to SJ. This is for efficiency: it invokes a binary merge rather than a repeated full binary search for each row of i.

  • CJ : Cross Join. A data.table is formed from the cross product of the vectors. For example, CJ on 10 ids and 100 dates returns a 1000-row table containing all dates for all ids. If sorted = TRUE (the default), setkey() is called on all columns in the order they were passed to CJ. If sorted = FALSE, the result is unkeyed and input order is retained.
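The return values described above can be checked with a short sketch. The ids, dates, and column names here are hypothetical and serve only to mirror the 10-by-100 example.

```r
library(data.table)

# Hypothetical inputs: 10 ids and 100 consecutive dates.
ids   <- 1:10
dates <- as.Date("2024-01-01") + 0:99

grid <- CJ(id = ids, date = dates)
nrow(grid)   # 10 * 100 rows
key(grid)    # keyed by id, then date, because sorted = TRUE by default

# With sorted = FALSE the result carries no key.
unsorted <- CJ(id = ids, date = dates, sorted = FALSE)
key(unsorted)

# SJ() builds a table from the vectors and keys it on all columns.
lookup <- SJ(id = c(3L, 1L, 2L), grp = c("c", "a", "b"))
key(lookup)
lookup$id    # rows are reordered by the key
```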

Examples

DT = data.table(A=5:1, B=letters[5:1])
setkey(DT, B)   # reorders table and marks it sorted
DT[J("b")]      # returns the 2nd row
#> Key: <B>
#>        A      B
#>    <int> <char>
#> 1:     2      b
DT[list("b")]   # same
#> Key: <B>
#>        A      B
#>    <int> <char>
#> 1:     2      b
DT[.("b")]      # same using the dot alias for list
#> Key: <B>
#>        A      B
#>    <int> <char>
#> 1:     2      b

# CJ usage examples
CJ(c(5, NA, 1), c(1, 3, 2))                 # sorted and keyed data.table
#> Key: <V1, V2>
#>       V1    V2
#>    <num> <num>
#> 1:    NA     1
#> 2:    NA     2
#> 3:    NA     3
#> 4:     1     1
#> 5:     1     2
#> 6:     1     3
#> 7:     5     1
#> 8:     5     2
#> 9:     5     3
do.call(CJ, list(c(5, NA, 1), c(1, 3, 2)))  # same as above
#> Key: <V1, V2>
#>       V1    V2
#>    <num> <num>
#> 1:    NA     1
#> 2:    NA     2
#> 3:    NA     3
#> 4:     1     1
#> 5:     1     2
#> 6:     1     3
#> 7:     5     1
#> 8:     5     2
#> 9:     5     3
CJ(c(5, NA, 1), c(1, 3, 2), sorted=FALSE)   # same order as input, unkeyed
#>       V1    V2
#>    <num> <num>
#> 1:     5     1
#> 2:     5     3
#> 3:     5     2
#> 4:    NA     1
#> 5:    NA     3
#> 6:    NA     2
#> 7:     1     1
#> 8:     1     3
#> 9:     1     2
# use of the 'unique=' argument
x = c(1, 1, 2)
y = c(4, 6, 4)
CJ(x, y)              # output columns are automatically named 'x' and 'y'
#> Key: <x, y>
#>        x     y
#>    <num> <num>
#> 1:     1     4
#> 2:     1     4
#> 3:     1     4
#> 4:     1     4
#> 5:     1     6
#> 6:     1     6
#> 7:     2     4
#> 8:     2     4
#> 9:     2     6
CJ(x, y, unique=TRUE) # unique(x) and unique(y) are computed automatically
#> Key: <x, y>
#>        x     y
#>    <num> <num>
#> 1:     1     4
#> 2:     1     6
#> 3:     2     4
#> 4:     2     6
CJ(x, y, sorted = FALSE) # retain input order for y
#>        x     y
#>    <num> <num>
#> 1:     1     4
#> 2:     1     6
#> 3:     1     4
#> 4:     1     4
#> 5:     1     6
#> 6:     1     4
#> 7:     2     4
#> 8:     2     6
#> 9:     2     4