In data.table parlance, all set* functions change their input
by reference. That is, no copy is made at all, other than temporary
working memory, which is as large as one column. The only other
data.table operator that modifies input by reference is :=.
Check out the See Also section below for other set* functions that data.table provides.
setorder (and setorderv) reorders the rows of a data.table
based on the columns (and column order) provided. It reorders the table
by reference and is therefore very memory efficient.
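As a minimal sketch of what reordering by reference looks like in practice, the snippet below sorts a small table in place. The table DT and its column names are hypothetical and chosen only for illustration; address() is data.table's helper for inspecting an object's memory address.

```r
library(data.table)

# Hypothetical example table; the column names are illustrative only.
DT <- data.table(id = c(3L, 1L, 2L), value = c("c", "a", "b"))

addr_before <- address(DT)
setorder(DT, id)            # reorders DT itself; no assignment is needed
addr_after <- address(DT)

print(DT)
# The same object was modified, rather than a sorted copy being created.
same_object <- identical(addr_before, addr_after)
```

Because the rows are rearranged in place, there is no need to write DT <- setorder(DT, id).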
Note that queries like x[order(.)] are optimised internally to use data.table's fast order.
Also note that data.table always reorders in "C-locale" (see Details). To sort by session locale, use x[base::order(.)].
The bit64::integer64 type is also supported for reordering rows of a data.table.
setorder(x, ..., na.last=FALSE)
setorderv(x, cols = colnames(x), order=1L, na.last=FALSE)
# optimised to use data.table's internal fast order
# x[order(., na.last=TRUE)]
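...: The columns to sort by. Do not quote column names. If ... is missing (i.e. setorder(x)), x is rearranged based on all columns in ascending order.
cols: A character vector of column names of x by which to order.
order: An integer vector with only possible values of 1 and -1, corresponding to ascending and descending order.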
data.table implements its own fast radix-based ordering. See the references for some exposition on the concept of radix sort.
setorder accepts unquoted column names (with names preceded by a
- sign for descending order) and reorders data.table rows
by reference, e.g., setorder(x, a, -b, c). We emphasize that
the - sign means "descending" and not "negative" because the implementation simply
reverses the sort order, as opposed to sorting the opposite of the input
(which would be inefficient).
Note that -b also works with columns of type character, unlike
base::order, which requires -xtfrm(y) instead (which is slow).
setorderv in turn accepts a character vector of column names and an
integer vector of column order separately.
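The two interfaces described above can be compared side by side. This is a small sketch on a made-up table (x, y and their columns are hypothetical), sorting ascending on an integer column and descending on a character column.

```r
library(data.table)

# Hypothetical data: 'a' is integer, 'b' is character.
x <- data.table(a = c(2L, 1L, 2L, 1L), b = c("p", "q", "r", "s"))
y <- copy(x)

# Unquoted column names; the minus sign marks 'b' as descending.
setorder(x, a, -b)

# Same request with column names and sort directions passed separately.
setorderv(y, cols = c("a", "b"), order = c(1L, -1L))

print(x)
```

setorderv is the more convenient form when the column names are held in a variable, since nothing has to be written as an unquoted symbol.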
Note that setkey always sorts in ascending order only,
and differs from setorder in that it additionally sets the sorted attribute.
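The na.last argument is, by default, FALSE for setorder and
setorderv to be consistent with data.table's setkey, and TRUE for
x[order(.)] to be consistent with base::order. Only
x[order(.)] can have
na.last = NA, as it is a subset operation,
as opposed to setorder or
setorderv, which reorder the data.table by reference.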
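The different treatment of missing values can be seen on a tiny example. This is only a sketch on a hypothetical one-column table; it relies on the defaults shown in the usage lines (na.last=FALSE for setorder, na.last=TRUE for x[order(.)]).

```r
library(data.table)

# Hypothetical table with one missing value.
x <- data.table(v = c(2L, NA, 1L))

sub_last <- x[order(v)]                  # subset: NA is placed last
sub_drop <- x[order(v, na.last = NA)]    # subset: the NA row is dropped
setorder(x, v)                           # in place: NA is placed first

print(sub_last)
print(sub_drop)
print(x)
```

Dropping rows only makes sense for the subsetting form, because it returns a new table; an in-place reorder keeps every row.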
data.table always reorders in "C-locale".
As a consequence, the ordering may be different to that obtained by base::order.
In English locales, for example, sorting is case-sensitive in C-locale. Thus, sorting
c("c", "a", "B") returns
c("B", "a", "c") in data.table but
c("a", "B", "c") in
base::order. Note that this makes no difference for most data;
both return identical results on ids where only upper-case or only lower-case letters are present
("AB123" < "AC234" is true in both),
and on country names and other proper nouns which are consistently capitalized.
For example, neither
"America" < "Brazil" nor
"america" < "brazil" is affected, since the case of the first letter is consistent within each pair.
Using C-locale makes the behaviour of sorting in
data.table more consistent across sessions and locales.
The behaviour of
base::order depends on assumptions about the locale of the R session.
In English locales,
"america" < "BRAZIL" is true by default
but false if you either type
Sys.setlocale(locale="C") or the R session has been started in a C locale
for you, which can happen on servers/services since the locale comes from the environment the R session
was started in. By contrast,
"america" < "BRAZIL" is always FALSE in
data.table regardless of the way your R session was started.
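The C-locale behaviour can be checked with the three-letter vector used above. The snippet is a sketch on a hypothetical table; the last line uses base::order explicitly, whose result depends on the session locale and is therefore shown without an expected result.

```r
library(data.table)

# Hypothetical table holding the mixed-case vector from the text.
x <- data.table(s = c("c", "a", "B"))

# C-locale ordering: upper-case letters sort before lower-case ones.
setorder(x, s)
print(x$s)

# Session-locale ordering; the outcome depends on how R was started.
by_session_locale <- x[base::order(s)]
```

Calling base::order explicitly is what opts a query out of data.table's own ordering.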
If setorder results in reordering of the rows of a keyed data.table,
then its key will be set to NULL.
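The effect on a keyed table can be sketched as follows. DT and its columns are hypothetical; key() is data.table's accessor for the current key.

```r
library(data.table)

# Hypothetical table keyed on 'a'.
DT <- data.table(a = c(1L, 2L, 3L), b = c(3L, 1L, 2L))
setkey(DT, a)
key_before <- key(DT)

# Sorting by 'b' changes the row order, so the table is no longer sorted by its key.
setorder(DT, b)
key_after <- key(DT)
```

If the key is still needed afterwards, it has to be set again with setkey, which re-sorts the table by the key columns.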
The input is modified by reference, and returned (invisibly) so it can be used
in compound statements; e.g.,
setorder(DT,a,-b)[, cumsum(c), by=list(a,b)].
If you require a copy, take a copy first (using
DT2 = copy(DT)). See ?copy.
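Both points can be combined in one short sketch: copy first to keep the original untouched, then chain a query onto the invisibly returned table. The table and its values are hypothetical.

```r
library(data.table)

# Hypothetical table; 'c' is a data column, not the base function c().
DT <- data.table(a = c(2L, 1L, 2L, 1L), b = c(1L, 2L, 2L, 1L), c = 1:4)

# Work on a copy so that DT keeps its original row order.
DT2 <- copy(DT)

# setorder returns its (reordered) input, so a query can follow directly.
res <- setorder(DT2, a, -b)[, cumsum(c), by = list(a, b)]

print(res)
```

Only DT2 is reordered here; DT still has its rows in the order in which they were created.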