Fast rank
frank.Rd
Similar to base::rank but much faster. It accepts vectors, lists, data.frames or data.tables as input. In addition to the ties.method possibilities provided by base::rank, it also provides ties.method="dense".
Like forder, sorting is done in "C-locale"; in particular, this may affect how capital/lowercase letters are ranked. See Details on forder for more.
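The effect of C-locale ordering can be sketched with a small mixed-case character vector. The vector `v` is made up for illustration; the point is only that in the C-locale all uppercase ASCII letters sort before all lowercase ones, whereas the result of base::rank on the same input depends on the session's collation settings.

```r
library(data.table)

# Hypothetical input: two letters, each in lower and upper case
v = c("a", "B", "b", "A")

# C-locale order is "A" < "B" < "a" < "b", so "A" gets rank 1 and "b" rank 4
frank(v)
#> [1] 3 2 4 1

# base::rank uses the session locale; its result may differ from the above
rank(v)
```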
bit64::integer64 type is also supported.
Arguments
- x
A vector, or list with all its elements identical in length, or data.frame or data.table.
- ...
Only for lists, data.frames and data.tables. The columns to calculate ranks based on. Do not quote column names. If ... is missing, all columns are considered by default. To sort by a column in descending order prefix "-", e.g., frank(x, a, -b, c). -b works when b is of type character as well.
- cols
A character vector of column names (or numbers) of x, for which to obtain ranks.
- order
An integer vector with only possible values of 1 and -1, corresponding to ascending and descending order. The length of order must be either 1 or equal to that of cols. If length(order) == 1, it is recycled to length(cols).
- na.last
Control treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.
- ties.method
A character string specifying how ties are treated, see Details.
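As a small sketch of how cols and order interact in frankv, the snippet below checks that a length-1 order is recycled across all of cols, and that cols may be given as column numbers instead of names. The table DT is the same illustrative data used in the Examples; the variable names r1, r2 and r3 are introduced here only for the comparison.

```r
library(data.table)

DT = data.table(x = c(4, 1, 4, NA, 1, NA, 4), y = c(1, 1, 1, 0, NA, 0, 2))

# A single order value is recycled to length(cols): both columns descending
r1 = frankv(DT, cols = c("x", "y"), order = -1L, ties.method = "first")

# The same request with order spelled out per column
r2 = frankv(DT, cols = c("x", "y"), order = c(-1L, -1L), ties.method = "first")

# cols given as column numbers rather than names
r3 = frankv(DT, cols = 1:2, order = -1L, ties.method = "first")

identical(r1, r2)
identical(r1, r3)
```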
Details
To be consistent with other data.table operations, NAs are considered identical to other NAs (and NaNs to other NaNs), unlike base::rank. Therefore, for na.last=TRUE and na.last=FALSE, NAs (and NaNs) are given identical ranks, unlike rank.
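The difference from base::rank can be seen side by side on a vector with two missing values. This is only an illustrative comparison using the default ties.method of "average" in both functions.

```r
library(data.table)

x = c(4, 1, 4, NA, 1, NA, 4)

# base R: NAs are never treated as equal, so they get distinct ranks 6 and 7
rank(x, na.last = TRUE)
#> [1] 4.0 1.5 4.0 6.0 1.5 7.0 4.0

# data.table: the two NAs are tied with each other and share rank 6.5
frank(x, na.last = TRUE)
#> [1] 4.0 1.5 4.0 6.5 1.5 6.5 4.0
```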
frank is not limited to vectors. It accepts data.tables (and lists and data.frames) as well. It accepts unquoted column names (with names preceded by a - sign for descending order, even on character vectors), e.g., frank(DT, a, -b, c, ties.method="first") where a, b, c are columns in DT. The equivalent in frankv is the order argument.
In addition to the ties.method values possible using base's rank, frank also provides the additional value "dense", which returns the ranks without any gaps in the ranking. See examples.
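To make "without any gaps" concrete, here is a minimal comparison of "min" and "dense" on a made-up score vector (the name scores is hypothetical). With "min", the value after a tie skips ahead by the size of the tie group; with "dense", it takes the next consecutive rank.

```r
library(data.table)

# Hypothetical scores with one tie (the two 20s)
scores = c(10, 20, 20, 30)

# "min": tied values share the lowest rank, and the next rank is skipped
frank(scores, ties.method = "min")
#> [1] 1 2 2 4

# "dense": tied values share a rank, and ranks stay consecutive
frank(scores, ties.method = "dense")
#> [1] 1 2 2 3
```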
Value
A numeric vector of length equal to NROW(x) (unless na.last = NA, when missing values are removed). The vector is of integer type unless ties.method = "average", when it is of double type (irrespective of ties).
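The statements about the return type and length can be checked directly. The snippet below is a small sketch on an illustrative vector; it inspects only the type and length of the results.

```r
library(data.table)

x = c(4, 1, 4, NA, 1, NA, 4)

# Integer result for any ties.method other than "average"
typeof(frank(x, ties.method = "min"))
#> [1] "integer"

# Double result for the default "average", even when the input has no ties
typeof(frank(c(3, 1, 2)))
#> [1] "double"

# With na.last = NA the two missing values are dropped from the result
length(frank(x, na.last = NA))
#> [1] 5
```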
Examples
# on vectors
x = c(4, 1, 4, NA, 1, NA, 4)
# NAs are considered identical (unlike base R)
# default is average
frankv(x) # na.last=TRUE
#> [1] 4.0 1.5 4.0 6.5 1.5 6.5 4.0
frankv(x, na.last=FALSE)
#> [1] 6.0 3.5 6.0 1.5 3.5 1.5 6.0
# ties.method = min
frankv(x, ties.method="min")
#> [1] 3 1 3 6 1 6 3
# ties.method = dense
frankv(x, ties.method="dense")
#> [1] 2 1 2 3 1 3 2
# on data.table
DT = data.table(x, y=c(1, 1, 1, 0, NA, 0, 2))
frankv(DT, cols="x") # same as frankv(x) from before
#> [1] 4.0 1.5 4.0 6.5 1.5 6.5 4.0
frankv(DT, cols="x", na.last="keep")
#> [1] 4.0 1.5 4.0 NA 1.5 NA 4.0
frankv(DT, cols="x", ties.method="dense", na.last=NA)
#> [1] 2 1 2 1 2
frank(DT, x, ties.method="dense", na.last=NA) # equivalent of above using frank
#> [1] 2 1 2 1 2
# on both columns
frankv(DT, ties.method="first", na.last="keep")
#> [1] 2 1 3 NA NA NA 4
frank(DT, ties.method="first", na.last="keep") # equivalent of above using frank
#> [1] 2 1 3 NA NA NA 4
# order argument
frank(DT, x, -y, ties.method="first")
#> [1] 4 1 5 6 2 7 3
# equivalent of above using frankv
frankv(DT, order=c(1L, -1L), ties.method="first")
#> [1] 4 1 5 6 2 7 3