Generate run-length type group id
A convenience function for generating a run-length type id column to be used in grouping operations. It accepts atomic vectors, lists, data.frames or data.tables as input.
Usage
rleid(..., prefix=NULL)
rleidv(x, cols=seq_along(x), prefix=NULL)
Arguments
- x
A vector, list, data.frame or data.table.
- ...
A sequence of numeric, integer64, character or logical vectors, all of the same length. Intended for interactive use.
- cols
Only meaningful for lists, data.frames or data.tables. A character vector of column names (or numbers) of x.
- prefix
Either NULL (default) or a character vector of length 1, which is prefixed to the row ids; the result is then a character vector instead of an integer vector.
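For list, data.frame or data.table input, the selected cols are considered together: a new run begins whenever the value in any of them differs from the previous row. The base-R sketch below illustrates that rule. multi_rle_id() is a hypothetical helper, not data.table's implementation, and it assumes equal-length atomic columns with no NA values.

```r
# Sketch only: run-length ids over several columns. A new run starts
# whenever ANY of the selected columns changes value from the previous row.
# multi_rle_id() is hypothetical (not part of data.table) and assumes
# equal-length atomic columns without NA values.
multi_rle_id <- function(x, cols = seq_along(x)) {
  cols_data <- x[cols]                 # works for lists and data.frames
  n <- length(cols_data[[1L]])
  if (n == 0L) return(integer(0))
  # For rows 2..n: TRUE if the row differs from the previous one in any column
  changed <- Reduce(`|`, lapply(cols_data, function(v) v[-1L] != v[-n]))
  cumsum(c(TRUE, changed))             # the first row always opens run 1
}

df <- data.frame(a = c(1, 1, 1, 2, 2), b = c("x", "x", "y", "y", "y"),
                 stringsAsFactors = FALSE)
multi_rle_id(df)        # both columns
#> [1] 1 1 2 3 3
multi_rle_id(df, "a")   # only column a
#> [1] 1 1 1 2 2
```

Selecting fewer columns can only merge runs, never split them, since fewer comparisons can trigger a change.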
Details
At times, aggregation (or grouping) operations need to be performed where consecutive runs of identical values should belong to the same group (see rle). The need for such a function has come up repeatedly on Stack Overflow; see the See Also section. This function generates such "run-length" groups directly.
rleid is designed for interactive use and accepts a sequence of vectors as arguments. For programming, rleidv, which takes a single x together with a cols argument, may be more convenient.
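Since the grouping is tied to rle, the idea can be sketched in base R for a single vector: rle() reports the length of each consecutive run, and repeating each run's index over its length gives one id per element. rle_group_id() is a hypothetical helper, not part of data.table; it assumes a plain atomic vector such as character or numeric, and NA handling in particular may differ from rleid().

```r
# Sketch only: a base-R analogue of rleid() for one atomic vector.
# rle_group_id() is hypothetical and not part of data.table.
rle_group_id <- function(x) {
  r <- rle(x)                            # lengths and values of consecutive runs
  rep(seq_along(r$lengths), r$lengths)   # repeat each run's index over its length
}

grp <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))
rle_group_id(grp)
#> [1] 1 1 2 2 3 3 3 4 5 5
```

Note that the two "A" runs receive different ids (1 and 4), which is exactly what distinguishes run-length grouping from grouping by value.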
Value
When prefix = NULL, an integer vector of the same length as NROW(x); otherwise, a character vector in which the value of prefix is prefixed to the ids obtained.
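The effect of prefix on the return type can be sketched as follows, assuming the prefix is simply pasted in front of each integer id, which is consistent with the "grp1", "grp2", ... output in the examples. add_prefix() is a hypothetical helper; data.table's internals may differ.

```r
# Sketch only: how prefix turns integer ids into character ids.
# add_prefix() is hypothetical, not data.table's implementation.
add_prefix <- function(ids, prefix = NULL) {
  if (is.null(prefix)) return(ids)                # integer ids unchanged
  stopifnot(is.character(prefix), length(prefix) == 1L)
  paste0(prefix, ids)                             # character: prefix followed by id
}

ids <- c(1L, 1L, 2L, 2L, 3L)
add_prefix(ids)
#> [1] 1 1 2 2 3
add_prefix(ids, "grp")
#> [1] "grp1" "grp1" "grp2" "grp2" "grp3"
```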
Examples
library(data.table)
DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2,2,3,1,2)), value=1:10)
rleid(DT$grp) # get run-length ids
#> [1] 1 1 2 2 3 3 3 4 5 5
rleidv(DT, "grp") # same as above
#> [1] 1 1 2 2 3 3 3 4 5 5
rleid(DT$grp, prefix="grp") # prefix with 'grp'
#> [1] "grp1" "grp1" "grp2" "grp2" "grp3" "grp3" "grp3" "grp4" "grp5" "grp5"
# get sum of value over run-length groups
DT[, sum(value), by=.(grp, rleid(grp))]
#>       grp rleid    V1
#>    <char> <int> <int>
#> 1:      A     1     3
#> 2:      B     2     7
#> 3:      C     3    18
#> 4:      A     4     8
#> 5:      B     5    19
DT[, sum(value), by=.(grp, rleid(grp, prefix="grp"))]
#>       grp  rleid    V1
#>    <char> <char> <int>
#> 1:      A   grp1     3
#> 2:      B   grp2     7
#> 3:      C   grp3    18
#> 4:      A   grp4     8
#> 5:      B   grp5    19
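The grouped sum in the examples above can be cross-checked without data.table. The base-R sketch below rebuilds the run ids with rle() and sums value within each run using tapply(); it is only an illustration of the same computation, not how data.table evaluates by=.

```r
# Base-R cross-check (sketch) of DT[, sum(value), by=.(grp, rleid(grp))]
grp   <- rep(c("A", "B", "C", "A", "B"), c(2, 2, 3, 1, 2))
value <- 1:10
r    <- rle(grp)
id   <- rep(seq_along(r$lengths), r$lengths)   # run-length group id per row
sums <- tapply(value, id, sum)                 # sum of value within each run
res  <- data.frame(grp = r$values, rleid = seq_along(r$values),
                   V1 = as.vector(sums), stringsAsFactors = FALSE)
res
```

The V1 column should match the data.table result (3, 7, 18, 8, 19), with one row per run rather than one row per distinct grp value.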