Rolling functions
froll.Rd
Fast rolling functions to calculate aggregates on sliding windows. Function name and arguments are experimental.
Usage
frollmean(x, n, fill=NA, algo=c("fast", "exact"),
  align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollsum(x, n, fill=NA, algo=c("fast", "exact"),
  align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)
frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))
Arguments
- x
Vector, data.frame or data.table of integer, numeric or logical columns over which to calculate the windowed aggregations. May also be a list, in which case the rolling function is applied to each of its elements.
- n
Integer vector giving rolling window size(s). This is the total number of included values. Adaptive rolling functions also accept a list of integer vectors.
- fill
Numeric; value to pad by. Defaults to NA.
- algo
Character, default "fast". When set to "exact", a slower (but more accurate) algorithm is used. It suffers less from floating point rounding errors by performing an extra pass, and carefully handles all non-finite values. It will use multiple cores where available. See Details for more information.
- align
Character, specifying the "alignment" of the rolling window, defaulting to "right". "right" covers preceding rows (the window ends on the current value); "left" covers following rows (the window starts on the current value); "center" is halfway in between (the window is centered on the current value, biased towards "left" when n is even).
- na.rm
Logical, default FALSE. Should missing values be removed when calculating the window? For details on handling other non-finite values, see Details.
- hasNA
Logical. If it is known that x contains NA, then setting this to TRUE will speed up calculation. Defaults to NA.
- adaptive
Logical, default FALSE. Should the rolling function be calculated adaptively? See Details below.
- FUN
The function to be applied to the rolling window; see Details for restrictions.
- ...
Extra arguments passed to FUN in frollapply.
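The effect of align and fill described above can be seen on a short vector. This is a minimal sketch, assuming data.table is installed; an odd window size is used so that the "center" case is unambiguous.

```r
library(data.table)

x = c(1, 2, 3, 4, 5, 6)

right  = frollmean(x, 3)                    # default: window ends on the current value
left   = frollmean(x, 3, align = "left")    # window starts on the current value
center = frollmean(x, 3, align = "center")  # window is centered on the current value
zero   = frollmean(x, 3, fill = 0)          # incomplete windows padded with 0 instead of NA

print(rbind(right, left, center, zero))
#>        [,1] [,2] [,3] [,4] [,5] [,6]
#> right    NA   NA    2    3    4    5
#> left      2    3    4    5   NA   NA
#> center   NA    2    3    4    5   NA
#> zero      0    0    2    3    4    5
```

Only the position of the padding changes between alignments; the computed means are the same values shifted along the vector.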
Details
froll* functions accept vectors, lists, data.frames or data.tables. They always return a list except when the input is a vector and length(n)==1, in which case a vector is returned, for convenience. Thus, rolling functions can be used conveniently within data.table syntax.
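The return-shape rule above can be checked directly. The following is a small sketch, assuming data.table is installed; the table DT and its columns g, v and roll are hypothetical names used only for illustration.

```r
library(data.table)

x = c(1, 2, 3, 4)

one_window  = frollsum(x, 2)         # vector input, single window: plain vector
two_windows = frollsum(x, c(2, 3))   # vector input, two windows: list of length 2
list_input  = frollsum(list(x), 2)   # list input: always a list

is.list(one_window)
#> [1] FALSE
length(two_windows)
#> [1] 2
is.list(list_input)
#> [1] TRUE

# Because a single vector and a single window give a vector back,
# the result can be assigned as a column, here per group.
DT = data.table(g = c("a", "a", "a", "b", "b", "b"), v = 1:6)
DT[, roll := frollsum(v, 2), by = g]
DT$roll
#> [1] NA  3  5 NA  9 11
```

Note that the window restarts within each group, so the first row of every group is NA.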
Argument n accepts multiple values, so that rolling functions can be applied over several window sizes at once. If adaptive=TRUE, then n must be an integer vector of window sizes, one for every single observation in each column, or a list of such vectors; see Examples.
When algo="fast", an "on-line" algorithm is used, and all of NaN, +Inf, -Inf are treated as NA.
Setting algo="exact" makes the rolling functions use a more computationally intensive algorithm that suffers less from floating point rounding error (the same consideration applies to base R's mean). algo="exact" also handles NaN, +Inf, -Inf consistently with base R. For some functions (like mean), it additionally makes an extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.
Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Due to the logic of adaptive rolling functions, the following restrictions apply:
- align supports only "right".
- if a list of vectors is passed to x, then all vectors within it must have equal length.
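As a minimal sketch of what "each observation has its own window width" means, the adaptive result can be compared against a plain loop in which the window ending at position i covers the n[i] most recent values. It assumes data.table is installed; the helper name by_hand is hypothetical.

```r
library(data.table)

x = c(1, 2, 3, 4, 5)
n = c(1L, 2L, 3L, 2L, 1L)   # one window width per observation

adaptive = frollmean(x, n, adaptive = TRUE)

# Reference loop: the window for observation i is x[(i - n[i] + 1):i]
by_hand = sapply(seq_along(x), function(i) mean(x[(i - n[i] + 1L):i]))

adaptive
#> [1] 1.0 1.5 2.0 3.5 5.0
all.equal(adaptive, by_hand)
#> [1] TRUE
```

Every window here fits inside the data (n[i] never exceeds i), so no position is padded with fill.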
When multiple columns or multiple window widths are provided, they are run in parallel. The exception is algo="exact", which already runs in parallel.
frollapply computes rolling aggregates using arbitrary R functions. The input x (first argument) to the function FUN is coerced to numeric beforehand, and FUN has to return a scalar numeric value. Checks for that are made only during the first iteration when FUN is evaluated. Edge cases can be found in the examples below. Any R function is supported, but it is not optimized using our own C implementation; hence, for example, using frollapply to compute a rolling average is inefficient. It is also always single-threaded because there is no thread-safe API to R's C eval. Nevertheless we've seen the computation speed up vis-a-vis versions implemented in base R.
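A typical use of frollapply is an aggregate that has no dedicated froll* function in the usage shown here, such as a rolling median. The sketch below assumes data.table is installed and also shows extra arguments being forwarded to FUN through `...`.

```r
library(data.table)

x = c(1, 5, 2, 8, 3)

# Rolling median over windows of 3 values; median returns a scalar double
roll_median = frollapply(x, 3, median)
roll_median
#> [1] NA NA  2  5  3

# Extra arguments after FUN are passed on to it, here na.rm=TRUE for max
y = c(1, NA, 3, 4)
roll_max = frollapply(y, 2, max, na.rm = TRUE)
roll_max
#> [1] NA  1  3  4
```

Both functions return a single double per window, which is the shape and type frollapply expects from FUN.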
Note
Users coming from zoo, the most popular package for rolling functions, might expect the following differences in the data.table implementation:
- rolling functions always return a result of the same length as the input.
- fill defaults to NA.
- fill accepts only constant values. It does not support na.locf or other functions.
- align defaults to "right".
- na.rm is respected, and other functions are not needed when the input contains NA.
- integers and logicals are always coerced to double.
- when adaptive=FALSE (default), n must be a numeric vector. A list is not accepted.
- when adaptive=TRUE, n must be a vector of length equal to nrow(x), or a list of such vectors.
- the partial window feature is not supported, although it can be accomplished by using adaptive=TRUE; see Examples. NA is always returned for incomplete windows.
Be aware that rolling functions operate on the physical order of the input. If the intent is to roll values in a vector by a logical window, for example an hour or a day, one has to ensure that there are no gaps in the input. For details see issue #3241.
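One possible way to remove gaps before rolling is to join the observations onto a complete sequence of time points, so that missing periods become explicit NA rows. This is only a sketch under assumptions: data.table is installed, and the tables obs and full and the columns day and value are hypothetical.

```r
library(data.table)

# Day 3 is missing from the observations
obs = data.table(day = c(1L, 2L, 4L, 5L), value = c(10, 20, 40, 50))

# Rolling over the physical rows mixes days 1, 2 and 4 in one "3-day" window
naive = frollmean(obs$value, 3)

# Join onto the complete day sequence; the missing day gets value NA
full = obs[data.table(day = 1:5), on = "day"]
full$value
#> [1] 10 20 NA 40 50

# Each window now spans exactly three calendar days; na.rm=TRUE skips the gap
gapless = frollmean(full$value, 3, na.rm = TRUE)
gapless
#> [1] NA NA 15 30 45
```

Whether to use na.rm=TRUE or to let the NA propagate into the affected windows depends on how a missing period should be interpreted.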
Examples
library(data.table)
d = as.data.table(list(1:6/2, 3:8/4))
# rollmean of single vector and single window
frollmean(d[, V1], 3)
#> [1] NA NA 1.0 1.5 2.0 2.5
# multiple columns at once
frollmean(d, 3)
#> [[1]]
#> [1] NA NA 1.0 1.5 2.0 2.5
#>
#> [[2]]
#> [1] NA NA 1.00 1.25 1.50 1.75
#>
# multiple windows at once
frollmean(d[, .(V1)], c(3, 4))
#> [[1]]
#> [1] NA NA 1.0 1.5 2.0 2.5
#>
#> [[2]]
#> [1] NA NA NA 1.25 1.75 2.25
#>
# multiple columns and multiple windows at once
frollmean(d, c(3, 4))
#> [[1]]
#> [1] NA NA 1.0 1.5 2.0 2.5
#>
#> [[2]]
#> [1] NA NA NA 1.25 1.75 2.25
#>
#> [[3]]
#> [1] NA NA 1.00 1.25 1.50 1.75
#>
#> [[4]]
#> [1] NA NA NA 1.125 1.375 1.625
#>
## three calls above will use multiple cores when available
# partial window using adaptive rolling function
an = function(n, len) c(seq.int(n), rep(n, len-n))
n = an(3, nrow(d))
frollmean(d, n, adaptive=TRUE)
#> [[1]]
#> [1] 0.50 0.75 1.00 1.50 2.00 2.50
#>
#> [[2]]
#> [1] 0.750 0.875 1.000 1.250 1.500 1.750
#>
# frollsum
frollsum(d, 3:4)
#> [[1]]
#> [1] NA NA 3.0 4.5 6.0 7.5
#>
#> [[2]]
#> [1] NA NA NA 5 7 9
#>
#> [[3]]
#> [1] NA NA 3.00 3.75 4.50 5.25
#>
#> [[4]]
#> [1] NA NA NA 4.5 5.5 6.5
#>
# frollapply
frollapply(d, 3:4, sum)
#> [[1]]
#> [1] NA NA 3.0 4.5 6.0 7.5
#>
#> [[2]]
#> [1] NA NA NA 5 7 9
#>
#> [[3]]
#> [1] NA NA 3.00 3.75 4.50 5.25
#>
#> [[4]]
#> [1] NA NA NA 4.5 5.5 6.5
#>
f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
frollapply(d, 3:4, f, na.rm=TRUE)
#> [[1]]
#> [1] NA NA 1.5 2.0 1.5 2.0
#>
#> [[2]]
#> [1] NA NA NA 2.0 1.0 1.5
#>
#> [[3]]
#> [1] NA NA 1.25 1.50 1.75 1.50
#>
#> [[4]]
#> [1] NA NA NA 1.50 1.00 1.25
#>
# performance vs exactness
set.seed(108)
x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
n = 15
ma = function(x, n, na.rm=FALSE) {
  ans = rep(NA_real_, nx<-length(x))
  for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
  ans
}
fastma = function(x, n, na.rm) {
  if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
  cs = cumsum(x)
  scs = shift(cs, n)
  scs[n] = 0
  as.double((cs-scs)/n)
}
system.time(ans1<-ma(x, n))
#> user system elapsed
#> 0.005 0.000 0.005
system.time(ans2<-fastma(x, n))
#> user system elapsed
#> 0.001 0.000 0.000
system.time(ans3<-frollmean(x, n))
#> user system elapsed
#> 0 0 0
system.time(ans4<-frollmean(x, n, algo="exact"))
#> user system elapsed
#> 0 0 0
system.time(ans5<-frollapply(x, n, mean))
#> user system elapsed
#> 0.003 0.000 0.003
anserr = list(
fastma = ans2-ans1,
froll_fast = ans3-ans1,
froll_exact = ans4-ans1,
frollapply = ans5-ans1
)
errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
sapply(errs, format, scientific=FALSE) # roundoff
#> fastma froll_fast froll_exact frollapply
#> "0.00001287466" "0.00000001833541" "0" "0"
# frollapply corner cases
f = function(x) head(x, 2) ## FUN returns non length 1
try(frollapply(1:5, 3, f))
#> Error in frollapply(1:5, 3, f) :
#> frollapply: results from provided FUN are not length 1
f = function(x) { ## FUN sometimes returns non length 1
  n = length(x)
  # length 1 will be returned only for first iteration where we check length
  if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
}
frollapply(1:5, 3, f)
#> [1] NA NA 1 2 3
options(datatable.verbose=TRUE)
x = c(1,2,1,1,1,2,3,2)
frollapply(x, 3, uniqueN) ## FUN returns integer
#> frollapplyR: allocating memory for results 1x1
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> frollapply: took 0.000s
#> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
#> [1] NA NA 2 2 1 2 3 2
numUniqueN = function(x) as.numeric(uniqueN(x))
frollapply(x, 3, numUniqueN)
#> frollapplyR: allocating memory for results 1x1
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> forderReuseSorting: opt not possible: is.data.table(DT)=0, sortGroups=1, all1(ascArg)=1
#> forder.c received 3 rows and 1 columns
#> forderReuseSorting: opt=0, took 0.000s
#> frollapply: took 0.000s
#> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
#> [1] NA NA 2 2 1 2 3 2
x = c(1,2,1,1,NA,2,NA,2)
frollapply(x, 3, anyNA) ## FUN returns logical
#> frollapplyR: allocating memory for results 1x1
#> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
#> frollapply: took 0.000s
#> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
#> [1] NA NA 0 0 1 1 1 1
as.logical(frollapply(x, 3, anyNA))
#> frollapplyR: allocating memory for results 1x1
#> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
#> frollapply: took 0.000s
#> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
#> [1] NA NA FALSE FALSE TRUE TRUE TRUE TRUE
options(datatable.verbose=FALSE)
f = function(x) { ## FUN returns character
  if (sum(x)>5) "big" else "small"
}
try(frollapply(1:5, 3, f))
#> Error in frollapply(1:5, 3, f) :
#> frollapply: results from provided FUN are not of type double
f = function(x) { ## FUN is not type-stable
  n = length(x)
  # double type will be returned only for first iteration where we check type
  if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
}
try(frollapply(1:5, 3, f))
#> Error in frollapply(1:5, 3, f) :
#> REAL() can only be applied to a 'numeric', not a 'logical'