`froll.Rd`

Fast rolling functions to calculate aggregates on sliding windows. Function names and arguments are experimental.

frollmean(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)

frollsum(x, n, fill=NA, algo=c("fast", "exact"), align=c("right", "left", "center"), na.rm=FALSE, hasNA=NA, adaptive=FALSE)

frollapply(x, n, FUN, ..., fill=NA, align=c("right", "left", "center"))

Argument | Description |
---|---|
`x` | Vector, list, `data.frame` or `data.table`. |
`n` | Integer vector giving rolling window size(s). This is the … |
`fill` | Numeric; value to pad by. Defaults to `NA`. |
`algo` | Character, default `"fast"`. |
`align` | Character, specifying the "alignment" of the rolling window, defaulting to `"right"`. |
`na.rm` | Logical, default `FALSE`. |
`hasNA` | Logical, default `NA`. If it is known that … |
`adaptive` | Logical, default `FALSE`. |
`FUN` | The function to be applied to the rolling window; see Details for restrictions. |
`...` | Extra arguments passed to `FUN`. |

`froll*` functions accept vectors, lists, `data.frame`s or `data.table`s. They always return a list except when the input is a `vector` and `length(n)==1`, in which case a `vector` is returned, for convenience. Thus, rolling functions can be used conveniently within `data.table` syntax.
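The return-shape rule above can be illustrated with a minimal sketch. The input values and the column names `a`, `b`, `a_mean2` and `b_mean2` are made up for illustration.

```r
library(data.table)

x <- c(1, 2, 3, 4, 5)

# vector input and a single window: a plain vector is returned
v <- frollmean(x, 2)
is.list(v)

# vector input and two windows: a list with one element per window
l <- frollmean(x, c(2, 3))
length(l)

# list output fits data.table syntax: assign several columns in one call
dt <- data.table(a = x, b = x * 10)
dt[, c("a_mean2", "b_mean2") := frollmean(.SD, 2), .SDcols = c("a", "b")]
dt
```

Because a list is returned for multi-column input, the right-hand side of `:=` lines up with the vector of new column names without further reshaping.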

Argument `n` accepts multiple values, so rolling functions can be applied over multiple window sizes at once. If `adaptive=TRUE`, then `n` must supply one window size per observation: an integer vector as long as each column, or a list of such vectors (one per set of windows); see Examples.

When `algo="fast"`, an *"on-line"* algorithm is used, and all of `NaN, +Inf, -Inf` are treated as `NA`. Setting `algo="exact"` makes the rolling functions use a more computationally intensive algorithm that suffers less from floating point rounding error (the same consideration applies to `mean`). `algo="exact"` also handles `NaN, +Inf, -Inf` consistently with base R. For some functions (like *mean*), it additionally makes an extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads.

Adaptive rolling functions are a special case where each observation has its own corresponding rolling window width. Due to the logic of adaptive rolling functions, the following restrictions apply:

- `align` can only be `"right"`.
- If a list of vectors is passed to `x`, then all vectors within it must have equal length.
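A small sketch of an adaptive rolling sum may help here. The data and the window widths in `w` are hypothetical; the only assumption is that each width applies to the observation in the same position, using the default right alignment.

```r
library(data.table)

x <- c(10, 20, 30, 40, 50, 60)

# one window width per observation (hypothetical values)
w <- c(1L, 2L, 3L, 1L, 2L, 3L)

# observation i is aggregated over the w[i] values ending at position i
res <- frollsum(x, w, adaptive = TRUE)
res

# several sets of adaptive windows are passed as a list of such vectors
res2 <- frollsum(x, list(w, rep(2L, length(x))), adaptive = TRUE)
res2
```

In the second call the first observation has a requested width of 2 but only one value available, so that position is filled with `fill` (here `NA`).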

When multiple columns or multiple window widths are provided, they are run in parallel. The exception is `algo="exact"`, which already runs in parallel.

`frollapply` computes a rolling aggregate using an arbitrary R function. The input `x` (first argument) to the function `FUN` is coerced to *numeric* beforehand, and `FUN` has to return a scalar *numeric* value. Checks for that are made only during the first iteration when `FUN` is evaluated. Edge cases can be found in the examples below. Any R function is supported, but it is not optimized using our own C implementation; hence, for example, using `frollapply` to compute a rolling average is inefficient. It is also always single-threaded because there is no thread-safe API to R's C `eval`. Nevertheless, we've seen the computation speed up vis-a-vis versions implemented in base R.
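As a minimal sketch of the contract described above, the following applies functions that return a single double for each window. The helper `spread` and the input values are hypothetical and chosen only for illustration.

```r
library(data.table)

x <- c(3, 1, 4, 1, 5, 9, 2, 6)

# hypothetical helper: range width of a window, returns one double
spread <- function(v) max(v) - min(v)
s <- frollapply(x, 3, spread)
s

# any base R function that returns a scalar double can be used
m <- frollapply(x, 3, median)
m

# extra arguments are forwarded to FUN through `...`
y <- c(1, NA, 3, 4)
r <- frollapply(y, 2, mean, na.rm = TRUE)
r
```

For aggregates that have a dedicated implementation, such as the mean or the sum, `frollmean` and `frollsum` remain the more efficient choice.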

A list, except when the input is a `vector` and `length(n)==1`, in which case a `vector` is returned.

Users coming from `zoo`, the most popular package for rolling functions, might expect the following differences in the `data.table` implementation.

- Rolling functions always return a result of the same length as the input.
- `fill` defaults to `NA`.
- `fill` accepts only constant values. It does not support *na.locf* or other functions.
- `align` defaults to `"right"`.
- `na.rm` is respected, and other functions are not needed when the input contains `NA`.
- Integers and logicals are always coerced to double.
- When `adaptive=FALSE` (default), `n` must be a numeric vector. A list is not accepted.
- When `adaptive=TRUE`, `n` must be a vector of length equal to `nrow(x)`, or a list of such vectors.
- The `partial` window feature is not supported, although it can be accomplished by using `adaptive=TRUE`; see examples. `NA` is always returned for incomplete windows.
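A few of the points above can be checked directly. This is a minimal sketch with made-up integer input; it only demonstrates output length, coercion to double, and a constant `fill`.

```r
library(data.table)

xi <- 1:5

# result has the same length as the input; incomplete windows are NA
s <- frollsum(xi, 2)
length(s) == length(xi)

# integer input is coerced to double
typeof(s)

# fill accepts a constant value used to pad incomplete windows
s0 <- frollsum(xi, 2, fill = 0)
s0
```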

Be aware that rolling functions operate on the physical order of the input. If the intent is to roll values in a vector by a logical window, for example an hour or a day, one has to ensure that there are no gaps in the input. For details see issue #3241.
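One possible way to remove gaps before rolling is to join the data onto a complete sequence of time points, so that each missing period becomes an explicit `NA` row. The sketch below assumes a hypothetical daily series with one missing day; the column names `day`, `value`, `naive` and `by_day` are illustrative only.

```r
library(data.table)

# hypothetical daily series; 2024-01-03 is missing
dt <- data.table(
  day   = as.Date(c("2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05")),
  value = c(1, 2, 4, 5)
)

# rolling over physical rows treats Jan 2 and Jan 4 as neighbours
dt[, naive := frollmean(value, 2)]

# build the complete calendar and join, so the gap becomes an NA row
all_days <- data.table(day = seq(min(dt$day), max(dt$day), by = "day"))
full <- dt[all_days, on = "day"]

# now a window of 2 rows corresponds to 2 calendar days
full[, by_day := frollmean(value, 2)]
full
```

After the join, windows that touch the missing day yield `NA` instead of silently averaging values from non-adjacent days; `na.rm=TRUE` could be used if partial information is acceptable.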

    d = as.data.table(list(1:6/2, 3:8/4))
    # rollmean of single vector and single window
    frollmean(d[, V1], 3)
    #> [1] NA NA 1.0 1.5 2.0 2.5
    # multiple columns at once
    frollmean(d, 3)
    #> [[1]]
    #> [1] NA NA 1.0 1.5 2.0 2.5
    #>
    #> [[2]]
    #> [1] NA NA 1.00 1.25 1.50 1.75
    #>
    # multiple windows at once
    frollmean(d[, .(V1)], c(3, 4))
    #> [[1]]
    #> [1] NA NA 1.0 1.5 2.0 2.5
    #>
    #> [[2]]
    #> [1] NA NA NA 1.25 1.75 2.25
    #>
    # multiple columns and multiple windows at once
    frollmean(d, c(3, 4))
    #> [[1]]
    #> [1] NA NA 1.0 1.5 2.0 2.5
    #>
    #> [[2]]
    #> [1] NA NA NA 1.25 1.75 2.25
    #>
    #> [[3]]
    #> [1] NA NA 1.00 1.25 1.50 1.75
    #>
    #> [[4]]
    #> [1] NA NA NA 1.125 1.375 1.625
    #>
    ## three calls above will use multiple cores when available
    # partial window using adaptive rolling function
    an = function(n, len) c(seq.int(n), rep(n, len-n))
    n = an(3, nrow(d))
    frollmean(d, n, adaptive=TRUE)
    #> [[1]]
    #> [1] 0.50 0.75 1.00 1.50 2.00 2.50
    #>
    #> [[2]]
    #> [1] 0.750 0.875 1.000 1.250 1.500 1.750
    #>
    # frollsum
    frollsum(d, 3:4)
    #> [[1]]
    #> [1] NA NA 3.0 4.5 6.0 7.5
    #>
    #> [[2]]
    #> [1] NA NA NA 5 7 9
    #>
    #> [[3]]
    #> [1] NA NA 3.00 3.75 4.50 5.25
    #>
    #> [[4]]
    #> [1] NA NA NA 4.5 5.5 6.5
    #>
    # frollapply
    frollapply(d, 3:4, sum)
    #> [[1]]
    #> [1] NA NA 3.0 4.5 6.0 7.5
    #>
    #> [[2]]
    #> [1] NA NA NA 5 7 9
    #>
    #> [[3]]
    #> [1] NA NA 3.00 3.75 4.50 5.25
    #>
    #> [[4]]
    #> [1] NA NA NA 4.5 5.5 6.5
    #>
    f = function(x, ...) if (sum(x, ...)>5) min(x, ...) else max(x, ...)
    frollapply(d, 3:4, f, na.rm=TRUE)
    #> [[1]]
    #> [1] NA NA 1.5 2.0 1.5 2.0
    #>
    #> [[2]]
    #> [1] NA NA NA 2.0 1.0 1.5
    #>
    #> [[3]]
    #> [1] NA NA 1.25 1.50 1.75 1.50
    #>
    #> [[4]]
    #> [1] NA NA NA 1.50 1.00 1.25
    #>
    # performance vs exactness
    set.seed(108)
    x = sample(c(rnorm(1e3, 1e6, 5e5), 5e9, 5e-9))
    n = 15
    ma = function(x, n, na.rm=FALSE) {
      ans = rep(NA_real_, nx<-length(x))
      for (i in n:nx) ans[i] = mean(x[(i-n+1):i], na.rm=na.rm)
      ans
    }
    fastma = function(x, n, na.rm) {
      if (!missing(na.rm)) stop("NAs are unsupported, wrongly propagated by cumsum")
      cs = cumsum(x)
      scs = shift(cs, n)
      scs[n] = 0
      as.double((cs-scs)/n)
    }
    system.time(ans1<-ma(x, n))
    #> user system elapsed
    #> 0.006 0.000 0.006
    system.time(ans2<-fastma(x, n))
    #> user system elapsed
    #> 0.001 0.000 0.000
    system.time(ans3<-frollmean(x, n))
    #> user system elapsed
    #> 0 0 0
    system.time(ans4<-frollmean(x, n, algo="exact"))
    #> user system elapsed
    #> 0 0 0
    system.time(ans5<-frollapply(x, n, mean))
    #> user system elapsed
    #> 0.003 0.000 0.004
    anserr = list(
      fastma = ans2-ans1,
      froll_fast = ans3-ans1,
      froll_exact = ans4-ans1,
      frollapply = ans5-ans1
    )
    errs = sapply(lapply(anserr, abs), sum, na.rm=TRUE)
    sapply(errs, format, scientific=FALSE) # roundoff
    #> fastma froll_fast froll_exact frollapply
    #> "0.00001287466" "0.00000001833541" "0" "0"
    # frollapply corner cases
    f = function(x) head(x, 2) ## FUN returns non length 1
    try(frollapply(1:5, 3, f))
    #> Error in frollapply(1:5, 3, f) :
    #> frollapply: results from provided FUN are not length 1
    f = function(x) { ## FUN sometimes returns non length 1
      n = length(x)
      # length 1 will be returned only for first iteration where we check length
      if (n==x[n]) x[1L] else range(x) # range(x)[2L] is silently ignored!
    }
    frollapply(1:5, 3, f)
    #> [1] NA NA 1 2 3
    options(datatable.verbose=TRUE)
    x = c(1,2,1,1,1,2,3,2)
    frollapply(x, 3, uniqueN) ## FUN returns integer
    #> frollapplyR: allocating memory for results 1x1
    #> forder.c received 3 rows and 1 columns
    #> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> frollapply: took 0.000s
    #> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
    #> [1] NA NA 2 2 1 2 3 2
    numUniqueN = function(x) as.numeric(uniqueN(x))
    frollapply(x, 3, numUniqueN)
    #> frollapplyR: allocating memory for results 1x1
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> forder.c received 3 rows and 1 columns
    #> frollapply: took 0.000s
    #> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
    #> [1] NA NA 2 2 1 2 3 2
    x = c(1,2,1,1,NA,2,NA,2)
    frollapply(x, 3, anyNA) ## FUN returns logical
    #> frollapplyR: allocating memory for results 1x1
    #> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
    #> frollapply: took 0.000s
    #> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
    #> [1] NA NA 0 0 1 1 1 1
    as.logical(frollapply(x, 3, anyNA))
    #> frollapplyR: allocating memory for results 1x1
    #> frollapply: results from provided FUN are not of type double, coercion from integer or logical will be applied on each iteration
    #> frollapply: took 0.000s
    #> frollapplyR: processing of 1 column(s) and 1 window(s) took 0.000s
    #> [1] NA NA FALSE FALSE TRUE TRUE TRUE TRUE
    options(datatable.verbose=FALSE)
    f = function(x) { ## FUN returns character
      if (sum(x)>5) "big" else "small"
    }
    try(frollapply(1:5, 3, f))
    #> Error in frollapply(1:5, 3, f) :
    #> frollapply: results from provided FUN are not of type double
    f = function(x) { ## FUN is not type-stable
      n = length(x)
      # double type will be returned only for first iteration where we check type
      if (n==x[n]) 1 else NA # NA logical turns into garbage without coercion to double
    }
    try(frollapply(1:5, 3, f))
    #> Error in frollapply(1:5, 3, f) :
    #> REAL() can only be applied to a 'numeric', not a 'logical'