Remove rows with missing values on columns specified
This is a data.table method for the S3 generic stats::na.omit. The internals are written in C for speed. See examples for benchmark timings.
The bit64::integer64 type is also supported.
Usage
# S3 method for data.table
na.omit(object, cols=seq_along(object), invert=FALSE, ...)
Arguments
- object
A data.table.
- cols
A vector of column names (or numbers) on which to check for missing values. Default is all the columns.
- invert
logical. If FALSE (default), omits all rows with any missing values. If TRUE, returns just those rows with missing values instead.
- ...
Further arguments special methods could require.
Details
The data.table method takes an additional argument, cols, which restricts the check for missing values to just the specified columns. The default value for cols is all the columns, to be consistent with the default behaviour of stats::na.omit.
It does not add the attribute na.action as stats::na.omit does.
Value
A data.table with just the rows where the specified columns have no missing value in any of them.
Examples
DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))
# default behaviour
na.omit(DT)
#>        x     y      z
#>    <num> <int> <char>
#> 1:     3     3      c
# omit rows where 'x' has a missing value
na.omit(DT, cols="x")
#>        x     y      z
#>    <num> <int> <char>
#> 1:     1    NA      a
#> 2:     3     3      c
# omit rows where either 'x' or 'y' has a missing value
na.omit(DT, cols=c("x", "y"))
#>        x     y      z
#>    <num> <int> <char>
#> 1:     3     3      c
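# A minimal sketch of the 'invert' and numeric 'cols' arguments described above.
# DT2 is a throwaway table introduced only for this illustration.
DT2 = data.table(x=c(1, NA, 3), y=c("a", "b", NA_character_))
# invert=TRUE should return the rows that the default call would drop
na.omit(DT2, invert=TRUE)
# 'cols' also accepts column numbers; column 1 is 'x', so these two calls
# are expected to select the same rows
identical(na.omit(DT2, cols=1L), na.omit(DT2, cols="x"))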
if (FALSE) {
  # Timings on relatively large data
  set.seed(1L)
  DT = data.table(x = sample(c(1:100, NA_integer_), 5e7L, TRUE),
                  y = sample(c(rnorm(100), NA), 5e7L, TRUE))
  system.time(ans1 <- na.omit(DT)) ## 2.6 seconds
  system.time(ans2 <- stats:::na.omit.data.frame(DT)) ## 29 seconds
  # identical? check each column separately, as ans2 will have additional attribute
  all(sapply(1:2, function(i) identical(ans1[[i]], ans2[[i]]))) ## TRUE
}
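# A small sketch of the attribute difference noted in Details.
# DF is a throwaway data.frame introduced only for this illustration.
DF = data.frame(x=c(1, NA, 3))
# stats::na.omit on a data.frame records the dropped rows in 'na.action'
attr(na.omit(DF), "na.action")
# the data.table method does not add that attribute, so this should be NULL
attr(na.omit(as.data.table(DF)), "na.action")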