na.omit.data.table.Rd
This is a data.table method for the S3 generic stats::na.omit. The internals are written in C for speed. See examples for benchmark timings. The bit64::integer64 type is also supported.
# S3 method for data.table
na.omit(object, cols=seq_along(object), invert=FALSE, ...)
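Argument | Description |
---|---|
object | A data.table. |
cols | A vector of column names (or numbers) on which to check for missing values. Default is all the columns. |
invert | logical. If FALSE (the default), omits all rows with any missing values. If TRUE, returns just those rows with missing values instead. |
... | Further arguments special methods could require. |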
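
The effect of invert can be illustrated with a minimal sketch. It assumes data.table is installed, reuses the small table from the examples, and relies on invert=TRUE returning the complement of the default result, that is, the rows with a missing value in at least one checked column. The names complete and incomplete are illustrative only.

```r
library(data.table)

DT <- data.table(x = c(1, NaN, NA, 3),
                 y = c(NA_integer_, 1:3),
                 z = c("a", NA_character_, "b", "c"))

# Default (invert = FALSE): keep rows with no missing value in any column
complete <- na.omit(DT)

# invert = TRUE: keep only the rows that the default call drops
incomplete <- na.omit(DT, invert = TRUE)

# The two results partition the rows of DT
nrow(complete) + nrow(incomplete) == nrow(DT)
```

This can be handy for inspecting which rows would be discarded before actually dropping them.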
The data.table method adds an argument, cols, which restricts the check for missing values to the specified columns. The default value for cols is all the columns, to be consistent with the default behaviour of stats::na.omit.
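
As a small sketch of how cols might be used, the snippet below selects the columns to check first by name, then by position, then with several names at once. It assumes data.table is installed; the variable names by_name, by_pos and xz are illustrative only.

```r
library(data.table)

DT <- data.table(x = c(1, NaN, NA, 3),
                 y = c(NA_integer_, 1:3),
                 z = c("a", NA_character_, "b", "c"))

# Check a single column, selected by name
by_name <- na.omit(DT, cols = "x")

# The same column selected by position ("x" is column 1)
by_pos <- na.omit(DT, cols = 1L)

# Check two columns: a row is dropped if x or z is missing.
# Missing values in y are ignored, so the NA in y on row 1 survives.
xz <- na.omit(DT, cols = c("x", "z"))
```

Columns that are not listed in cols are returned as they are, including any missing values they contain.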
Unlike stats::na.omit, this method does not add the na.action attribute to the result.
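
The difference in attributes can be checked directly. The sketch below assumes data.table is installed and compares the data.table method with the stats method for data.frame on the same data; dt_res and df_res are illustrative names.

```r
library(data.table)

DT <- data.table(x = c(1, NaN, NA, 3),
                 y = c(NA_integer_, 1:3),
                 z = c("a", NA_character_, "b", "c"))

dt_res <- na.omit(DT)                 # data.table method
df_res <- na.omit(as.data.frame(DT))  # stats method for data.frame

# No na.action attribute on the data.table result
is.null(attr(dt_res, "na.action"))

# The data.frame result records the dropped rows in an object of class "omit"
class(attr(df_res, "na.action"))
```

Code that relies on na.action to recover the dropped row indices would therefore need another approach with the data.table method, for example calling na.omit with invert=TRUE.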
A data.table with just the rows where the specified columns have no missing value in any of them.
library(data.table)

DT = data.table(x=c(1,NaN,NA,3), y=c(NA_integer_, 1:3), z=c("a", NA_character_, "b", "c"))

# default behaviour
na.omit(DT)
#>    x y z
#> 1: 3 3 c

# omit rows where 'x' has a missing value
na.omit(DT, cols="x")
#>    x  y z
#> 1: 1 NA a
#> 2: 3  3 c

# omit rows where either 'x' or 'y' has a missing value
na.omit(DT, cols=c("x", "y"))
#>    x y z
#> 1: 3 3 c

if (FALSE) {
# Timings on relatively large data
set.seed(1L)
DT = data.table(x = sample(c(1:100, NA_integer_), 5e7L, TRUE),
                y = sample(c(rnorm(100), NA), 5e7L, TRUE))
system.time(ans1 <- na.omit(DT)) ## 2.6 seconds
system.time(ans2 <- stats:::na.omit.data.frame(DT)) ## 29 seconds
# identical? check each column separately, as ans2 will have additional attribute
all(sapply(1:2, function(i) identical(ans1[[i]], ans2[[i]]))) ## TRUE
}