na.roughfix.Rd
Impute Missing Values by median/mode.
na.roughfix(object, ...)
Argument | Description |
---|---|
object | a data frame or numeric matrix. |
... | further arguments special methods could require. |
A completed data matrix or data frame. For numeric variables, NAs are replaced with column medians. For factor variables, NAs are replaced with the most frequent levels (breaking ties at random). If object contains no NAs, it is returned unaltered.
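The replacement rule described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: `rough_fix_sketch` and the `toy` data frame are hypothetical names introduced here, and the sketch handles data frames only, assuming each column is either numeric or a factor.

```r
# Hypothetical helper, not part of randomForest: median/mode imputation
# written in base R to illustrate the rule described above.
rough_fix_sketch <- function(df) {
  stopifnot(is.data.frame(df))
  for (j in seq_along(df)) {
    x <- df[[j]]
    miss <- is.na(x)
    if (!any(miss)) next                     # nothing to fill in this column
    if (is.numeric(x)) {
      x[miss] <- median(x, na.rm = TRUE)     # numeric: use the column median
    } else if (is.factor(x)) {
      counts <- table(x)                     # table() ignores NA by default
      top <- names(counts)[counts == max(counts)]
      # factor: use the most frequent level, breaking ties at random
      x[miss] <- top[sample.int(length(top), 1)]
    }
    df[[j]] <- x
  }
  df
}

# Toy data (assumed for illustration): one numeric and one factor column.
toy <- data.frame(
  height = c(1.5, NA, 1.7, 1.9),
  group  = factor(c("a", "b", "b", NA))
)
fixed <- rough_fix_sketch(toy)
print(fixed)
```

In this toy data the missing `height` becomes 1.7, the median of the observed values, and the missing `group` becomes "b", the most frequent level. A data frame with no missing values, such as `iris`, passes through unchanged.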
This is used as a starting point for imputing missing values by random forest.
library(randomForest)  # provides na.roughfix() and randomForest()
data(iris)
iris.na <- iris
set.seed(111)
## artificially drop some data values.
for (i in 1:4) iris.na[sample(150, sample(20)), i] <- NA
## impute directly with column medians / most frequent levels.
iris.roughfix <- na.roughfix(iris.na)
## or let randomForest() apply the same fix through na.action.
iris.narf <- randomForest(Species ~ ., iris.na, na.action=na.roughfix)
print(iris.narf)
#>
#> Call:
#>  randomForest(formula = Species ~ ., data = iris.na, na.action = na.roughfix)
#>                Type of random forest: classification
#>                      Number of trees: 500
#> No. of variables tried at each split: 2
#>
#>         OOB estimate of  error rate: 4.67%
#> Confusion matrix:
#>            setosa versicolor virginica class.error
#> setosa         50          0         0        0.00
#> versicolor      0         47         3        0.06
#> virginica       0          4        46        0.08