rfcv.Rd
This function shows the cross-validated prediction performance of models with a sequentially reduced number of predictors (ranked by variable importance) via a nested cross-validation procedure.
rfcv(trainx, trainy, cv.fold=5, scale="log", step=0.5, mtry=function(p) max(1, floor(sqrt(p))), recursive=FALSE, ...)
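As a rough illustration of the idea of cross-validating models with fewer and fewer predictors, the sketch below uses base R only. It is not the code behind `rfcv`: a linear model (`lm`) and absolute correlation stand in for the random forest and its variable importance, and the names `cv_reduced`, `fold`, and `ranked` are hypothetical. The ranking is redone inside each training fold, which is one reading of the "nested" procedure.

```r
set.seed(1)

# Simulated regression data: only V1 and V2 carry signal (illustrative only).
n <- 120
p <- 8
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- 2 * x$V1 - x$V2 + rnorm(n)

# Hypothetical helper: CV error for models using the top n.var[j] predictors.
cv_reduced <- function(x, y, n.var, cv.fold = 5) {
  # Random assignment of rows to folds.
  fold <- sample(rep(seq_len(cv.fold), length.out = nrow(x)))
  pred <- matrix(NA_real_, nrow(x), length(n.var))
  for (i in seq_len(cv.fold)) {
    train <- fold != i
    # Rank predictors on the training rows only, so that the held-out
    # rows play no part in variable selection.
    imp <- abs(cor(x[train, , drop = FALSE], y[train]))[, 1]
    ranked <- names(sort(imp, decreasing = TRUE))
    for (j in seq_along(n.var)) {
      keep <- ranked[seq_len(n.var[j])]
      dat <- cbind(x[train, keep, drop = FALSE], y = y[train])
      fit <- lm(y ~ ., data = dat)
      pred[!train, j] <- predict(fit, newdata = x[!train, keep, drop = FALSE])
    }
  }
  # Mean squared error of the held-out predictions for each model size.
  data.frame(n.var = n.var, error.cv = colMeans((pred - y)^2))
}

res <- cv_reduced(x, y, n.var = c(8, 4, 2, 1))
print(res)
```

Because variable ranking happens inside the loop over folds, the reported error also accounts for the selection step, which is the main reason to prefer this layout over ranking once on the full data.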
trainx | matrix or data frame containing columns of predictor variables |
---|---|
trainy | vector of response; must have length equal to the number of rows in `trainx` |
cv.fold | number of folds in the cross-validation |
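scale | if `"log"`, reduce a fixed proportion (`step`) of variables at each step; otherwise reduce `step` variables at a time |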
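step | if `scale="log"`, the fraction of variables to remove at each step; otherwise the number of variables to remove at a time |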
mtry | a function of the number of remaining predictor variables, used as the `mtry` parameter in the `randomForest` call |
recursive | whether variable importance is (re-)assessed at each step of variable reduction |
... | other arguments passed on to `randomForest` |
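To make the roles of `scale="log"` and `step` more concrete, here is a small sketch of how a geometric reduction schedule could be generated. The helper `n_var_schedule` is hypothetical, and the rounding and stopping rule actually used inside `rfcv` may differ. The sketch assumes that each step keeps roughly a fraction `step` of the remaining variables (with the default `step=0.5`, keeping and removing half coincide) and that the sequence always ends with a single variable.

```r
# Hypothetical helper: number of predictors retained at each reduction step,
# assuming a geometric schedule p, p*step, p*step^2, ... down to 1.
n_var_schedule <- function(p, step = 0.5) {
  stopifnot(p >= 1, step > 0, step < 1)
  # Number of geometric steps that stay above one variable.
  k <- floor(log(p, base = 1 / step))
  n.var <- round(p * step^(seq_len(k) - 1))
  # Rounding can produce repeated counts; keep each count once.
  n.var <- unique(n.var)
  # Always finish with a single-variable model.
  if (!(1 %in% n.var)) n.var <- c(n.var, 1)
  n.var
}

n_var_schedule(100)
# 100 50 25 12 6 3 1
```

With 100 predictors and the default `step`, only seven model sizes are evaluated, which is why a log-scaled x axis is a natural choice when plotting error against the number of variables.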
A list with the following components:
n.var | vector of the number of variables used at each step |
---|---|
error.cv | corresponding vector of error rates or MSEs at each step |
predicted | list with one component for each element of `n.var`, each containing the predicted values from the cross-validation |
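The relationship between the cross-validated predictions and the per-step error can be sketched as follows. The values below are invented toy stand-ins, not output from `rfcv`, and `cv_error` is a hypothetical helper. It assumes the error is the misclassification rate for a factor response and the mean squared error for a numeric one, matching the "error rates or MSEs" wording above.

```r
# Toy stand-ins for a classification problem (made up for illustration).
truth <- factor(c("a", "a", "b", "b", "b", "a"))
n.var <- c(4, 2, 1)
predicted <- list(
  "4" = factor(c("a", "a", "b", "b", "b", "a"), levels = levels(truth)),
  "2" = factor(c("a", "b", "b", "b", "b", "a"), levels = levels(truth)),
  "1" = factor(c("b", "b", "b", "b", "a", "a"), levels = levels(truth))
)

# Hypothetical helper: error rate for factors, MSE for numeric responses.
cv_error <- function(truth, pred) {
  if (is.factor(truth)) mean(truth != pred) else mean((truth - pred)^2)
}

# One error value per model size, in the same order as n.var.
error.cv <- sapply(predicted, cv_error, truth = truth)

# Model size with the lowest cross-validated error.
best <- n.var[which.min(error.cv)]
best
# 4
```

In practice the minimum is often flat, so it can be reasonable to prefer the smallest number of variables whose error is close to the minimum rather than the exact minimizer.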
Svetnik, V., Liaw, A., Tong, C. and Wang, T., "Application of Breiman's Random Forest to Modeling Structure-Activity Relationships of Pharmaceutical Molecules", MCS 2004, Roli, F. and Windeatt, T. (Eds.) pp. 334-343.
    library(randomForest)  # provides rfcv()

    set.seed(647)
    ## Pad the four iris predictors with 96 pure-noise columns.
    myiris <- cbind(iris[1:4], matrix(runif(96 * nrow(iris)), nrow(iris), 96))
    result <- rfcv(myiris, iris$Species, cv.fold=3)
    with(result, plot(n.var, error.cv, log="x", type="o", lwd=2))

    ## The following can take a while to run, so if you really want to try
    ## it, copy and paste the code into R.
    if (FALSE) {
      result <- replicate(5, rfcv(myiris, iris$Species), simplify=FALSE)
      error.cv <- sapply(result, "[[", "error.cv")
      matplot(result[[1]]$n.var, cbind(rowMeans(error.cv), error.cv), type="l",
              lwd=c(2, rep(1, ncol(error.cv))), col=1, lty=1, log="x",
              xlab="Number of variables", ylab="CV Error")
    }