bag
provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).
    bag(x, ...)

    bagControl(fit = NULL, predict = NULL, aggregate = NULL,
               downSample = FALSE, oob = TRUE, allowParallel = TRUE)

    # S3 method for default
    bag(x, y, B = 10, vars = ncol(x), bagControl = NULL, ...)

    # S3 method for bag
    predict(object, newdata = NULL, ...)

    # S3 method for bag
    print(x, ...)

    # S3 method for bag
    summary(object, ...)

    # S3 method for summary.bag
    print(x, digits = max(3, getOption("digits") - 3), ...)

    ldaBag
    plsBag
    nbBag
    ctreeBag
    svmBag
    nnetBag
Argument | Description |
---|---|
x | a matrix or data frame of predictors |
... | arguments to pass to the model function |
fit | a function that has arguments x, y and ... |
predict | a function that generates predictions for each sub-model. The function should have arguments object and x |
aggregate | a function with arguments x and type |
downSample | logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class? |
oob | logical: should out-of-bag statistics be computed and the predictions retained? |
allowParallel | logical: if a parallel backend is loaded and available, should the function use it? |
y | a vector of outcomes |
B | the number of bootstrap samples to train over |
vars | an integer. If this argument is not NULL, a random sample of size vars is taken of the predictors in each bagging iteration |
bagControl | a list of options |
object | an object of class bag |
newdata | a matrix or data frame of samples for prediction. Note that this argument must have a non-null value |
digits | minimal number of significant digits |
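The downSample option can be pictured with a few lines of base R. This is only a sketch of the idea (sampling every class down to the size of the smallest one), not caret's internal code; the outcome vector is made up for illustration.

```r
set.seed(3)

## Made-up, imbalanced outcome: 8 "Active" vs 3 "Inactive"
y <- factor(c(rep("Active", 8), rep("Inactive", 3)))

## Size of the smallest class
nMin <- min(table(y))

## Draw nMin row indices (without replacement) from every class
idx <- unlist(lapply(split(seq_along(y), y),
                     function(i) i[sample.int(length(i), nMin)]))

## Every class now has the same number of samples
balanced <- y[idx]
table(balanced)
```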
An object of class list of length 3.
bag produces an object of class bag with elements

Element | Description |
---|---|
fits | a list with two sub-objects: the fit object has the actual model fit for that bagged sample and the vars object is either NULL or a vector of integers corresponding to which predictors were sampled for that model |
control | a mirror of the arguments passed into bagControl |
call | the call |
B | the number of bagging iterations |
dims | the dimensions of the training set |
The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, svmBag and nnetBag. Each has elements fit, pred and aggregate.
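As an illustration of the plug-in idea, here is a minimal sketch of three user-written functions for a linear regression model. The names lmFit, lmPred and lmAggregate are hypothetical and are not part of caret. The sketch assumes the aggregate function receives a list with one element per sub-model. The functions are exercised by hand in base R so the example runs without caret.

```r
## fit: takes predictors x, outcomes y and ..., returns a model object
lmFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

## pred: takes a fitted model and new predictors, returns predictions
lmPred <- function(object, x) {
  predict(object, newdata = as.data.frame(x))
}

## aggregate: reduces the per-model predictions to one value per sample
## (type is accepted for interface compatibility but unused for regression)
lmAggregate <- function(x, type = "class") {
  preds <- do.call("cbind", x)   # samples in rows, sub-models in columns
  apply(preds, 1, median)
}

set.seed(1)
x <- mtcars[, c("wt", "hp", "disp")]
y <- mtcars$mpg

## Exercise the three functions by hand on 5 bootstrap samples
fits <- lapply(1:5, function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  lmFit(x[idx, , drop = FALSE], y[idx])
})
preds <- lapply(fits, lmPred, x = x)
bagged <- lmAggregate(preds)

## With caret loaded, the same functions could be supplied as:
## bag(x, y, B = 5,
##     bagControl = bagControl(fit = lmFit,
##                             predict = lmPred,
##                             aggregate = lmAggregate))
```

The median is used for aggregation here only as one reasonable choice; a mean would work equally well in this sketch.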
One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. As a result, the user probably does not need to account for the change in predictors in their functions.
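The order of operations can be mimicked by hand for a single bagging iteration. The sketch below is an illustration in base R, not caret's internal code; the object names (rows, keep, subX, fitOne) are hypothetical.

```r
set.seed(2)
x <- mtcars[, c("wt", "hp", "disp", "qsec", "drat")]
y <- mtcars$mpg
vars <- 3   # number of predictors sampled per iteration

## One bagging iteration: bootstrap the rows and sample `vars` columns
rows <- sample(nrow(x), replace = TRUE)
keep <- sample(ncol(x), vars)   # integer indices of the sampled predictors

## The model function only ever sees the reduced predictor set
subX <- x[rows, keep, drop = FALSE]
fitOne <- lm(y ~ ., data = cbind(subX, y = y[rows]))

## At prediction time the same columns are selected before predicting
newX <- x[1:5, keep, drop = FALSE]
p <- predict(fitOne, newdata = newX)
```

Because the column selection happens outside the model and prediction code, neither needs to know which predictors were dropped.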
When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
If a parallel backend is registered, the foreach package is used to train the models in parallel.
    library(caret)

    ## A simple example of bagging conditional inference regression trees:
    data(BloodBrain)

    ## treebag <- bag(bbbDescr, logBBB, B = 10,
    ##                bagControl = bagControl(fit = ctreeBag$fit,
    ##                                        predict = ctreeBag$pred,
    ##                                        aggregate = ctreeBag$aggregate))

    ## An example of pooling posterior probabilities to generate class predictions
    data(mdrr)

    ## remove some zero variance predictors and linear dependencies
    mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
    mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

    ## basicLDA <- train(mdrrDescr, mdrrClass, "lda")

    ## bagLDA2 <- train(mdrrDescr, mdrrClass,
    ##                  "bag",
    ##                  B = 10,
    ##                  bagControl = bagControl(fit = ldaBag$fit,
    ##                                          predict = ldaBag$pred,
    ##                                          aggregate = ldaBag$aggregate),
    ##                  tuneGrid = data.frame(vars = c((1:10)*10 , ncol(mdrrDescr))))