A General Framework For Bagging

bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

bag(x, ...)

bagControl(fit = NULL, predict = NULL, aggregate = NULL,
  downSample = FALSE, oob = TRUE, allowParallel = TRUE)

# S3 method for default
bag(x, y, B = 10, vars = ncol(x),
  bagControl = NULL, ...)

# S3 method for bag
predict(object, newdata = NULL, ...)

# S3 method for bag
print(x, ...)

# S3 method for bag
summary(object, ...)

# S3 method for summary.bag
print(x, digits = max(3, getOption("digits") - 3),
  ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag

nnetBag

Arguments

x	a matrix or data frame of predictors
...	arguments to pass to the model function
fit	a function that has arguments `x`, `y` and `...` and produces a model object that can later be used for prediction. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
predict	a function that generates predictions for each sub-model. The function should have arguments `object` and `x`. The output of the function can be any type of object (see the example below where posterior probabilities are generated). Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
aggregate	a function with arguments `x` and `type`. The function takes the output of the `predict` function and reduces the bagged predictions to a single prediction per sample. The `type` argument can be used to switch between predicting classes or class probabilities for classification models. Example functions are found in `ldaBag`, `plsBag`, `nbBag`, `svmBag` and `nnetBag`.
downSample	logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class?
oob	logical: should out-of-bag statistics be computed and the predictions retained?
allowParallel	logical: if a parallel backend is loaded and available, should the function use it?
y	a vector of outcomes
B	the number of bootstrap samples to train over.
vars	an integer. If this argument is not `NULL`, a random sample of size `vars` is taken of the predictors in each bagging iteration. If `NULL`, all predictors are used.
bagControl	a list of options.
object	an object of class `bag`.
newdata	a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.
digits	minimal number of significant digits.
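
The contracts for fit, predict and aggregate described above can be sketched with base R alone. The functions lmFit, lmPred and lmAgg below are hypothetical names, not part of the package, and the sketch assumes that aggregate receives a list holding one prediction object per sub-model. The manual loop only exercises the three functions; it is not the package's internal code.

```r
# Hypothetical user-supplied functions following the argument contracts above.
lmFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  lm(.outcome ~ ., data = dat, ...)
}

lmPred <- function(object, x) {
  unname(predict(object, newdata = as.data.frame(x)))
}

lmAgg <- function(x, type = "class") {
  # Assumption: x is a list with one prediction vector per sub-model.
  preds <- matrix(unlist(x), ncol = length(x))
  rowMeans(preds)
}

set.seed(1)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- 2 * x$a - x$b + rnorm(50, sd = 0.1)

# Manual bagging loop over B bootstrap samples, for illustration only.
B <- 5
fits <- lapply(seq_len(B), function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  lmFit(x[idx, , drop = FALSE], y[idx])
})
bagged <- lmAgg(lapply(fits, function(f) lmPred(f, x)))
```

Functions of this shape would be supplied through bagControl(fit = lmFit, predict = lmPred, aggregate = lmAgg).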

Format

An object of class list of length 3.

Value

bag produces an object of class bag with elements

fits

a list with two sub-objects: the fit object has the actual model fit for that bagged sample, and the vars object is either NULL or a vector of integers corresponding to which predictors were sampled for that model

control

a mirror of the arguments passed into bagControl

call

the call

B

the number of bagging iterations

dims

the dimensions of the training set

Details

The function is basically a framework where users can plug in any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, ctreeBag, svmBag and nnetBag. Each has elements fit, pred and aggregate.

One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions.
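
The ordering described in this note can be mimicked in a few lines of base R. This is a sketch of the behavior as documented, not the package's internals; meanFit and meanPred are hypothetical user functions that never select columns themselves.

```r
set.seed(2)
x <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30), d = rnorm(30))
y <- x$a + rnorm(30, sd = 0.2)
vars <- 2

# Hypothetical user functions: neither one knows about column sampling.
meanFit <- function(x, y, ...) list(center = mean(y), used = colnames(x))
meanPred <- function(object, x) rep(object$center, nrow(x))

keep <- sample(ncol(x), vars)            # predictors sampled for this iteration
rows <- sample(nrow(x), replace = TRUE)  # bootstrap sample of the rows

# The subset is taken first; the user functions only see the reduced data.
fit_i <- meanFit(x[rows, keep, drop = FALSE], y[rows])
pred_i <- meanPred(fit_i, x[, keep, drop = FALSE])
```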

When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
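
A minimal sketch of what this implies for user-written functions: the predict function returns class probabilities, and the aggregate function pools them and honors type. The names glmFit, glmPred and glmAgg are hypothetical, the class labels are hard-coded for this toy data, and the sketch assumes that aggregate receives a list with one probability table per sub-model.

```r
set.seed(3)
x <- data.frame(a = rnorm(80), b = rnorm(80))
y <- factor(ifelse(x$a + rnorm(80, sd = 0.5) > 0, "yes", "no"))

glmFit <- function(x, y, ...) {
  dat <- as.data.frame(x)
  dat$.outcome <- y
  glm(.outcome ~ ., data = dat, family = binomial(), ...)
}

glmPred <- function(object, x) {
  # glm models the second factor level ("yes"); return one column per class.
  p <- unname(predict(object, newdata = as.data.frame(x), type = "response"))
  data.frame(no = 1 - p, yes = p)
}

glmAgg <- function(x, type = "class") {
  # Average the probability tables across sub-models.
  pooled <- Reduce(`+`, lapply(x, as.matrix)) / length(x)
  if (type == "prob") return(as.data.frame(pooled))
  factor(colnames(pooled)[max.col(pooled, ties.method = "first")],
         levels = colnames(pooled))
}

# Manual bagging loop, only to exercise the functions.
fits <- lapply(1:5, function(i) {
  idx <- sample(nrow(x), replace = TRUE)
  glmFit(x[idx, , drop = FALSE], y[idx])
})
preds <- lapply(fits, function(f) glmPred(f, x))
probs <- glmAgg(preds, type = "prob")
classes <- glmAgg(preds, type = "class")
```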

If a parallel backend is registered, the foreach package is used to train the models in parallel.
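
One way to register a backend is sketched below. It assumes the doParallel package is installed, which this page does not require; any foreach-compatible backend should serve the same purpose, and the worker count of 2 is arbitrary.

```r
# Sketch: register a foreach backend before fitting; skipped when the
# (assumed, optional) doParallel package is not installed.
workers <- 1L
if (requireNamespace("doParallel", quietly = TRUE)) {
  doParallel::registerDoParallel(cores = 2)
  workers <- foreach::getDoParWorkers()

  # Calls to bag() or train() made here can use the registered backend,
  # provided allowParallel = TRUE in bagControl().

  doParallel::stopImplicitCluster()  # release workers if a cluster was created
  foreach::registerDoSEQ()           # return to sequential execution
}
```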

Examples

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

## treebag <- bag(bbbDescr, logBBB, B = 10,
##                bagControl = bagControl(fit = ctreeBag$fit,
##                                        predict = ctreeBag$pred,
##                                        aggregate = ctreeBag$aggregate))




## An example of pooling posterior probabilities to generate class predictions
data(mdrr)

## remove some zero variance predictors and linear dependencies
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## basicLDA <- train(mdrrDescr, mdrrClass, "lda")

## bagLDA2 <- train(mdrrDescr, mdrrClass,
##                  "bag",
##                  B = 10,
##                  bagControl = bagControl(fit = ldaBag$fit,
##                                          predict = ldaBag$pred,
##                                          aggregate = ldaBag$aggregate),
##                  tuneGrid = data.frame(vars = c((1:10)*10 , ncol(mdrrDescr))))