bag provides a framework for bagging classification or regression models. The user can provide their own functions for model building, prediction and aggregation of predictions (see Details below).

bag(x, ...)

bagControl(fit = NULL, predict = NULL, aggregate = NULL,
  downSample = FALSE, oob = TRUE, allowParallel = TRUE)

# S3 method for default
bag(x, y, B = 10, vars = ncol(x),
  bagControl = NULL, ...)

# S3 method for bag
predict(object, newdata = NULL, ...)

# S3 method for bag
print(x, ...)

# S3 method for bag
summary(object, ...)

# S3 method for summary.bag
print(x, digits = max(3, getOption("digits") - 3),
  ...)

ldaBag

plsBag

nbBag

ctreeBag

svmBag

nnetBag

Arguments

x

a matrix or data frame of predictors

...

arguments to pass to the model function

fit

a function that has arguments x, y and ... and produces a model object that can later be used for prediction. Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.

predict

a function that generates predictions for each sub-model. The function should have arguments object and x. The output of the function can be any type of object (see the example below, where posterior probabilities are generated). Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.

aggregate

a function with arguments x and type that takes the output of the predict function and reduces the bagged predictions to a single prediction per sample. The type argument can be used to switch between predicting classes or class probabilities for classification models. Example functions are found in ldaBag, plsBag, nbBag, svmBag and nnetBag.

downSample

logical: for classification, should the data set be randomly sampled so that each class has the same number of samples as the smallest class?

oob

logical: should out-of-bag statistics be computed and the predictions retained?

allowParallel

logical: if a parallel backend is loaded and available, should the function use it?

y

a vector of outcomes

B

the number of bootstrap samples to train over.

vars

an integer. If this argument is not NULL, a random sample of vars predictors is taken in each bagging iteration. If NULL, all predictors are used.

bagControl

a list of options, typically the output of bagControl.

object

an object of class bag.

newdata

a matrix or data frame of samples for prediction. Note that this argument must have a non-null value.

digits

minimal number of significant digits.
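The downSample and B arguments together govern how rows are drawn for each sub-model. The base R sketch below illustrates the idea under stated assumptions: bag_rows is a hypothetical helper, not a caret function, and it draws a bootstrap sample first and then cuts every class in that sample back to the size of the smallest class. caret's internal order of operations may differ.

```r
# Hypothetical helper (not part of caret): row indices for one bagging
# iteration. A bootstrap sample is drawn first; if down_sample is TRUE,
# each class in that sample is then reduced to the size of the smallest
# class by sampling without replacement.
bag_rows <- function(y, down_sample = FALSE) {
  n <- length(y)
  boot <- sample.int(n, n, replace = TRUE)   # bootstrap: n draws with replacement
  if (!down_sample) return(boot)
  y_boot <- droplevels(factor(y[boot]))      # classes present in this sample
  n_min <- min(table(y_boot))                # size of the smallest class
  by_class <- split(boot, y_boot)            # bootstrap indices grouped by class
  keep <- lapply(by_class, function(i) i[sample.int(length(i), n_min)])
  unlist(keep, use.names = FALSE)
}

# Usage: an 80/20 imbalanced outcome
set.seed(1)
y <- factor(c(rep("a", 80), rep("b", 20)))
rows <- bag_rows(y, down_sample = TRUE)
table(y[rows])   # both classes appear the same number of times
```

Sampling with sample.int on positions (rather than sample on the index values) avoids R's surprise behaviour when a class has only one row.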

Format

Each of ldaBag, plsBag, nbBag, ctreeBag, svmBag and nnetBag is an object of class list of length 3.

Value

bag produces an object of class bag with elements

fits

a list with two sub-objects per bagged model: the fit object holds the actual model fit for that bagged sample, and the vars object is either NULL or a vector of integers indicating which predictors were sampled for that model

control

a mirror of the arguments passed into bagControl

call

the call

B

the number of bagging iterations

dims

the dimensions of the training set

Details

The function is a framework into which users can plug any model to assess the effect of bagging. Example functions can be found in ldaBag, plsBag, nbBag, ctreeBag, svmBag and nnetBag. Each is a list with elements fit, pred and aggregate.
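As a sketch of what such a triple can look like, the code below defines a hypothetical lmBag list for a numeric outcome using base R's lm. It is not one of the functions shipped with caret; it only follows the argument names described above (fit(x, y, ...), pred(object, x) and aggregate(x, type)), and it assumes that aggregate receives a list with one prediction vector per sub-model. The short loop afterwards mimics by hand what bag does with the triple; averaging is a choice made for this sketch, and the bundled functions may pool predictions differently.

```r
# Hypothetical lmBag (not shipped with caret): a fit/pred/aggregate list
# for a numeric outcome, following the argument names described above.
lmBag <- list(
  # fit: predictors x and outcome y in, any model object out
  fit = function(x, y, ...) {
    dat <- as.data.frame(x)
    dat$.outcome <- y
    lm(.outcome ~ ., data = dat, ...)
  },
  # pred: one fitted sub-model and new predictors in, predictions out
  pred = function(object, x) {
    predict(object, newdata = as.data.frame(x))
  },
  # aggregate: x is assumed to be a list with one prediction vector per
  # sub-model; type is ignored because the outcome is numeric
  aggregate = function(x, type = NULL) {
    rowMeans(do.call(cbind, x))
  }
)

# Usage: a hand-written bagging loop that calls the three functions
set.seed(2)
x <- data.frame(a = rnorm(50), b = rnorm(50))
y <- 1 + 2 * x$a - x$b + rnorm(50, sd = 0.1)
fits <- lapply(1:10, function(i) {
  rows <- sample.int(nrow(x), replace = TRUE)    # bootstrap sample
  lmBag$fit(x[rows, , drop = FALSE], y[rows])
})
preds <- lapply(fits, lmBag$pred, x = x[1:5, ])  # one vector per sub-model
lmBag$aggregate(preds)                           # one prediction per row
```

Assuming caret is loaded, the same list could be handed to bag as bagControl(fit = lmBag$fit, predict = lmBag$pred, aggregate = lmBag$aggregate), mirroring the ctreeBag call in the Examples.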

One note: when vars is not NULL, the sub-setting occurs before the fit and predict functions are called. In this way, the user probably does not need to account for the change in predictors in their functions.
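The base R sketch below illustrates that ordering. fit_with_vars and predict_with_vars are hypothetical helpers, not caret internals: the column sample is drawn and applied before the user's fit function runs, and the sampled indices are stored next to the fit (mirroring the fit and vars sub-objects described under Value) so that the same columns can be selected from new data before the user's predict function runs. The toy fit and predict functions never mention vars at all.

```r
# Hypothetical helpers (not caret internals) showing the order described
# above: predictors are sampled and subset before the user's fit runs,
# and the sampled indices are stored so newdata can be subset the same
# way before the user's predict runs.
fit_with_vars <- function(x, y, vars, fit) {
  if (is.null(vars)) return(list(fit = fit(x, y), vars = NULL))
  v <- sort(sample.int(ncol(x), vars))           # predictors for this model
  list(fit = fit(x[, v, drop = FALSE], y), vars = v)
}
predict_with_vars <- function(sub_model, newdata, pred) {
  if (!is.null(sub_model$vars)) {
    newdata <- newdata[, sub_model$vars, drop = FALSE]
  }
  pred(sub_model$fit, newdata)
}

# Toy fit/pred pair that never mentions vars: the "model" just remembers
# the column names it was given, and pred checks that newdata matches.
toy_fit  <- function(x, y, ...) colnames(x)
toy_pred <- function(object, x) identical(object, colnames(x))

set.seed(3)
x <- data.frame(a = 1:6, b = 6:1, c = rnorm(6), d = runif(6))
y <- rnorm(6)
m <- fit_with_vars(x, y, vars = 2, fit = toy_fit)
m$vars                                      # the two sampled column indices
predict_with_vars(m, x, pred = toy_pred)    # TRUE
```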

When using bag with train, classification models should use type = "prob" inside of the predict function so that predict.train(object, newdata, type = "prob") will work.
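With the predict function returning class probabilities, the aggregate function can serve both kinds of request through its type argument. The sketch below is a hypothetical prob_aggregate, not the code inside ldaBag or the other bundled functions; it assumes each sub-model's predictions arrive as a matrix with one row per sample and one column per class, averages those matrices, and then returns either the most probable class per sample or the pooled probabilities.

```r
# Hypothetical aggregate function for classification (not the code in
# ldaBag): x is a list of class-probability matrices, one per sub-model,
# each with one row per sample and one column per class.
prob_aggregate <- function(x, type = "class") {
  pooled <- Reduce(`+`, x) / length(x)          # average the probabilities
  if (type == "class") {
    classes <- colnames(pooled)
    # most probable class per row, returned as a factor
    factor(classes[max.col(pooled, ties.method = "first")], levels = classes)
  } else {
    as.data.frame(pooled)                        # one column per class
  }
}

# Usage: two sub-models, two samples, classes "yes" and "no"
p1 <- matrix(c(0.9, 0.1,
               0.4, 0.6), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c("yes", "no")))
p2 <- matrix(c(0.7, 0.3,
               0.2, 0.8), nrow = 2, byrow = TRUE,
             dimnames = list(NULL, c("yes", "no")))
prob_aggregate(list(p1, p2), type = "class")   # yes, no
prob_aggregate(list(p1, p2), type = "prob")    # yes = 0.8, 0.3; no = 0.2, 0.7
```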

If a parallel backend is registered, the foreach package is used to train the models in parallel.

Examples

library(caret)

## A simple example of bagging conditional inference regression trees:
data(BloodBrain)

## treebag <- bag(bbbDescr, logBBB, B = 10,
##                bagControl = bagControl(fit = ctreeBag$fit,
##                                        predict = ctreeBag$pred,
##                                        aggregate = ctreeBag$aggregate))

## An example of pooling posterior probabilities to generate class predictions:
data(mdrr)

## remove near-zero variance predictors and highly correlated predictors
mdrrDescr <- mdrrDescr[, -nearZeroVar(mdrrDescr)]
mdrrDescr <- mdrrDescr[, -findCorrelation(cor(mdrrDescr), .95)]

## basicLDA <- train(mdrrDescr, mdrrClass, "lda")

## bagLDA2 <- train(mdrrDescr, mdrrClass,
##                  "bag",
##                  B = 10,
##                  bagControl = bagControl(fit = ldaBag$fit,
##                                          predict = ldaBag$pred,
##                                          aggregate = ldaBag$aggregate),
##                  tuneGrid = data.frame(vars = c((1:10)*10, ncol(mdrrDescr))))