Workhorse function providing the link between R and the C++ gbm engine. gbm is a front-end to gbm.fit that uses the familiar R modeling formulas. However, model.frame is very slow if there are many predictor variables. For power users with many variables, use gbm.fit. For general practice, gbm is preferable.
gbm.fit(x, y, offset = NULL, misc = NULL, distribution = "bernoulli", w = NULL, var.monotone = NULL, n.trees = 100, interaction.depth = 1, n.minobsinnode = 10, shrinkage = 0.001, bag.fraction = 0.5, nTrain = NULL, train.fraction = NULL, keep.data = TRUE, verbose = TRUE, var.names = NULL, response.name = "y", group = NULL)
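As a minimal sketch of the difference between the two entry points, the following fits the same model through the formula interface and through gbm.fit directly. The simulated data and the variable names (x1, x2, x3, dat) are assumptions made for illustration; only the argument names come from the usage above.

```r
library(gbm)

set.seed(1)
n <- 500
# Simulated data; the column names are illustrative only.
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = rnorm(n))
dat$y <- 2 * dat$x1 + sin(4 * dat$x2) + rnorm(n, sd = 0.2)

# Formula interface: gbm() builds a model frame, then hands off to gbm.fit().
fit_formula <- gbm(y ~ x1 + x2 + x3, data = dat, distribution = "gaussian",
                   n.trees = 200, shrinkage = 0.05, verbose = FALSE)

# Direct interface: predictors and response are passed separately,
# which skips the model.frame step.
fit_direct <- gbm.fit(x = dat[, c("x1", "x2", "x3")], y = dat$y,
                      distribution = "gaussian",
                      n.trees = 200, shrinkage = 0.05, verbose = FALSE)

# Both calls return a fitted boosting object that predict() accepts.
head(predict(fit_direct, newdata = dat, n.trees = 200))
```

With only three predictors the two calls should take about the same time; the direct interface is expected to pay off mainly when the predictor matrix is wide.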
x | A data frame or matrix containing the predictor variables. The
number of rows in |
---|---|
y | A vector of outcomes. The number of rows in |
offset | A vector of offset values. |
misc | An R object that is simply passed on to the gbm engine. It can be used for additional data for the specific distribution. Currently it is only used for passing the censoring indicator for the Cox proportional hazards model. |
distribution | Either a character string specifying the name of the
distribution to use or a list with a component Currently available options are If quantile regression is specified, If If "pairwise" regression is specified,
Note that splitting of instances into training and validation sets follows
group boundaries and therefore only approximates the specified
Weights can be used in conjunction with pairwise metrics; however, it is assumed that they are constant for instances from the same group. For details and background on the algorithm, see, e.g., Burges (2010). |
w | A vector of weights of the same length as the |
var.monotone | An optional vector, the same length as the number of predictors, indicating which variables have a monotone increasing (+1), decreasing (-1), or arbitrary (0) relationship with the outcome. |
n.trees | The total number of trees to fit. This is equivalent to the number of iterations and the number of basis functions in the additive expansion. |
interaction.depth | The maximum depth of variable interactions. A value
of 1 implies an additive model, a value of 2 implies a model with up to 2-way
interactions, etc. Default is |
n.minobsinnode | Integer specifying the minimum number of observations in the trees' terminal nodes. Note that this is the actual number of observations, not the total weight. |
shrinkage | The shrinkage parameter applied to each tree in the
expansion. Also known as the learning rate or step-size reduction; 0.001 to
0.1 usually work, but a smaller learning rate typically requires more trees.
Default is |
bag.fraction | The fraction of the training set observations randomly
selected to propose the next tree in the expansion. This introduces
randomness into the model fit. If |
nTrain | An integer representing the number of cases on which to train.
This is the preferred way of specification for |
train.fraction | The first |
keep.data | Logical indicating whether or not to keep the data and an
index of the data stored with the object. Keeping the data and index makes
subsequent calls to |
verbose | Logical indicating whether or not to print out progress and
performance indicators ( |
var.names | Vector of strings of length equal to the number of columns
of |
response.name | Character string label for the response variable. |
group | The |
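
The arguments above can be combined as in the following sketch. The data and the names dose and noise are hypothetical. The example assumes that nTrain = 800 trains on the first 800 rows and holds out the remainder for validation, and that the fitted object exposes the per-iteration held-out loss as valid.error.

```r
library(gbm)

set.seed(2)
n <- 1000
# Hypothetical predictors: one with a truly increasing effect, one pure noise.
x <- data.frame(dose = runif(n), noise = rnorm(n))
y <- 3 * x$dose + rnorm(n, sd = 0.3)

fit <- gbm.fit(
  x = x, y = y,
  distribution = "gaussian",
  var.monotone = c(1, 0),   # dose: monotone increasing; noise: unconstrained
  n.trees = 300,
  interaction.depth = 2,    # allow up to 2-way interactions
  n.minobsinnode = 10,
  shrinkage = 0.05,
  bag.fraction = 0.5,
  nTrain = 800,             # assumed: first 800 rows train, the rest validate
  verbose = FALSE
)

# Iteration with the smallest loss on the held-out rows.
best <- which.min(fit$valid.error)
best
```

Because the split is taken from the row order, it is usually sensible to shuffle the rows beforehand if the data are sorted in any meaningful way.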
A gbm.object object.
This package implements the generalized boosted modeling framework. Boosting is the process of iteratively adding basis functions in a greedy fashion so that each additional basis function further reduces the selected loss function. This implementation closely follows Friedman's Gradient Boosting Machine (Friedman, 2001).
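The greedy, iterative idea described above can be sketched in a few lines of base R. This is not the package's C++ implementation: it is a simplified illustration for squared-error loss with a single predictor and one-split stumps, and the helper names fit_stump, predict_stump, and boost are invented for this example.

```r
# Fit a one-split regression stump to residuals r on a single predictor x.
fit_stump <- function(x, r) {
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between distinct values
  sse <- sapply(cuts, function(cut) {
    left <- r[x <= cut]
    right <- r[x > cut]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  cut <- cuts[which.min(sse)]
  list(cut = cut, left = mean(r[x <= cut]), right = mean(r[x > cut]))
}

predict_stump <- function(stump, x) {
  ifelse(x <= stump$cut, stump$left, stump$right)
}

# Gradient boosting for the loss (y - f)^2 / 2: each new basis function is
# fitted to the negative gradient, which for this loss is the residual y - f.
boost <- function(x, y, n_trees = 100, shrinkage = 0.1) {
  f <- rep(mean(y), length(y))          # start from the best constant fit
  train_error <- numeric(n_trees)
  for (m in seq_len(n_trees)) {
    r <- y - f                          # negative gradient at the current fit
    stump <- fit_stump(x, r)
    f <- f + shrinkage * predict_stump(stump, x)   # shrunken greedy update
    train_error[m] <- mean((y - f)^2)
  }
  list(fitted = f, train_error = train_error)
}

set.seed(3)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
res <- boost(x, y)
range(res$train_error)
```

Each added stump cannot increase the training loss in this setting, which is the sense in which every additional basis function "further reduces the selected loss function".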
In addition to many of the features documented in the Gradient Boosting Machine, gbm offers additional features including the out-of-bag estimator for the optimal number of iterations, the ability to store and manipulate the resulting gbm object, and a variety of other loss functions that had not previously had associated boosting algorithms, including the Cox partial likelihood for censored data, the Poisson likelihood for count outcomes, and a gradient boosting implementation to minimize the AdaBoost exponential loss function.
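
As a hedged illustration of two of these features, the sketch below fits a Poisson model to simulated counts and then asks for the out-of-bag estimate of the number of iterations. It assumes the package's gbm.perf function with method = "OOB", and that bag.fraction is below 1 so that out-of-bag observations exist. The data are simulated for this example only.

```r
library(gbm)

set.seed(4)
n <- 1000
x <- data.frame(x1 = runif(n), x2 = runif(n))
# Simulated count outcome with a log-linear mean.
y <- rpois(n, lambda = exp(0.5 + x$x1 - x$x2))

fit <- gbm.fit(x = x, y = y, distribution = "poisson",
               n.trees = 500, shrinkage = 0.05,
               bag.fraction = 0.5, verbose = FALSE)

# Out-of-bag estimate of the number of iterations to use.
# gbm.perf may print a note that this estimate tends to be conservative.
best_iter <- as.integer(gbm.perf(fit, method = "OOB", plot.it = FALSE))

# Predicted counts on the response scale at the selected iteration.
pred <- predict(fit, newdata = x, n.trees = best_iter, type = "response")
summary(pred)
```

The fitted object can be kept and queried again later, for example by calling predict with a different n.trees, without refitting the model.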
Y. Freund and R.E. Schapire (1997). “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences, 55(1):119-139.
G. Ridgeway (1999). “The state of boosting,” Computing Science and Statistics 31:172-181.
J.H. Friedman, T. Hastie, R. Tibshirani (2000). “Additive Logistic Regression: a Statistical View of Boosting,” Annals of Statistics 28(2):337-374.
J.H. Friedman (2001). “Greedy Function Approximation: A Gradient Boosting Machine,” Annals of Statistics 29(5):1189-1232.
J.H. Friedman (2002). “Stochastic Gradient Boosting,” Computational Statistics and Data Analysis 38(4):367-378.
B. Kriegler (2007). Cost-Sensitive Stochastic Gradient Boosting Within a Quantitative Regression Framework. Ph.D. Dissertation. University of California at Los Angeles, Los Angeles, CA, USA. Advisor(s) Richard A. Berk. https://dl.acm.org/citation.cfm?id=1354603.
C. Burges (2010). “From RankNet to LambdaRank to LambdaMART: An Overview,” Microsoft Research Technical Report MSR-TR-2010-82.