Compute partial dependence functions (i.e., marginal effects) for various model fitting objects.
partial(object, ...)

# S3 method for default
partial(object, pred.var, pred.grid, pred.fun = NULL,
  grid.resolution = NULL, ice = FALSE, center = FALSE, quantiles = FALSE,
  probs = 1:9/10, trim.outliers = FALSE,
  type = c("auto", "regression", "classification"), inv.link = NULL,
  which.class = 1L, prob = FALSE, recursive = TRUE, plot = FALSE,
  plot.engine = c("lattice", "ggplot2"), smooth = FALSE, rug = FALSE,
  chull = FALSE, levelplot = TRUE, contour = FALSE,
  contour.color = "white",
  palette = c("viridis", "magma", "inferno", "plasma", "cividis"),
  alpha = 1, train, cats = NULL, check.class = TRUE, progress = "none",
  parallel = FALSE, paropts = NULL, ...)
object | A fitted model object of appropriate class (e.g., |
---|---|
... | Additional optional arguments to be passed onto
|
pred.var | Character string giving the names of the predictor variables of interest. For reasons of computation/interpretation, this should include no more than three variables. |
pred.grid | Data frame containing the joint values of interest for the
variables listed in |
pred.fun | Optional prediction function that requires two arguments:
|
grid.resolution | Integer giving the number of equally spaced points to
use for the continuous variables listed in |
ice | Logical indicating whether or not to compute individual
conditional expectation (ICE) curves. Default is FALSE. |
center | Logical indicating whether or not to produce centered ICE
curves (c-ICE curves). Only used when |
quantiles | Logical indicating whether or not to use the sample
quantiles of the continuous predictors listed in |
probs | Numeric vector of probabilities with values in [0,1]. (Values up
to 2e-14 outside that range are accepted and moved to the nearby endpoint.)
Default is 1:9/10. |
trim.outliers | Logical indicating whether or not to trim off outliers
from the continuous predictors listed in |
type | Character string specifying the type of supervised learning.
Current options are |
inv.link | Function specifying the transformation to be applied to the
predictions before the partial dependence function is computed
(experimental). Default is NULL. |
which.class | Integer specifying which column of the matrix of predicted
probabilities to use as the "focus" class. Default is to use the first class.
Only used for classification problems (i.e., when
|
prob | Logical indicating whether or not partial dependence for
classification problems should be returned on the probability scale, rather
than the centered logit. If |
recursive | Logical indicating whether or not to use the weighted tree
traversal method described in Friedman (2001). This only applies to objects
that inherit from class |
plot | Logical indicating whether to return a data frame containing the
partial dependence values ( |
plot.engine | Character string specifying which plotting engine to use
whenever |
smooth | Logical indicating whether or not to overlay a LOESS smooth.
Default is FALSE. |
rug | Logical indicating whether or not to include a rug display on the
predictor axes. The tick marks indicate the min/max and deciles of the
predictor distributions. This helps reduce the risk of interpreting the
partial dependence plot outside the region of the data (i.e., extrapolating).
Only used when |
chull | Logical indicating whether or not to restrict the values of the
first two variables in |
levelplot | Logical indicating whether or not to use a false color level
plot ( |
contour | Logical indicating whether or not to add contour lines to the
level plot. Only used when |
contour.color | Character string specifying the color to use for the
contour lines when |
palette | Character string indicating the colormap option to use. Five options are available: "viridis" (the default), "magma", "inferno", "plasma", and "cividis". |
alpha | Numeric value in |
train | An optional data frame, matrix, or sparse matrix containing the
original training data. This may be required depending on the class of
|
cats | Character string indicating which columns of |
check.class | Logical indicating whether or not to make sure each column
in |
progress | Character string giving the name of the progress bar to use
while constructing the partial dependence function. See
|
parallel | Logical indicating whether or not to run |
paropts | List containing additional options to be passed onto
|
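As a rough illustration of how grid.resolution, quantiles, and probs relate, the following base R sketch builds the two kinds of one-dimensional grids described in the table above. It is not pdp's internal code; the predictor x and the object names grid_equal and grid_quant are made up for illustration.

```r
# Hypothetical continuous predictor (illustrative data only)
set.seed(1)
x <- rexp(200)

# Equally spaced grid over the observed range, analogous to what
# grid.resolution controls
grid.resolution <- 10
grid_equal <- seq(from = min(x), to = max(x), length.out = grid.resolution)

# Sample-quantile grid, analogous to quantiles = TRUE with probs = 1:9/10
probs <- 1:9/10
grid_quant <- unname(quantile(x, probs = probs))

length(grid_equal)  # 10 points spanning the observed range
length(grid_quant)  # 9 points, one per requested probability
```

For a skewed predictor such as this one, the quantile grid places more points where the data are dense, whereas the equally spaced grid covers the full range evenly.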
By default, partial returns an object of class c("data.frame", "partial"). If ice = TRUE and center = FALSE then an object of class c("data.frame", "ice") is returned. If ice = TRUE and center = TRUE then an object of class c("data.frame", "cice") is returned. These three classes determine the behavior of the plotPartial function which is automatically called whenever plot = TRUE. Specifically, when plot = TRUE, a "trellis" object is returned (see lattice for details); the "trellis" object will also include an additional attribute, "partial.data", containing the data displayed in the plot.
In some cases it is difficult for partial to extract the original training data from object. In these cases an error message is displayed requesting the user to supply the training data via the train argument in the call to partial. In most cases where partial can extract the required training data from object, it is taken from the same environment in which partial is called. Therefore, it is important not to change the training data used to construct object before calling partial. This problem is completely avoided when the training data are passed to the train argument in the call to partial.
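The risk described above can be illustrated without pdp. The sketch below assumes, purely for illustration, that the training data are recovered by evaluating the data argument stored in the model's call; the exact mechanism partial uses may differ. The objects dat and fit are hypothetical.

```r
dat <- data.frame(x = 1:10, y = 2 * (1:10) + 1)
fit <- lm(y ~ x, data = dat)

# The stored call refers to the data by name (the symbol `dat`),
# so evaluating it looks up whatever `dat` currently is
n_before <- nrow(eval(fit$call$data))

# Changing `dat` after fitting changes what that name resolves to
dat <- dat[1:3, ]
n_after <- nrow(eval(fit$call$data))

c(before = n_before, after = n_after)
```

Here the second lookup silently returns the modified three-row data frame rather than the ten rows the model was fitted on, which is the situation that supplying train explicitly avoids.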
It is recommended to call partial with plot = FALSE and store the results. This allows for more flexible plotting, and the user will not have to waste time calling partial again if the default plot is not sufficient.
It is possible to retrieve the last printed "trellis" object, such as those produced by plotPartial, using trellis.last.object().
If ice = TRUE or the prediction function given to pred.fun returns a prediction for each observation in newdata, then the result will be a curve for each observation. These are called individual conditional expectation (ICE) curves; see Goldstein et al. (2015) and ice for details.
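To make the relationship between ICE curves and partial dependence concrete, here is a minimal brute-force sketch in base R. It is an illustration using a linear model on the built-in mtcars data, not the implementation used by partial; the grid size, the object names, and the choice to center each curve at the first grid value are assumptions made for the example.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Grid of values for the predictor of interest
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 20)

# One ICE curve per observation: fix wt at each grid value, keep all other
# predictors at their observed values, and predict
ice <- sapply(wt_grid, function(w) {
  newdata <- mtcars
  newdata$wt <- w
  predict(fit, newdata = newdata)
})  # matrix with one row per observation and one column per grid value

# Partial dependence is the pointwise average of the ICE curves
pd <- colMeans(ice)

# Centered ICE (c-ICE) curves: shift each curve so it starts at zero
cice <- ice - ice[, 1]

dim(ice)
```

Because the model here is additive and linear, every ICE curve is a straight line with the same slope, so the curves are parallel and the c-ICE curves coincide; with a more flexible model such as a random forest, the curves can diverge, and that divergence is what ICE plots are meant to reveal.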
J. H. Friedman. Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29: 1189-1232, 2001.
A. Goldstein, A. Kapelner, J. Bleich, and E. Pitkin. Peeking inside the black box: Visualizing statistical learning with plots of individual conditional expectation. Journal of Computational and Graphical Statistics, 24(1): 44-65, 2015.
if (FALSE) {
#
# Regression example (requires randomForest package to run)
#

# Fit a random forest to the boston housing data
library(pdp)  # provides partial, plotPartial, and the example data sets
library(randomForest)
data(boston)  # load the boston housing data
set.seed(101)  # for reproducibility
boston.rf <- randomForest(cmedv ~ ., data = boston)

# Using randomForest's partialPlot function
partialPlot(boston.rf, pred.data = boston, x.var = "lstat")

# Using pdp's partial function
head(partial(boston.rf, pred.var = "lstat"))  # returns a data frame
partial(boston.rf, pred.var = "lstat", plot = TRUE, rug = TRUE)

# The partial function allows for multiple predictors
partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40,
        plot = TRUE, chull = TRUE, progress = "text")

# The plotPartial function offers more flexible plotting
pd <- partial(boston.rf, pred.var = c("lstat", "rm"), grid.resolution = 40)
plotPartial(pd, levelplot = FALSE, zlab = "cmedv", drape = TRUE,
            colorkey = FALSE, screen = list(z = -20, x = -60))

# The autoplot function can be used to produce graphics based on ggplot2
library(ggplot2)
autoplot(pd, contour = TRUE, legend.title = "Partial\ndependence")

#
# Individual conditional expectation (ICE) curves
#

# Use partial to obtain ICE/c-ICE curves
rm.ice <- partial(boston.rf, pred.var = "rm", ice = TRUE)
plotPartial(rm.ice, rug = TRUE, train = boston, alpha = 0.2)
autoplot(rm.ice, center = TRUE, alpha = 0.2, rug = TRUE, train = boston)

#
# Classification example (requires randomForest package to run)
#

# Fit a random forest to the Pima Indians diabetes data
data(pima)  # load the Pima Indians diabetes data
set.seed(102)  # for reproducibility
pima.rf <- randomForest(diabetes ~ ., data = pima, na.action = na.omit)

# Partial dependence of positive test result on glucose (default logit scale)
partial(pima.rf, pred.var = "glucose", plot = TRUE, chull = TRUE,
        progress = "text")

# Partial dependence of positive test result on glucose (probability scale)
partial(pima.rf, pred.var = "glucose", prob = TRUE, plot = TRUE,
        chull = TRUE, progress = "text")
}