cppls.fit.Rd
Fits a PLS model using the CPPLS algorithm.
cppls.fit(X, Y, ncomp, Y.add = NULL, center = TRUE, stripped = FALSE,
          lower = 0.5, upper = 0.5, trunc.pow = FALSE, weights = NULL, ...)
X | a matrix of observations. |
---|---|
Y | a vector or matrix of responses. |
ncomp | the number of components to be used in the modelling. |
Y.add | a vector or matrix of additional responses containing relevant information about the observations. |
center | logical, determines if the \(X\) and \(Y\) matrices are mean centered or not. Default is to perform mean centering. |
stripped | logical. If TRUE the calculations are stripped as much as possible for speed; this is meant for use with cross-validation or simulations when only the coefficients are needed. Defaults to FALSE. |
lower | a vector of lower limits for power optimisation. Defaults to 0.5. |
upper | a vector of upper limits for power optimisation. Defaults to 0.5. |
trunc.pow | logical. If TRUE, an experimental alternative power algorithm is used. Defaults to FALSE. |
weights | a vector of individual weights for the observations. (Optional) |
... | other arguments. Currently ignored. |
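A minimal sketch of how these arguments are typically supplied, through the cppls() wrapper described below rather than by calling cppls.fit() directly. The simulated data, object names and the chosen lower/upper interval are illustrative assumptions, not taken from the package documentation; the extra arguments are handed on to cppls.fit().

library(pls)

## Simulated data: 30 observations, 50 predictors, one continuous response
set.seed(1)
X <- matrix(rnorm(30 * 50), nrow = 30)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(30, sd = 0.1)
dat <- data.frame(y = y, X = I(X))

## Uni-response continuous Y with the default lower = upper = 0.5 gives PLS;
## widening the interval turns on power optimisation (PPLS).
fit <- cppls(y ~ X, ncomp = 5, data = dat, lower = 0.1, upper = 0.9)
summary(fit)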
This function should not be called directly, but through the generic functions cppls or mvr with the argument method = "cppls". Canonical Powered PLS (CPPLS) is a generalisation of PLS incorporating discrete and continuous responses (also simultaneously), additional responses, individual weighting of observations, and power methodology for sharpening focus on groups of variables. Depending on the input to cppls, it can produce the following special cases:

- PLS: uni-response continuous Y
- PPLS: uni-response continuous Y, (lower || upper) != 0.5
- PLS-DA (using correlation maximisation - B/W): dummy-coded discrete response Y
- PPLS-DA: dummy-coded discrete response Y, (lower || upper) != 0.5
- CPLS: multi-response Y (continuous, discrete or combination)
- CPPLS: multi-response Y (continuous, discrete or combination), (lower || upper) != 0.5
The name "canonical" comes from canonical correlation analysis which is used when calculating vectors of loading weights, while "powered" refers to a reparameterisation of the vectors of loading weights which can be optimised over a given interval.
A list containing the following components is returned:
coefficients | an array of regression coefficients for 1, ..., ncomp components. The dimensions of coefficients are c(nvar, npred, ncomp) with nvar the number of X variables and npred the number of variables to be predicted in Y. |
---|---|
scores | a matrix of scores. |
loadings | a matrix of loadings. |
loading.weights | a matrix of loading weights. |
Yscores | a matrix of Y-scores. |
Yloadings | a matrix of Y-loadings. |
projection | the projection matrix used to convert X to scores. |
Xmeans | a vector of means of the X variables. |
Ymeans | a vector of means of the Y variables. |
fitted.values | an array of fitted values. The dimensions of fitted.values are c(nobj, npred, ncomp) with nobj the number of samples and npred the number of Y variables. |
residuals | an array of regression residuals. It has the same dimensions as fitted.values. |
Xvar | a vector with the amount of X-variance explained by each component. |
Xtotvar | total variance in X. |
gammas | gamma-values obtained in power optimisation. |
canonical.correlations | canonical correlation values from the calculations of loading weights. |
A | matrix containing vectors of weights a from canonical correlation (cor(Za, Yb)). |
smallNorm | vector of indices of explanatory variables of length close to or equal to 0. |
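The components above can be inspected directly on a fitted object. Continuing the simulated sketch from earlier (again an illustrative assumption rather than package data), with the dimensions noted in the comments following the descriptions in the list above:

library(pls)
set.seed(1)
X <- matrix(rnorm(30 * 50), nrow = 30)
y <- X[, 1] - 0.5 * X[, 2] + rnorm(30, sd = 0.1)
dat <- data.frame(y = y, X = I(X))
fit <- cppls(y ~ X, ncomp = 5, data = dat, lower = 0.1, upper = 0.9)

dim(fit$coefficients)            # c(nvar, npred, ncomp) = c(50, 1, 5)
dim(fit$fitted.values)           # c(nobj, npred, ncomp) = c(30, 1, 5)
cumsum(fit$Xvar) / fit$Xtotvar   # cumulative proportion of X-variance explained
fit$gammas                       # gamma-values from the power optimisation
fit$canonical.correlations       # canonical correlations from the loading-weight step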
Indahl, U. (2005) A twist to partial least squares regression. Journal of Chemometrics, 19, 32--44.
Liland, K.H. and Indahl, U.G. (2009) Powered partial least squares discriminant analysis. Journal of Chemometrics, 23, 7--18.
Indahl, U.G., Liland, K.H. and Næs, T. (2009) Canonical partial least squares - a unified PLS approach to classification and regression problems. Journal of Chemometrics, 23, 495--504.
data(mayonnaise)

# Create dummy response
mayonnaise$dummy <-
    I(model.matrix(~y-1, data.frame(y = factor(mayonnaise$oil.type))))

# Predict CPLS scores for test data
may.cpls <- cppls(dummy ~ NIR, 10, data = mayonnaise, subset = train)
may.test <- predict(may.cpls, newdata = mayonnaise[!mayonnaise$train,], type = "score")

# Predict CPLS scores for test data (experimental design used as additional Y information)
may.cpls.yadd <- cppls(dummy ~ NIR, 10, data = mayonnaise, subset = train, Y.add = design)
may.test.yadd <- predict(may.cpls.yadd, newdata = mayonnaise[!mayonnaise$train,], type = "score")

# Classification by linear discriminant analysis (LDA)
library(MASS)
error <- matrix(ncol = 10, nrow = 2)
dimnames(error) <- list(Model = c('CPLS', 'CPLS (Y.add)'), ncomp = 1:10)
for (i in 1:10) {
    fitdata1  <- data.frame(oil.type = mayonnaise$oil.type[mayonnaise$train],
                            NIR.score = I(may.cpls$scores[,1:i,drop=FALSE]))
    testdata1 <- data.frame(oil.type = mayonnaise$oil.type[!mayonnaise$train],
                            NIR.score = I(may.test[,1:i,drop=FALSE]))
    error[1,i] <-
        (42 - sum(predict(lda(oil.type ~ NIR.score, data = fitdata1),
                          newdata = testdata1)$class == testdata1$oil.type)) / 42
    fitdata2  <- data.frame(oil.type = mayonnaise$oil.type[mayonnaise$train],
                            NIR.score = I(may.cpls.yadd$scores[,1:i,drop=FALSE]))
    testdata2 <- data.frame(oil.type = mayonnaise$oil.type[!mayonnaise$train],
                            NIR.score = I(may.test.yadd[,1:i,drop=FALSE]))
    error[2,i] <-
        (42 - sum(predict(lda(oil.type ~ NIR.score, data = fitdata2),
                          newdata = testdata2)$class == testdata2$oil.type)) / 42
}
round(error, 2)
#>               ncomp
#> Model             1    2    3    4    5    6    7    8    9   10
#>   CPLS         0.29 0.29 0.19 0.17 0.24 0.14 0.05 0.02    0    0
#>   CPLS (Y.add) 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00    0    0