Naive Bayes Classifier

Computes the conditional a posteriori probabilities of a categorical class variable given independent predictor variables, using Bayes' rule.

# S3 method for formula
naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)
# S3 method for default
naiveBayes(x, y, laplace = 0, ...)


# S3 method for naiveBayes
predict(object, newdata,
  type = c("class", "raw"), threshold = 0.001, eps = 0, ...)

Arguments

x	A numeric matrix, or a data frame of categorical and/or numeric variables.
y	Class vector.
formula	A formula of the form `class ~ x1 + x2 + ...`. Interactions are not allowed.
data	Either a data frame of predictors (categorical and/or numeric) or a contingency table.
laplace	Positive double controlling Laplace smoothing. The default (0) disables Laplace smoothing.
...	Currently not used.
subset	For data given in a data frame, an index vector specifying the cases to be used in the training sample. (NOTE: If given, this argument must be named.)
na.action	A function to specify the action to be taken if `NA`s are found. The default action is not to count them for the computation of the probability factors. An alternative is na.omit, which leads to rejection of cases with missing values on any required variable. (NOTE: If given, this argument must be named.)
object	An object of class `"naiveBayes"`.
newdata	A data frame with new predictors (with possibly fewer columns than the training data). Note that the column names of `newdata` are matched against those of the training data.
type	If `"raw"`, the conditional a posteriori probabilities for each class are returned; otherwise, the class with maximal probability is returned.
threshold	Value that replaces cells whose probabilities fall within the `eps` range.
eps	Double specifying the epsilon range for applying Laplace smoothing (replacing zero or close-to-zero probabilities by `threshold`).
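To make the effect of `laplace` concrete, the following is a minimal base R sketch of the usual additive (add-k) smoothing formula for one categorical predictor. The helper name `laplace_table` and the toy data are hypothetical, and the formula is assumed to be the standard one rather than taken from this page.

```r
# Hypothetical helper: smoothed conditional probabilities P(x = level | y = class).
# Assumes standard additive smoothing: (count + k) / (class total + k * number of levels).
laplace_table <- function(y, x, laplace = 0) {
  counts <- table(y, x)  # rows: classes, columns: predictor levels
  (counts + laplace) / (rowSums(counts) + laplace * ncol(counts))
}

# Toy data: level "v" never occurs together with class "b".
y <- factor(c("a", "a", "a", "b", "b"))
x <- factor(c("u", "u", "v", "u", "u"), levels = c("u", "v"))

unsmoothed <- laplace_table(y, x, laplace = 0)  # P(v | b) is exactly 0
smoothed   <- laplace_table(y, x, laplace = 1)  # P(v | b) becomes (0 + 1) / (2 + 2)
```

With `laplace = 0`, a single unseen level forces the whole product of conditional probabilities for that class to zero; any positive value keeps every cell strictly positive while each row still sums to 1.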

Value

An object of class "naiveBayes" including components:

apriori

Class distribution for the dependent variable.

tables

A list of tables, one for each predictor variable. For each categorical variable, the table gives, for each attribute level, the conditional probabilities given the target class. For each numeric variable, the table gives, for each target class, the mean and standard deviation of the (sub-)variable.
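The two kinds of table described above can be rebuilt by hand in base R, which may help when reading the printed model. This is only a sketch of the table layout, not the package's own code; the variable names `titanic_df`, `cond_sex`, and `num_tab` are illustrative.

```r
# Categorical predictor: conditional probabilities of each level given the class.
titanic_df <- as.data.frame(Titanic)  # columns Class, Sex, Age, Survived, Freq
counts_sex <- xtabs(Freq ~ Survived + Sex, data = titanic_df)
cond_sex   <- prop.table(counts_sex, margin = 1)  # each row (class) sums to 1

# Numeric predictor: one row per class, columns holding mean and standard deviation.
by_species <- split(iris$Sepal.Length, iris$Species)
num_tab <- cbind(mean = sapply(by_species, mean),
                 sd   = sapply(by_species, sd))

cond_sex
num_tab
```

Under these assumptions, `cond_sex` and `num_tab` should line up with the `Sex` and `Sepal.Length` tables printed in the Examples section.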

Details

The standard naive Bayes classifier (at least this implementation) assumes independence of the predictor variables and a Gaussian distribution (given the target class) of metric predictors. For attributes with missing values, the corresponding table entries are omitted for prediction.
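As an illustration of these two assumptions, the sketch below computes Gaussian naive Bayes posteriors for the `iris` data by hand in base R. It is a simplified reading of the model (no missing values, no `threshold`/`eps` handling), and all object names are illustrative rather than part of the package.

```r
# Sketch: Gaussian naive Bayes posteriors computed by hand (base R only).
x <- iris[, 1:4]
y <- iris$Species
classes <- levels(y)

# Prior class probabilities as a plain named vector.
prior <- setNames(as.numeric(table(y)) / length(y), classes)

# Per-class mean and standard deviation: rows are classes, columns are predictors.
mu    <- sapply(x, function(col) tapply(col, y, mean))
sigma <- sapply(x, function(col) tapply(col, y, sd))

# Independence assumption: the class-conditional log-likelihood is a sum over predictors.
log_joint <- sapply(classes, function(k) {
  log_lik <- rowSums(sapply(names(x), function(v) {
    dnorm(x[[v]], mean = mu[k, v], sd = sigma[k, v], log = TRUE)
  }))
  log(prior[[k]]) + log_lik
})

# Normalise per observation (subtracting the row maximum avoids underflow).
posterior <- exp(log_joint - apply(log_joint, 1, max))
posterior <- posterior / rowSums(posterior)

pred <- factor(classes[max.col(posterior, ties.method = "first")], levels = classes)
table(pred, y)
```

Working on the log scale is a design choice made here for numerical stability: products of many small densities underflow quickly, whereas sums of log densities do not.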

Examples

## Categorical data only:
data(HouseVotes84, package = "mlbench")
model <- naiveBayes(Class ~ ., data = HouseVotes84)
predict(model, HouseVotes84[1:10,])
#>  [1] republican republican republican democrat   democrat   democrat  
#>  [7] republican republican republican democrat  
#> Levels: democrat republican
predict(model, HouseVotes84[1:10,], type = "raw")
#>           democrat   republican
#>  [1,] 1.029209e-07 9.999999e-01
#>  [2,] 5.820415e-08 9.999999e-01
#>  [3,] 5.684937e-03 9.943151e-01
#>  [4,] 9.985798e-01 1.420152e-03
#>  [5,] 9.666720e-01 3.332802e-02
#>  [6,] 8.121430e-01 1.878570e-01
#>  [7,] 1.751512e-04 9.998248e-01
#>  [8,] 8.300100e-06 9.999917e-01
#>  [9,] 8.277705e-08 9.999999e-01
#> [10,] 1.000000e+00 5.029425e-11

pred <- predict(model, HouseVotes84)
table(pred, HouseVotes84$Class)
#>             
#> pred         democrat republican
#>   democrat        238         13
#>   republican       29        155

## Using Laplace smoothing:
model <- naiveBayes(Class ~ ., data = HouseVotes84, laplace = 3)
pred <- predict(model, HouseVotes84[,-1])
table(pred, HouseVotes84$Class)
#>             
#> pred         democrat republican
#>   democrat        237         12
#>   republican       30        156


## Example of using a contingency table:
data(Titanic)
m <- naiveBayes(Survived ~ ., data = Titanic)
m
#> 
#> Naive Bayes Classifier for Discrete Predictors
#> 
#> Call:
#> naiveBayes.formula(formula = Survived ~ ., data = Titanic)
#> 
#> A-priori probabilities:
#> Survived
#>       No      Yes 
#> 0.676965 0.323035 
#> 
#> Conditional probabilities:
#>         Class
#> Survived        1st        2nd        3rd       Crew
#>      No  0.08187919 0.11208054 0.35436242 0.45167785
#>      Yes 0.28551336 0.16596343 0.25035162 0.29817159
#> 
#>         Sex
#> Survived       Male     Female
#>      No  0.91543624 0.08456376
#>      Yes 0.51617440 0.48382560
#> 
#>         Age
#> Survived      Child      Adult
#>      No  0.03489933 0.96510067
#>      Yes 0.08016878 0.91983122
#> 
predict(m, as.data.frame(Titanic))
#>  [1] Yes No  No  No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes Yes No  No 
#> [20] No  Yes Yes Yes Yes No  No  No  No  Yes Yes Yes Yes
#> Levels: No Yes

## Example with metric predictors:
data(iris)
m <- naiveBayes(Species ~ ., data = iris)
## alternatively:
m <- naiveBayes(iris[,-5], iris[,5])
m
#> 
#> Naive Bayes Classifier for Discrete Predictors
#> 
#> Call:
#> naiveBayes.default(x = iris[, -5], y = iris[, 5])
#> 
#> A-priori probabilities:
#> iris[, 5]
#>     setosa versicolor  virginica 
#>  0.3333333  0.3333333  0.3333333 
#> 
#> Conditional probabilities:
#>             Sepal.Length
#> iris[, 5]     [,1]      [,2]
#>   setosa     5.006 0.3524897
#>   versicolor 5.936 0.5161711
#>   virginica  6.588 0.6358796
#> 
#>             Sepal.Width
#> iris[, 5]     [,1]      [,2]
#>   setosa     3.428 0.3790644
#>   versicolor 2.770 0.3137983
#>   virginica  2.974 0.3224966
#> 
#>             Petal.Length
#> iris[, 5]     [,1]      [,2]
#>   setosa     1.462 0.1736640
#>   versicolor 4.260 0.4699110
#>   virginica  5.552 0.5518947
#> 
#>             Petal.Width
#> iris[, 5]     [,1]      [,2]
#>   setosa     0.246 0.1053856
#>   versicolor 1.326 0.1977527
#>   virginica  2.026 0.2746501
#> 
table(predict(m, iris), iris[,5])
#>             
#>              setosa versicolor virginica
#>   setosa         50          0         0
#>   versicolor      0         47         3
#>   virginica       0          3        47
