A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.

Default

Format

A data frame with 10000 observations on the following 4 variables.

default

A factor with levels No and Yes indicating whether the customer defaulted on their debt

student

A factor with levels No and Yes indicating whether the customer is a student

balance

The average balance that the customer has remaining on their credit card after making their monthly payment

income

Income of customer

Source

Simulated data

References

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer-Verlag, New York. www.StatLearning.com

Examples

summary(Default)
#>  default    student       balance           income
#>  No :9667   No :7056   Min.   :   0.0   Min.   :  772
#>  Yes: 333   Yes:2944   1st Qu.: 481.7   1st Qu.:21340
#>                        Median : 823.6   Median :34553
#>                        Mean   : 835.4   Mean   :33517
#>                        3rd Qu.:1166.3   3rd Qu.:43808
#>                        Max.   :2654.3   Max.   :73554
glm(default~student+balance+income,family="binomial",data=Default)
#>
#> Call:  glm(formula = default ~ student + balance + income, family = "binomial",
#>     data = Default)
#>
#> Coefficients:
#> (Intercept)   studentYes      balance       income
#>  -1.087e+01   -6.468e-01    5.737e-03    3.033e-06
#>
#> Degrees of Freedom: 9999 Total (i.e. Null);  9996 Residual
#> Null Deviance:     2921
#> Residual Deviance: 1572    AIC: 1580
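The coefficients printed by glm() are on the log-odds scale. As a minimal sketch of how they might be turned into a predicted probability of default, the code below copies the rounded coefficients from the output above and applies the inverse logit with plogis(). The customer values (a student with a balance of 1500 and an income of 40000) are hypothetical and chosen only for illustration, and the names coefs, x, log_odds and prob are not part of the data set.

```r
# Coefficients copied from the printed glm() fit (rounded as displayed there).
coefs <- c("(Intercept)" = -1.087e+01,
           studentYes    = -6.468e-01,
           balance       =  5.737e-03,
           income        =  3.033e-06)

# Hypothetical customer (an assumption for illustration):
# intercept term, student = Yes, balance = 1500, income = 40000.
x <- c(1, 1, 1500, 40000)

# Linear predictor: the log-odds of default for this customer.
log_odds <- sum(coefs * x)

# Inverse logit, 1 / (1 + exp(-log_odds)), gives the probability of default.
prob <- plogis(log_odds)

print(log_odds)
print(prob)
```

Because the coefficients are rounded, the result is only approximate. With the fitted model object available, predict() with type = "response" would be the more usual route.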