Titanic data with passenger names and other details removed.

Format

A data frame with 1309 observations on 6 variables (1046 of the observations have no missing values).

pclass      passenger class, unordered factor: 1st 2nd 3rd
survived    factor: died or survived
sex         unordered factor: male female
age         age in years, min 0.167 max 80.0, with some NAs
sibsp       number of siblings or spouses aboard, integer: 0...8
parch       number of parents or children aboard, integer: 0...9

Source

The dataset was compiled by Frank Harrell and Robert Dawson:
http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html.

See also:
http://biostat.mc.vanderbilt.edu/twiki/pub/Main/DataSets/titanic3info.txt.

For this version of the Titanic data, passenger details were deleted, survived was cast as a factor, and the name changed to ptitanic to minimize confusion with other versions.

In this data the crew are conspicuous by their absence.

Contents of ptitanic:

         pclass survived    sex    age sibsp parch
    1       1st survived female 29.000     0     0
    2       1st survived   male  0.917     1     2
    3       1st     died female  2.000     1     2
    4       1st     died   male 30.000     1     2
    5       1st     died female 25.000     1     2
    ...
    1309    3rd     died   male 29.000     0     0
    

How ptitanic was built:

    load("titanic3.sav") # from Dr. Harrell's web site
    # discard name, ticket, fare, cabin, embarked, body, home.dest
    ptitanic <- titanic3[,c(1,2,4,5,6,7)]
    # change survived from integer to factor
    ptitanic$survived <- factor(ptitanic$survived, labels = c("died", "survived"))
    save(ptitanic, file = "ptitanic.rda")
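
The integer-to-factor step above relies on how `factor` pairs labels with levels. The following is a minimal sketch of that pairing; the `codes` vector is a hypothetical stand-in for the integer `survived` column, not data taken from `titanic3`.

```r
# Hypothetical 0/1 survival codes standing in for the integer column
codes <- c(1L, 1L, 0L, 0L, 0L)

# With no explicit levels, factor() uses the sorted unique values (0, 1),
# so the labels pair up as 0 -> "died" and 1 -> "survived"
survived <- factor(codes, labels = c("died", "survived"))

print(survived)
print(table(survived))
```

If a vector happened to contain only one of the two codes, supplying two labels this way would fail; passing `levels = c(0, 1)` explicitly alongside `labels` avoids that.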

This version of the data differs from etitanic in the earth package in that here survived is a factor (not an integer) and age has some NAs.
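
As an illustration of those two differences, the sketch below converts a frame shaped like ptitanic into one with an integer `survived` and no NAs. It is an assumption-laden approximation, not the way etitanic was actually built: `toy` holds the first five rows listed above plus one hypothetical row with a missing age (not a real record), and the name `complete` is chosen here for illustration.

```r
# Toy stand-in for ptitanic: five rows from the listing above plus one
# hypothetical row with a missing age (not a real passenger record)
toy <- data.frame(
  pclass   = factor(c("1st", "1st", "1st", "1st", "1st", "3rd"),
                    levels = c("1st", "2nd", "3rd")),
  survived = factor(c("survived", "survived", "died", "died", "died", "died"),
                    levels = c("died", "survived")),
  sex      = factor(c("female", "male", "female", "male", "female", "male"),
                    levels = c("female", "male")),
  age      = c(29, 0.917, 2, 30, 25, NA),
  sibsp    = c(0L, 1L, 1L, 1L, 1L, 0L),
  parch    = c(0L, 2L, 2L, 2L, 2L, 0L)
)

# Keep only the rows with no missing values
complete <- toy[complete.cases(toy), ]

# Recode the survived factor as an integer: 0 = died, 1 = survived
complete$survived <- as.integer(complete$survived == "survived")

str(complete)
```

Applied to the real ptitanic, the same two steps would be expected to leave the 1046 rows that have a recorded age.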

Examples

    data(ptitanic)
    summary(ptitanic)
    #>  pclass        survived       sex           age              sibsp
    #>  1st:323   died    :809   female:466   Min.   : 0.1667   Min.   :0.0000
    #>  2nd:277   survived:500   male  :843   1st Qu.:21.0000   1st Qu.:0.0000
    #>  3rd:709                               Median :28.0000   Median :0.0000
    #>                                        Mean   :29.8811   Mean   :0.4989
    #>                                        3rd Qu.:39.0000   3rd Qu.:1.0000
    #>                                        Max.   :80.0000   Max.   :8.0000
    #>                                        NA's   :263
    #>      parch
    #>  Min.   :0.000
    #>  1st Qu.:0.000
    #>  Median :0.000
    #>  Mean   :0.385
    #>  3rd Qu.:0.000
    #>  Max.   :9.000
    #>

    # survival rate was greater for females
    rpart.rules(rpart(survived ~ sex, data = ptitanic))
    #>  survived
    #>      0.19 when sex is male
    #>      0.73 when sex is female

    # survival rate was greater for higher classes
    rpart.rules(rpart(survived ~ pclass, data = ptitanic))
    #>  survived
    #>      0.26 when pclass is 3rd
    #>      0.43 when pclass is 2nd
    #>      0.62 when pclass is 1st

    # survival rate was greater for children
    rpart.rules(rpart(survived ~ age, data = ptitanic))
    #>  survived
    #>      0.39 when age >= 8.5
    #>      0.64 when age <  8.5

    # main indicator of missing data is 3rd class esp. with many children
    obs.with.nas <- rowSums(is.na(ptitanic)) > 0
    rpart.rules(rpart(obs.with.nas ~ ., data = ptitanic, method = "class"))
    #>  obs.with.nas
    #>          0.09 when pclass is 1st or 2nd
    #>          0.29 when pclass is 3rd & sibsp <  7
    #>          0.89 when pclass is 3rd & sibsp >= 7