Wage and other data for a group of 3000 male workers in the Mid-Atlantic region.

Wage

Format

A data frame with 3000 observations on the following 11 variables.

year

Year that wage information was recorded

age

Age of worker

maritl

A factor with levels 1. Never Married, 2. Married, 3. Widowed, 4. Divorced, and 5. Separated, indicating marital status

race

A factor with levels 1. White, 2. Black, 3. Asian, and 4. Other, indicating race

education

A factor with levels 1. < HS Grad, 2. HS Grad, 3. Some College, 4. College Grad, and 5. Advanced Degree, indicating education level

region

Region of the country (mid-atlantic only)

jobclass

A factor with levels 1. Industrial and 2. Information, indicating type of job

health

A factor with levels 1. <=Good and 2. >=Very Good, indicating health level of worker

health_ins

A factor with levels 1. Yes and 2. No, indicating whether worker has health insurance

logwage

Log of worker's wage

wage

Worker's raw wage
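The relationship between logwage and wage can be checked directly. The sketch below assumes the data frame is loaded from the ISLR package (the package name is an assumption, not stated on this page) and that the log is the natural log, which is consistent with the summary in the Examples (minimum logwage 3.000, minimum wage 20.09). The helper max_log_gap is a hypothetical name introduced here for illustration.

```r
# Hypothetical helper: largest absolute gap between exp(logwage) and wage.
# A value near zero would support reading logwage as the natural log of wage.
max_log_gap <- function(df) {
  max(abs(exp(df$logwage) - df$wage))
}

# Assumption: the Wage data frame is provided by the ISLR package.
if (requireNamespace("ISLR", quietly = TRUE)) {
  data("Wage", package = "ISLR")
  print(max_log_gap(Wage))
}
```

If the printed gap is not close to zero, the two columns are related by something other than a plain natural-log transform and should be treated as separate measurements.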

Source

Data was manually assembled by Steve Miller, of Open BI (www.openbi.com), from the March 2011 Supplement to Current Population Survey data.

http://thedataweb.rm.census.gov/TheDataWeb

References

James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013) An Introduction to Statistical Learning with applications in R, www.StatLearning.com, Springer-Verlag, New York

Examples

summary(Wage)
#>       year           age                     maritl           race
#>  Min.   :2003   Min.   :18.00   1. Never Married: 648   1. White:2480
#>  1st Qu.:2004   1st Qu.:33.75   2. Married      :2074   2. Black: 293
#>  Median :2006   Median :42.00   3. Widowed      :  19   3. Asian: 190
#>  Mean   :2006   Mean   :42.41   4. Divorced     : 204   4. Other:  37
#>  3rd Qu.:2008   3rd Qu.:51.00   5. Separated    :  55
#>  Max.   :2009   Max.   :80.00
#>
#>               education                     region               jobclass
#>  1. < HS Grad      :268   2. Middle Atlantic   :3000   1. Industrial :1544
#>  2. HS Grad        :971   1. New England       :   0   2. Information:1456
#>  3. Some College   :650   3. East North Central:   0
#>  4. College Grad   :685   4. West North Central:   0
#>  5. Advanced Degree:426   5. South Atlantic    :   0
#>                           6. East South Central:   0
#>                           (Other)              :   0
#>             health      health_ins      logwage           wage
#>  1. <=Good     : 858   1. Yes:2083   Min.   :3.000   Min.   : 20.09
#>  2. >=Very Good:2142   2. No : 917   1st Qu.:4.447   1st Qu.: 85.38
#>                                      Median :4.653   Median :104.92
#>                                      Mean   :4.654   Mean   :111.70
#>                                      3rd Qu.:4.857   3rd Qu.:128.68
#>                                      Max.   :5.763   Max.   :318.34
#>
lm(wage ~ year + age, data = Wage)
#>
#> Call:
#> lm(formula = wage ~ year + age, data = Wage)
#>
#> Coefficients:
#> (Intercept)         year          age
#>  -2318.5309       1.1968       0.6992
#>
## maybe str(Wage) ; plot(Wage) ...
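The summary output above lists several region levels with a count of 0, since every observation is from the Middle Atlantic. A minimal sketch of how one might inspect the structure and drop those unused levels follows. It assumes the data comes from the ISLR package (an assumption, not stated on this page), and count_unused_levels is a hypothetical helper named here for illustration.

```r
# Hypothetical helper: number of factor levels that never occur in the data.
count_unused_levels <- function(f) {
  sum(table(f) == 0)
}

# Assumption: the Wage data frame is provided by the ISLR package.
if (requireNamespace("ISLR", quietly = TRUE)) {
  data("Wage", package = "ISLR")
  str(Wage)                                # column types and factor levels
  print(count_unused_levels(Wage$region))  # levels with no observations
  Wage$region <- droplevels(Wage$region)   # keep only the levels that occur
  print(nlevels(Wage$region))
}
```

Dropping unused levels is optional, but a factor with a single observed level carries no information for a model, so region is typically left out of fits on this data.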