For each subset of a data frame, apply a function, then combine the results
into a data frame.
To apply a function for each row, use adply with .margins set to 1.
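The row-wise case mentioned above can be sketched as follows. This is a minimal illustration, assuming plyr is installed; the data frame `scores` and its columns `x` and `y` are hypothetical names made up for the example.

```r
library(plyr)

# Hypothetical toy data; the column names x and y are made up for illustration.
scores <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30))

# .margins = 1 splits the data frame by row, so .fun receives one row at a time
# (as a one-row data frame) and returns a one-row data frame.
row_totals <- adply(scores, 1, function(row) data.frame(total = row$x + row$y))

print(row_totals)
```

Returning a data frame from the function keeps the name of the new column explicit.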
ddply(.data, .variables, .fun = NULL, ..., .progress = "none", .inform = FALSE, .drop = TRUE, .parallel = FALSE, .paropts = NULL)
Argument | Description |
---|---|
.data | data frame to be processed |
.variables | variables to split data frame by, as |
.fun | function to apply to each piece |
... | other arguments passed on to |
.progress | name of the progress bar to use, see |
.inform | produce informative error messages? This is turned off by default because it substantially slows processing speed, but is very useful for debugging |
.drop | should combinations of variables that do not appear in the input data be preserved (FALSE) or dropped (TRUE, default) |
.parallel | if |
.paropts | a list of additional options passed into the |
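As a sketch of what the .drop argument changes, the example below groups by two factors where one combination never occurs. It assumes plyr is installed; the data frame `toy` and its columns `g`, `h` and `v` are hypothetical.

```r
library(plyr)

# Hypothetical toy data: the combination (g = "b", h = "y") never occurs.
toy <- data.frame(
  g = factor(c("a", "a", "b")),
  h = factor(c("x", "y", "x")),
  v = c(1, 3, 10)
)

# Default (.drop = TRUE): only combinations present in the data are returned.
dropped <- ddply(toy, .(g, h), summarise, n = length(v))

# .drop = FALSE: the missing combination is kept, and .fun sees a zero-row piece.
kept <- ddply(toy, .(g, h), summarise, n = length(v), .drop = FALSE)

print(dropped)
print(kept)
```

With .drop = FALSE the function has to cope with empty pieces, which is why a count such as `length(v)` is used here rather than a statistic that is undefined on empty input.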
A data frame, as described in the output section.
This function splits data frames by variables.
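The splitting variables can be written in more than one way. The sketch below, which assumes plyr is installed and uses a hypothetical data frame `toy`, compares the `.()` form and the formula form used in the examples on this page with a character vector, a third form that plyr also accepts.

```r
library(plyr)

# Hypothetical toy data; g and v are illustrative names.
toy <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

by_quoted  <- ddply(toy, .(g), nrow)   # unquoted names via the '.' function
by_formula <- ddply(toy, ~ g, nrow)    # one-sided formula
by_name    <- ddply(toy, "g", nrow)    # character vector of column names

print(by_quoted)
```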
The most unambiguous behaviour is achieved when .fun returns a data frame - in that case pieces will be combined with rbind.fill. If .fun returns an atomic vector of fixed length, it will be rbinded together and converted to a data frame. Any other values will result in an error.

If there are no results, then this function will return a data frame with zero rows and columns (data.frame()).
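The three output cases described above can be compared side by side. This is a minimal sketch, assuming plyr is installed; the data frame `toy` and the anonymous functions are hypothetical and chosen only to trigger each case.

```r
library(plyr)

# Hypothetical toy data; g and v are illustrative names.
toy <- data.frame(g = c("a", "a", "b"), v = c(1, 3, 10))

# .fun returns a data frame: its column names carry over to the result.
as_df <- ddply(toy, .(g), function(piece) data.frame(avg = mean(piece$v)))

# .fun returns an unnamed atomic vector of length 1: the value column
# receives a default name.
as_vec <- ddply(toy, .(g), function(piece) mean(piece$v))

# .fun returns NULL for every piece, so there are no results to combine.
nothing <- ddply(toy, .(g), function(piece) NULL)

print(names(as_df))
print(names(as_vec))
print(dim(nothing))
```

Returning a data frame is usually the easier choice to read later, since the column names are set by the function rather than generated.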
Hadley Wickham (2011). The Split-Apply-Combine Strategy for Data Analysis. Journal of Statistical Software, 40(1), 1-29. http://www.jstatsoft.org/v40/i01/.
    # Summarize a dataset by two variables
    dfx <- data.frame(
      group = c(rep('A', 8), rep('B', 15), rep('C', 6)),
      sex = sample(c("M", "F"), size = 29, replace = TRUE),
      age = runif(n = 29, min = 18, max = 54)
    )

    # Note the use of the '.' function to allow
    # group and sex to be used without quoting
    ddply(dfx, .(group, sex), summarize,
      mean = round(mean(age), 2),
      sd = round(sd(age), 2))
    #>   group sex  mean    sd
    #> 1     A   F 30.68  6.45
    #> 2     A   M 34.77  9.58
    #> 3     B   F 34.62 14.69
    #> 4     B   M 41.26  6.29
    #> 5     C   F 44.87  1.68
    #> 6     C   M 30.64  6.75

    # An example using a formula for .variables
    ddply(baseball[1:100,], ~ year, nrow)
    #>   year V1
    #> 1 1871  7
    #> 2 1872 13
    #> 3 1873 13
    #> 4 1874 15
    #> 5 1875 17
    #> 6 1876 15
    #> 7 1877 17
    #> 8 1878  3

    #>   lg  nrow ncol
    #> 1       65   22
    #> 2 AA   171   22
    #> 3 AL 10007   22
    #> 4 FL    37   22
    #> 5 NL 11378   22
    #> 6 PL    32   22
    #> 7 UA     9   22