This function is similar to tapply, but is designed for use in conjunction with id. It is simpler in that it accepts only a single grouping vector (use id to combine several grouping variables into one) and uses vapply internally, with the .default value as the template for each group's result.

vaggregate(.value, .group, .fun, ..., .default = NULL, .n = nlevels(.group))

Arguments

.value

vector of values to aggregate

.group

grouping vector

.fun

aggregation function

...

other arguments passed on to .fun

.default

default value used for missing groups. This argument is also used as the template for function output.

.n

total number of groups
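
Because .default doubles as the output template, its type and length matter. The following base R sketch shows how a vapply template behaves in general. It does not call vaggregate itself; `pieces` and `safe_range` are hypothetical names introduced only for illustration.

```r
# Base R only: FUN.VALUE plays the role that .default plays as a template.
pieces <- list(1:3, 4:6, integer(0))

# Length-1 double template: integer results from sum() are promoted to double.
sums <- vapply(pieces, sum, FUN.VALUE = numeric(1))
sums  # 6 15 0

# Length-2 double template: every call must return exactly two values.
# safe_range() is a hypothetical helper that avoids range() on empty input.
safe_range <- function(x) if (length(x) > 0) range(x) + 0 else c(NA, NA) + 0
ranges <- vapply(pieces, safe_range, FUN.VALUE = c(NA, NA) + 0)
dim(ranges)  # 2 3

# A double result cannot be demoted to fit an integer template,
# so vapply() signals an error here.
outcome <- tryCatch(
  vapply(pieces, mean, FUN.VALUE = integer(1)),
  error = function(e) "template mismatch"
)
outcome
```

This is why the examples use NA_integer_ with sum on integer data and c(NA, NA) + 0 (a double vector) with range.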

Details

vaggregate should be faster than tapply in most situations because it avoids making a copy of the data.
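
To make the description concrete, here is a rough base R approximation of the idea: split the positions by group, then apply the function per group with vapply, using the default as the template. It is a sketch under stated assumptions, not plyr's actual implementation. `vaggregate_sketch` is a hypothetical name, and deriving the template by calling the function on a zero-length input is an assumption that happens to reproduce the outputs shown in the examples.

```r
# Hypothetical helper, not part of plyr: a base R approximation only.
vaggregate_sketch <- function(value, group, fun, ..., default = NULL) {
  group <- as.factor(group)

  # Assumption: with no default supplied, build the template by calling
  # fun on a zero-length slice of the data.
  if (is.null(default)) default <- fun(value[0], ...)

  # One integer index vector per factor level; unused levels give integer(0).
  indices <- split(seq_along(value), group)

  vapply(indices, function(i) {
    if (length(i) == 0) return(default)  # missing group: use the default
    fun(value[i], ...)
  }, FUN.VALUE = default, USE.NAMES = FALSE)
}

n <- 17
fac <- factor(rep(1:3, length.out = n), levels = 1:5)

# Group sums for levels 1 to 5: 51 57 45 0 0
sums <- vaggregate_sketch(1:n, fac, sum)

# Empty levels 4 and 5 are filled with the supplied default instead.
sums_na <- vaggregate_sketch(1:n, fac, sum, default = NA_integer_)
```

Indexing into value with precomputed positions, rather than splitting value itself, is one way such a function can limit copying. Whether this matches the source of plyr's speed advantage is not confirmed here.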

Examples

# Some examples of use borrowed from ?tapply
n <- 17; fac <- factor(rep(1:3, length.out = n), levels = 1:5)
table(fac)
#> fac
#> 1 2 3 4 5
#> 6 6 5 0 0
vaggregate(1:n, fac, sum)
#> [1] 51 57 45 0 0
vaggregate(1:n, fac, sum, .default = NA_integer_)
#> [1] 51 57 45 NA NA
vaggregate(1:n, fac, range)
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3  Inf  Inf
#> [2,]   16   17   15 -Inf -Inf
vaggregate(1:n, fac, range, .default = c(NA, NA) + 0)
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]    1    2    3   NA   NA
#> [2,]   16   17   15   NA   NA
vaggregate(1:n, fac, quantile)
#>       [,1]  [,2] [,3] [,4] [,5]
#> 0%    1.00  2.00    3   NA   NA
#> 25%   4.75  5.75    6   NA   NA
#> 50%   8.50  9.50    9   NA   NA
#> 75%  12.25 13.25   12   NA   NA
#> 100% 16.00 17.00   15   NA   NA
# Unlike tapply, vaggregate does not support multi-d output:
tapply(warpbreaks$breaks, warpbreaks[,-1], sum)
#>     tension
#> wool   L   M   H
#>    A 401 216 221
#>    B 254 259 169
vaggregate(warpbreaks$breaks, id(warpbreaks[,-1]), sum)
#> [1] 401 216 221 254 259 169
# But it is faster (about 3x in the timings shown here; results vary by machine)
x <- rnorm(1e6)
y1 <- sample.int(10, 1e6, replace = TRUE)
system.time(tapply(x, y1, mean))
#>    user  system elapsed
#>   0.037   0.004   0.042
system.time(vaggregate(x, y1, mean))
#>    user  system elapsed
#>   0.014   0.000   0.014
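
If the two-dimensional layout from the warpbreaks example is needed, the flat result can be folded back into a matrix by hand. The sketch below uses the literal values printed in that example rather than calling vaggregate. It assumes the groups are ordered with tension varying fastest within wool, as the printed outputs suggest. `flat` and `folded` are hypothetical names.

```r
# Values copied from the vaggregate(warpbreaks$breaks, ...) output above.
flat <- c(401, 216, 221, 254, 259, 169)

# Assumption: order is A-L, A-M, A-H, B-L, B-M, B-H, so fill row by row.
folded <- matrix(
  flat,
  nrow = 2, byrow = TRUE,
  dimnames = list(wool = c("A", "B"), tension = c("L", "M", "H"))
)
folded

# Cross-check against the base R tapply() result on the same data.
reference <- tapply(warpbreaks$breaks, warpbreaks[, -1], sum)
all(folded == reference)
```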