The scoped variants of summarise() make it easy to apply the same
transformation to multiple variables.
There are three variants.

* summarise_all() affects every variable.
* summarise_at() affects variables selected with a character vector or vars().
* summarise_if() affects variables selected with a predicate function.

summarise_all(.tbl, .funs, ...)
summarise_if(.tbl, .predicate, .funs, ...)
summarise_at(.tbl, .vars, .funs, ..., .cols = NULL)
summarize_all(.tbl, .funs, ...)
summarize_if(.tbl, .predicate, .funs, ...)
summarize_at(.tbl, .vars, .funs, ..., .cols = NULL)
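
| Argument | Description |
|---|---|
| .tbl | A tbl object. |
| .funs | A function, a purrr-style lambda such as ~ min(.), or a list of either form. |
| ... | Additional arguments for the function calls in .funs. |
| .predicate | A predicate function to be applied to the columns or a logical vector. The variables for which .predicate is or returns TRUE are selected. |
| .vars | A list of columns generated by vars(), or a character vector of column names. |
| .cols | This argument has been renamed to .vars. |
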
A data frame. By default, the newly created columns have the shortest names needed to uniquely identify the output. To force inclusion of a name, even when not needed, name the input (see examples for details).
If applied on a grouped tibble, these operations are not applied
to the grouping variables. The behaviour depends on whether the
selection is implicit (all and if selections) or
explicit (at selections).
Grouping variables covered by explicit selections in
summarise_at() are always an error. Add -group_cols() to the
vars() selection to avoid this:
data %>% summarise_at(vars(-group_cols(), ...), myoperation)
Or remove group_vars() from the character vector of column names:
nms <- setdiff(nms, group_vars(data))
data %>% summarise_at(nms, myoperation)
Grouping variables covered by implicit selections are silently
ignored by summarise_all() and summarise_if().
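
The two selection styles can be compared side by side. The following is a minimal sketch, assuming dplyr is attached and using the built-in mtcars data grouped by cyl; the object names (by_cyl, all_means, at_means, chr_means) are illustrative only.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

# Implicit selection: cyl is the grouping variable, so summarise_all()
# skips it and summarises the remaining columns.
all_means <- by_cyl %>% summarise_all(mean)

# Explicit selection with vars(): exclude the grouping columns up front.
at_means <- by_cyl %>% summarise_at(vars(-group_cols()), mean)

# Explicit selection with a character vector: drop group_vars() first.
nms <- setdiff(names(mtcars), group_vars(by_cyl))
chr_means <- by_cyl %>% summarise_at(nms, mean)

# All three results have one row per group and the same columns.
names(all_means)
```

In each case the grouping column still appears in the result as the group key; it is just not passed to the summary function.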
The names of the created columns are derived from the names of the input variables and the names of the functions:

* If there is only one unnamed function, the names of the input variables are used to name the created columns.
* If there is only one unnamed variable, the names of the functions are used to name the created columns.
* Otherwise, in the most general case, the names are created by concatenating the names of the input variables and the names of the functions, separated by an underscore (for example, Sepal.Length_min).

The names of the functions here means the names in the list of functions that is supplied. When a name is needed and not supplied, the function is named with the prefix "fn" followed by its index among the unnamed functions in the list. Finally, names are made unique.
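
The three naming cases can be checked on a small example. This is a sketch under the assumption that dplyr is attached; the tibble df and the function names lo and hi are made up for illustration.

```r
library(dplyr)

df <- tibble(x = c(1, 2, 3), y = c(4, 5, 6))

# One unnamed function: output columns keep the input variable names.
one_fn <- df %>% summarise_all(mean)
names(one_fn)   # "x" "y"

# One unnamed variable, several named functions: function names are used.
one_var <- df %>% summarise_at("x", list(lo = min, hi = max))
names(one_var)  # "lo" "hi"

# General case: variable and function names are joined with "_".
general <- df %>% summarise_all(list(lo = min, hi = max))
names(general)  # "x_lo" "y_lo" "x_hi" "y_hi"
```

Note that in the general case the columns are ordered by function first, then by variable, which matches the fn1/fn2 layout in the example output on this page.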
library(dplyr)

by_species <- iris %>%
  group_by(Species)

# The _at() variants directly support strings:
starwars %>%
  summarise_at(c("height", "mass"), mean, na.rm = TRUE)
#> # A tibble: 1 x 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3

# You can also supply selection helpers to _at() functions but you have
# to quote them with vars():
starwars %>%
  summarise_at(vars(height:mass), mean, na.rm = TRUE)
#> # A tibble: 1 x 2
#>   height  mass
#>    <dbl> <dbl>
#> 1   174.  97.3

# The _if() variants apply a predicate function (a function that
# returns TRUE or FALSE) to determine the relevant subset of
# columns. Here we apply mean() to the numeric columns:
starwars %>%
  summarise_if(is.numeric, mean, na.rm = TRUE)
#> # A tibble: 1 x 3
#>   height  mass birth_year
#>    <dbl> <dbl>      <dbl>
#> 1   174.  97.3       87.6

# If you want to apply multiple transformations, pass a list of
# functions. When there are multiple functions, they create new
# variables instead of modifying the variables in place:
by_species %>%
  summarise_all(list(min, max))
#> # A tibble: 3 x 9
#>   Species Sepal.Length_fn1 Sepal.Width_fn1 Petal.Length_fn1 Petal.Width_fn1
#>   <fct>              <dbl>           <dbl>            <dbl>           <dbl>
#> 1 setosa               4.3             2.3              1               0.1
#> 2 versic…              4.9             2                3               1
#> 3 virgin…              4.9             2.2              4.5             1.4
#> # … with 4 more variables: Sepal.Length_fn2 <dbl>, Sepal.Width_fn2 <dbl>,
#> #   Petal.Length_fn2 <dbl>, Petal.Width_fn2 <dbl>

# Note how the new variables include the function name, in order to
# keep things distinct. Passing purrr-style lambdas often creates
# better default names:
by_species %>%
  summarise_all(list(~min(.), ~max(.)))
#> # A tibble: 3 x 9
#>   Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
#>   <fct>              <dbl>           <dbl>            <dbl>           <dbl>
#> 1 setosa               4.3             2.3              1               0.1
#> 2 versic…              4.9             2                3               1
#> 3 virgin…              4.9             2.2              4.5             1.4
#> # … with 4 more variables: Sepal.Length_max <dbl>, Sepal.Width_max <dbl>,
#> #   Petal.Length_max <dbl>, Petal.Width_max <dbl>

# When that's not good enough, you can also supply the names explicitly:
by_species %>%
  summarise_all(list(min = min, max = max))
#> # A tibble: 3 x 9
#>   Species Sepal.Length_min Sepal.Width_min Petal.Length_min Petal.Width_min
#>   <fct>              <dbl>           <dbl>            <dbl>           <dbl>
#> 1 setosa               4.3             2.3              1               0.1
#> 2 versic…              4.9             2                3               1
#> 3 virgin…              4.9             2.2              4.5             1.4
#> # … with 4 more variables: Sepal.Length_max <dbl>, Sepal.Width_max <dbl>,
#> #   Petal.Length_max <dbl>, Petal.Width_max <dbl>

# When there's only one function in the list, it modifies existing
# variables in place. Give it a name to create new variables instead:
by_species %>%
  summarise_all(list(med = median))
#> # A tibble: 3 x 5
#>   Species    Sepal.Length_med Sepal.Width_med Petal.Length_med Petal.Width_med
#>   <fct>                 <dbl>           <dbl>            <dbl>           <dbl>
#> 1 setosa                  5               3.4             1.5              0.2
#> 2 versicolor              5.9             2.8             4.35             1.3
#> 3 virginica               6.5             3               5.55             2

by_species %>%
  summarise_all(list(Q3 = quantile), probs = 0.75)
#> # A tibble: 3 x 5
#>   Species    Sepal.Length_Q3 Sepal.Width_Q3 Petal.Length_Q3 Petal.Width_Q3
#>   <fct>                <dbl>          <dbl>           <dbl>          <dbl>
#> 1 setosa                 5.2           3.68            1.58            0.3
#> 2 versicolor             6.3           3               4.6             1.5
#> 3 virginica              6.9           3.18            5.88            2.3