Create or transform variables

mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. Both functions preserve the number of rows of the input. New variables overwrite existing variables of the same name.

mutate(.data, ...)

transmute(.data, ...)

Arguments

.data

.data	A tbl. All main verbs are S3 generics and provide methods for `tbl_df()`, `dtplyr::tbl_dt()` and `dbplyr::tbl_dbi()`.
...	Name-value pairs of expressions, each with length 1 or the same length as the number of rows in the group (if using `group_by()`) or in the entire input (if not using groups). The name of each argument will be the name of a new variable, and the value will be its corresponding value. Use a `NULL` value in `mutate` to drop a variable. New variables overwrite existing variables of the same name. The arguments in `...` are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See `vignette("programming")` for an introduction to these concepts.

A tbl. All main verbs are S3 generics and provide methods for tbl_df(), dtplyr::tbl_dt() and dbplyr::tbl_dbi().

...

Name-value pairs of expressions, each with length 1 or the same length as the number of rows in the group (if using group_by()) or in the entire input (if not using groups). The name of each argument will be the name of a new variable, and the value will be its corresponding value. Use a NULL value in mutate to drop a variable. New variables overwrite existing variables of the same name.

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See vignette("programming") for an introduction to these concepts.

Value

An object of the same class as .data.

Useful functions available in calculations of variables

+, -, log(), etc., for their usual mathematical meanings
lead(), lag()
dense_rank(), min_rank(), percent_rank(), row_number(), cume_dist(), ntile()
cumsum(), cummean(), cummin(), cummax(), cumany(), cumall()
na_if(), coalesce()
if_else(), recode(), case_when()

Grouped tibbles

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:

starwars %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()

With the grouped equivalent:

starwars %>%
  group_by(gender) %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()

The former normalises mass by the global average whereas the latter normalises by the averages within gender levels.

Note that you can't overwrite a grouping variable within mutate().

mutate() does not evaluate the expressions when the group is empty.

Scoped mutation and transmutation

The three scoped variants of mutate() (mutate_all(), mutate_if() and mutate_at()) and the three variants of transmute() (transmute_all(), transmute_if(), transmute_at()) make it easy to apply a transformation to a selection of variables.

Tidy data

When applied to a data frame, row names are silently dropped. To preserve, convert to an explicit variable with tibble::rownames_to_column().

Examples

# Newly created variables are available immediately
mtcars %>% as_tibble() %>% mutate(
  cyl2 = cyl * 2,
  cyl4 = cyl2 * 2
)
#> # A tibble: 32 x 13
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  cyl2  cyl4
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4    12    24
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4    12    24
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     8    16
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1    12    24
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2    16    32
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1    12    24
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    16    32
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     8    16
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     8    16
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4    12    24
#> # … with 22 more rows

# You can also use mutate() to remove variables and
# modify existing variables
mtcars %>% as_tibble() %>% mutate(
  mpg = NULL,
  disp = disp * 0.0163871 # convert to litres
)
#> # A tibble: 32 x 10
#>      cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1     6  2.62   110  3.9   2.62  16.5     0     1     4     4
#>  2     6  2.62   110  3.9   2.88  17.0     0     1     4     4
#>  3     4  1.77    93  3.85  2.32  18.6     1     1     4     1
#>  4     6  4.23   110  3.08  3.22  19.4     1     0     3     1
#>  5     8  5.90   175  3.15  3.44  17.0     0     0     3     2
#>  6     6  3.69   105  2.76  3.46  20.2     1     0     3     1
#>  7     8  5.90   245  3.21  3.57  15.8     0     0     3     4
#>  8     4  2.40    62  3.69  3.19  20       1     0     4     2
#>  9     4  2.31    95  3.92  3.15  22.9     1     0     4     2
#> 10     6  2.75   123  3.92  3.44  18.3     1     0     4     4
#> # … with 22 more rows


# window functions are useful for grouped mutates
mtcars %>%
 group_by(cyl) %>%
 mutate(rank = min_rank(desc(mpg)))
#> # A tibble: 32 x 12
#> # Groups:   cyl [3]
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb  rank
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#>  1  21       6  160    110  3.9   2.62  16.5     0     1     4     4     2
#>  2  21       6  160    110  3.9   2.88  17.0     0     1     4     4     2
#>  3  22.8     4  108     93  3.85  2.32  18.6     1     1     4     1     8
#>  4  21.4     6  258    110  3.08  3.22  19.4     1     0     3     1     1
#>  5  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2     2
#>  6  18.1     6  225    105  2.76  3.46  20.2     1     0     3     1     6
#>  7  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4    11
#>  8  24.4     4  147.    62  3.69  3.19  20       1     0     4     2     7
#>  9  22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2     8
#> 10  19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4     5
#> # … with 22 more rows
# see `vignette("window-functions")` for more details

# You can drop variables by setting them to NULL
mtcars %>% mutate(cyl = NULL)
#>     mpg  disp  hp drat    wt  qsec vs am gear carb
#> 1  21.0 160.0 110 3.90 2.620 16.46  0  1    4    4
#> 2  21.0 160.0 110 3.90 2.875 17.02  0  1    4    4
#> 3  22.8 108.0  93 3.85 2.320 18.61  1  1    4    1
#> 4  21.4 258.0 110 3.08 3.215 19.44  1  0    3    1
#> 5  18.7 360.0 175 3.15 3.440 17.02  0  0    3    2
#> 6  18.1 225.0 105 2.76 3.460 20.22  1  0    3    1
#> 7  14.3 360.0 245 3.21 3.570 15.84  0  0    3    4
#> 8  24.4 146.7  62 3.69 3.190 20.00  1  0    4    2
#> 9  22.8 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10 19.2 167.6 123 3.92 3.440 18.30  1  0    4    4
#> 11 17.8 167.6 123 3.92 3.440 18.90  1  0    4    4
#> 12 16.4 275.8 180 3.07 4.070 17.40  0  0    3    3
#> 13 17.3 275.8 180 3.07 3.730 17.60  0  0    3    3
#> 14 15.2 275.8 180 3.07 3.780 18.00  0  0    3    3
#> 15 10.4 472.0 205 2.93 5.250 17.98  0  0    3    4
#> 16 10.4 460.0 215 3.00 5.424 17.82  0  0    3    4
#> 17 14.7 440.0 230 3.23 5.345 17.42  0  0    3    4
#> 18 32.4  78.7  66 4.08 2.200 19.47  1  1    4    1
#> 19 30.4  75.7  52 4.93 1.615 18.52  1  1    4    2
#> 20 33.9  71.1  65 4.22 1.835 19.90  1  1    4    1
#> 21 21.5 120.1  97 3.70 2.465 20.01  1  0    3    1
#> 22 15.5 318.0 150 2.76 3.520 16.87  0  0    3    2
#> 23 15.2 304.0 150 3.15 3.435 17.30  0  0    3    2
#> 24 13.3 350.0 245 3.73 3.840 15.41  0  0    3    4
#> 25 19.2 400.0 175 3.08 3.845 17.05  0  0    3    2
#> 26 27.3  79.0  66 4.08 1.935 18.90  1  1    4    1
#> 27 26.0 120.3  91 4.43 2.140 16.70  0  1    5    2
#> 28 30.4  95.1 113 3.77 1.513 16.90  1  1    5    2
#> 29 15.8 351.0 264 4.22 3.170 14.50  0  1    5    4
#> 30 19.7 145.0 175 3.62 2.770 15.50  0  1    5    6
#> 31 15.0 301.0 335 3.54 3.570 14.60  0  1    5    8
#> 32 21.4 121.0 109 4.11 2.780 18.60  1  1    4    2

# mutate() vs transmute --------------------------
# mutate() keeps all existing variables
mtcars %>%
  mutate(displ_l = disp / 61.0237)
#>     mpg cyl  disp  hp drat    wt  qsec vs am gear carb  displ_l
#> 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4 2.621932
#> 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4 2.621932
#> 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1 1.769804
#> 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 4.227866
#> 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2 5.899347
#> 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1 3.687092
#> 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4 5.899347
#> 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2 2.403984
#> 9  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2 2.307300
#> 10 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 2.746474
#> 11 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 2.746474
#> 12 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 4.519556
#> 13 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 4.519556
#> 14 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3 4.519556
#> 15 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4 7.734700
#> 16 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 7.538055
#> 17 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4 7.210313
#> 18 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 1.289663
#> 19 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2 1.240502
#> 20 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1 1.165121
#> 21 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1 1.968088
#> 22 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 5.211090
#> 23 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2 4.981671
#> 24 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4 5.735477
#> 25 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 6.554830
#> 26 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1 1.294579
#> 27 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2 1.971365
#> 28 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 1.558411
#> 29 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4 5.751864
#> 30 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 2.376126
#> 31 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 4.932510
#> 32 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 1.982836

# transmute keeps only the variables you create
mtcars %>%
  transmute(displ_l = disp / 61.0237)
#>     displ_l
#> 1  2.621932
#> 2  2.621932
#> 3  1.769804
#> 4  4.227866
#> 5  5.899347
#> 6  3.687092
#> 7  5.899347
#> 8  2.403984
#> 9  2.307300
#> 10 2.746474
#> 11 2.746474
#> 12 4.519556
#> 13 4.519556
#> 14 4.519556
#> 15 7.734700
#> 16 7.538055
#> 17 7.210313
#> 18 1.289663
#> 19 1.240502
#> 20 1.165121
#> 21 1.968088
#> 22 5.211090
#> 23 4.981671
#> 24 5.735477
#> 25 6.554830
#> 26 1.294579
#> 27 1.971365
#> 28 1.558411
#> 29 5.751864
#> 30 2.376126
#> 31 4.932510
#> 32 1.982836


# The mutate operation may yield different results on grouped
# tibbles because the expressions are computed within groups.
# The following normalises `mass` by the global average:
starwars %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()
#>  [1]  0.7912704  0.7707179  0.3288397  1.3975685  0.5035357  1.2331487
#>  [7]  0.7707179  0.3288397  0.8632041  0.7912704  0.8632041         NA
#> [13]  1.1509388  0.8220991  0.7604417 13.9551329  0.7912704  1.1303863
#> [19]  0.1746961  0.7707179  0.8036019  1.4386735  1.1612150  0.8118229
#> [25]  0.8118229  0.8529279         NA         NA  0.2055248  0.6987843
#> [31]  0.9145853  0.9248615         NA  0.6782318  0.8426516         NA
#> [37]         NA         NA  0.4110496         NA         NA  0.8220991
#> [43]         NA  0.5651932  0.4624308         NA  0.6679556  0.8632041
#> [49]  0.8426516  0.8940328         NA  0.5138120         NA         NA
#> [55]  0.8220991         NA  0.8734803         NA         NA  0.8220991
#> [61]  0.5775246  0.5138120         NA  0.8220991         NA  0.8118229
#> [67]  0.5651932  1.0481764  0.9043091         NA         NA  0.1541436
#> [73]         NA  0.4932595         NA  0.5857456  1.6339220  1.3975685
#> [79]  0.8118229  0.4932595  0.8220991         NA         NA         NA
#> [85]         NA         NA  0.4624308

# Whereas this normalises `mass` by the averages within gender
# levels:
starwars %>%
  group_by(gender) %>%
  mutate(mass / mean(mass, na.rm = TRUE)) %>%
  pull()
#>  [1] 0.9505639 1.6187050 0.6906475 1.6789181 0.9070715 1.4813984 1.3883747
#>  [8] 0.6906475 1.0369788 0.9505639 1.0369788        NA 1.3826385 0.9875989
#> [15] 0.9135290 1.0000000 0.9505639 1.3579485 0.2098648 0.9258740 0.9653779
#> [22] 1.0000000 1.3949834 0.9752539 0.9752539 1.0246339        NA        NA
#> [29] 0.2468997 0.8394591 1.0987038 1.1110488        NA 0.8147691 1.0122889
#> [36]        NA        NA        NA 0.4937995        NA        NA 0.9875989
#> [43]        NA 1.0181414 0.5555244        NA 0.8024241 1.0369788 1.0122889
#> [50] 1.0740138        NA 0.9255831        NA        NA 0.9875989        NA
#> [57] 1.0493238        NA        NA 0.9875989 1.0403554 0.9255831        NA
#> [64] 0.9875989        NA 0.9752539 1.0181414 1.2591886 1.0863588        NA
#> [71]        NA 0.1851748        NA 0.5925593        NA 1.0551648 1.9628528
#> [78] 1.6789181 0.9752539 0.8885598 0.9875989        NA        NA        NA
#> [85]        NA        NA 0.8330248

# Note that you can't overwrite grouping variables:
gdf <- mtcars %>% group_by(cyl)
try(mutate(gdf, cyl = cyl * 100))
#> Error : Column `cyl` can't be modified because it's a grouping variable


# Refer to column names stored as strings with the `.data` pronoun:
vars <- c("mass", "height")
mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
#> # A tibble: 87 x 14
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Luke…    172    77 blond      fair       blue            19   male  
#>  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
#>  3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
#>  4 Dart…    202   136 none       white      yellow          41.9 male  
#>  5 Leia…    150    49 brown      light      brown           19   female
#>  6 Owen…    178   120 brown, gr… light      blue            52   male  
#>  7 Beru…    165    75 brown      light      blue            47   female
#>  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>  
#>  9 Bigg…    183    84 black      light      brown           24   male  
#> 10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
#> # … with 77 more rows, and 6 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>, prod <dbl>

# For more complex cases, knowledge of tidy evaluation and the
# unquote operator `!!` is required. See https://tidyeval.tidyverse.org/
#
# One useful and simple tidy eval technique is to use `!!` to
# bypass the data frame and its columns. Here is how to divide the
# column `mass` by an object of the same name:
mass <- 100
mutate(starwars, mass = mass / !!mass)
#> # A tibble: 87 x 13
#>    name  height  mass hair_color skin_color eye_color birth_year gender
#>    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
#>  1 Luke…    172  0.77 blond      fair       blue            19   male  
#>  2 C-3PO    167  0.75 <NA>       gold       yellow         112   <NA>  
#>  3 R2-D2     96  0.32 <NA>       white, bl… red             33   <NA>  
#>  4 Dart…    202  1.36 none       white      yellow          41.9 male  
#>  5 Leia…    150  0.49 brown      light      brown           19   female
#>  6 Owen…    178  1.2  brown, gr… light      blue            52   male  
#>  7 Beru…    165  0.75 brown      light      blue            47   female
#>  8 R5-D4     97  0.32 <NA>       white, red red             NA   <NA>  
#>  9 Bigg…    183  0.84 black      light      brown           24   male  
#> 10 Obi-…    182  0.77 auburn, w… fair       blue-gray       57   male  
#> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>