mutate()
adds new variables and preserves existing ones;
transmute()
adds new variables and drops existing ones. Both
functions preserve the number of rows of the input.
New variables overwrite existing variables of the same name.
mutate(.data, ...) transmute(.data, ...)
.data | A tbl. All main verbs are S3 generics and provide methods
for |
---|---|
... | Name-value pairs of expressions, each with length 1 or the same
length as the number of rows in the group (if using The arguments in |
An object of the same class as .data
.
Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:
starwars %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()
With the grouped equivalent:
starwars %>% group_by(gender) %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()
The former normalises mass
by the global average whereas the
latter normalises by the averages within gender levels.
Note that you can't overwrite a grouping variable within
mutate()
.
mutate()
does not evaluate the expressions when the group is empty.
The three scoped variants of mutate()
(mutate_all()
,
mutate_if()
and mutate_at()
) and the three variants of
transmute()
(transmute_all()
, transmute_if()
,
transmute_at()
) make it easy to apply a transformation to a
selection of variables.
When applied to a data frame, row names are silently dropped. To preserve,
convert to an explicit variable with tibble::rownames_to_column()
.
# Newly created variables are available immediately mtcars %>% as_tibble() %>% mutate( cyl2 = cyl * 2, cyl4 = cyl2 * 2 )#> # A tibble: 32 x 13 #> mpg cyl disp hp drat wt qsec vs am gear carb cyl2 cyl4 #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 12 24 #> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 12 24 #> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 8 16 #> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 12 24 #> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 16 32 #> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 12 24 #> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 16 32 #> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 8 16 #> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 8 16 #> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 12 24 #> # … with 22 more rows# You can also use mutate() to remove variables and # modify existing variables mtcars %>% as_tibble() %>% mutate( mpg = NULL, disp = disp * 0.0163871 # convert to litres )#> # A tibble: 32 x 10 #> cyl disp hp drat wt qsec vs am gear carb #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> #> 1 6 2.62 110 3.9 2.62 16.5 0 1 4 4 #> 2 6 2.62 110 3.9 2.88 17.0 0 1 4 4 #> 3 4 1.77 93 3.85 2.32 18.6 1 1 4 1 #> 4 6 4.23 110 3.08 3.22 19.4 1 0 3 1 #> 5 8 5.90 175 3.15 3.44 17.0 0 0 3 2 #> 6 6 3.69 105 2.76 3.46 20.2 1 0 3 1 #> 7 8 5.90 245 3.21 3.57 15.8 0 0 3 4 #> 8 4 2.40 62 3.69 3.19 20 1 0 4 2 #> 9 4 2.31 95 3.92 3.15 22.9 1 0 4 2 #> 10 6 2.75 123 3.92 3.44 18.3 1 0 4 4 #> # … with 22 more rows# window functions are useful for grouped mutates mtcars %>% group_by(cyl) %>% mutate(rank = min_rank(desc(mpg)))#> # A tibble: 32 x 12 #> # Groups: cyl [3] #> mpg cyl disp hp drat wt qsec vs am gear carb rank #> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> #> 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 2 #> 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 2 #> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1 8 #> 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1 1 #> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2 2 #> 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1 6 #> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4 11 #> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2 7 #> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2 8 #> 10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 5 #> # … with 22 more rows# see `vignette("window-functions")` for more details # You can drop variables by setting them to NULL mtcars %>% mutate(cyl = NULL)#> mpg disp hp drat wt qsec vs am gear carb #> 1 21.0 160.0 110 3.90 2.620 16.46 0 1 4 4 #> 2 21.0 160.0 110 3.90 2.875 17.02 0 1 4 4 #> 3 22.8 108.0 93 3.85 2.320 18.61 1 1 4 1 #> 4 21.4 258.0 110 3.08 3.215 19.44 1 0 3 1 #> 5 18.7 360.0 175 3.15 3.440 17.02 0 0 3 2 #> 6 18.1 225.0 105 2.76 3.460 20.22 1 0 3 1 #> 7 14.3 360.0 245 3.21 3.570 15.84 0 0 3 4 #> 8 24.4 146.7 62 3.69 3.190 20.00 1 0 4 2 #> 9 22.8 140.8 95 3.92 3.150 22.90 1 0 4 2 #> 10 19.2 167.6 123 3.92 3.440 18.30 1 0 4 4 #> 11 17.8 167.6 123 3.92 3.440 18.90 1 0 4 4 #> 12 16.4 275.8 180 3.07 4.070 17.40 0 0 3 3 #> 13 17.3 275.8 180 3.07 3.730 17.60 0 0 3 3 #> 14 15.2 275.8 180 3.07 3.780 18.00 0 0 3 3 #> 15 10.4 472.0 205 2.93 5.250 17.98 0 0 3 4 #> 16 10.4 460.0 215 3.00 5.424 17.82 0 0 3 4 #> 17 14.7 440.0 230 3.23 5.345 17.42 0 0 3 4 #> 18 32.4 78.7 66 4.08 2.200 19.47 1 1 4 1 #> 19 30.4 75.7 52 4.93 1.615 18.52 1 1 4 2 #> 20 33.9 71.1 65 4.22 1.835 19.90 1 1 4 1 #> 21 21.5 120.1 97 3.70 2.465 20.01 1 0 3 1 #> 22 15.5 318.0 150 2.76 3.520 16.87 0 0 3 2 #> 23 15.2 304.0 150 3.15 3.435 17.30 0 0 3 2 #> 24 13.3 350.0 245 3.73 3.840 15.41 0 0 3 4 #> 25 19.2 400.0 175 3.08 3.845 17.05 0 0 3 2 #> 26 27.3 79.0 66 4.08 1.935 18.90 1 1 4 1 #> 27 26.0 120.3 91 4.43 2.140 16.70 0 1 5 2 #> 28 30.4 95.1 113 3.77 1.513 16.90 1 1 5 2 #> 29 15.8 351.0 264 4.22 3.170 14.50 0 1 5 4 #> 30 19.7 145.0 175 3.62 2.770 15.50 0 1 5 6 #> 31 15.0 301.0 335 3.54 3.570 14.60 0 1 5 8 #> 32 21.4 121.0 109 4.11 2.780 18.60 1 1 4 2# mutate() vs transmute -------------------------- # mutate() keeps all existing variables mtcars %>% mutate(displ_l = disp / 61.0237)#> mpg cyl disp hp drat wt qsec vs am gear carb displ_l #> 1 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 2.621932 #> 2 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 2.621932 #> 3 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 1.769804 #> 4 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 4.227866 #> 5 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 5.899347 #> 6 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 3.687092 #> 7 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 5.899347 #> 8 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 2.403984 #> 9 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 2.307300 #> 10 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 2.746474 #> 11 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 2.746474 #> 12 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 4.519556 #> 13 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 4.519556 #> 14 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 4.519556 #> 15 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 7.734700 #> 16 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 7.538055 #> 17 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 7.210313 #> 18 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 1.289663 #> 19 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 1.240502 #> 20 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 1.165121 #> 21 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 1.968088 #> 22 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 5.211090 #> 23 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 4.981671 #> 24 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 5.735477 #> 25 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 6.554830 #> 26 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 1.294579 #> 27 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 1.971365 #> 28 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 1.558411 #> 29 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 5.751864 #> 30 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 2.376126 #> 31 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 4.932510 #> 32 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 1.982836# transmute keeps only the variables you create mtcars %>% transmute(displ_l = disp / 61.0237)#> displ_l #> 1 2.621932 #> 2 2.621932 #> 3 1.769804 #> 4 4.227866 #> 5 5.899347 #> 6 3.687092 #> 7 5.899347 #> 8 2.403984 #> 9 2.307300 #> 10 2.746474 #> 11 2.746474 #> 12 4.519556 #> 13 4.519556 #> 14 4.519556 #> 15 7.734700 #> 16 7.538055 #> 17 7.210313 #> 18 1.289663 #> 19 1.240502 #> 20 1.165121 #> 21 1.968088 #> 22 5.211090 #> 23 4.981671 #> 24 5.735477 #> 25 6.554830 #> 26 1.294579 #> 27 1.971365 #> 28 1.558411 #> 29 5.751864 #> 30 2.376126 #> 31 4.932510 #> 32 1.982836# The mutate operation may yield different results on grouped # tibbles because the expressions are computed within groups. # The following normalises `mass` by the global average: starwars %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()#> [1] 0.7912704 0.7707179 0.3288397 1.3975685 0.5035357 1.2331487 #> [7] 0.7707179 0.3288397 0.8632041 0.7912704 0.8632041 NA #> [13] 1.1509388 0.8220991 0.7604417 13.9551329 0.7912704 1.1303863 #> [19] 0.1746961 0.7707179 0.8036019 1.4386735 1.1612150 0.8118229 #> [25] 0.8118229 0.8529279 NA NA 0.2055248 0.6987843 #> [31] 0.9145853 0.9248615 NA 0.6782318 0.8426516 NA #> [37] NA NA 0.4110496 NA NA 0.8220991 #> [43] NA 0.5651932 0.4624308 NA 0.6679556 0.8632041 #> [49] 0.8426516 0.8940328 NA 0.5138120 NA NA #> [55] 0.8220991 NA 0.8734803 NA NA 0.8220991 #> [61] 0.5775246 0.5138120 NA 0.8220991 NA 0.8118229 #> [67] 0.5651932 1.0481764 0.9043091 NA NA 0.1541436 #> [73] NA 0.4932595 NA 0.5857456 1.6339220 1.3975685 #> [79] 0.8118229 0.4932595 0.8220991 NA NA NA #> [85] NA NA 0.4624308# Whereas this normalises `mass` by the averages within gender # levels: starwars %>% group_by(gender) %>% mutate(mass / mean(mass, na.rm = TRUE)) %>% pull()#> [1] 0.9505639 1.6187050 0.6906475 1.6789181 0.9070715 1.4813984 1.3883747 #> [8] 0.6906475 1.0369788 0.9505639 1.0369788 NA 1.3826385 0.9875989 #> [15] 0.9135290 1.0000000 0.9505639 1.3579485 0.2098648 0.9258740 0.9653779 #> [22] 1.0000000 1.3949834 0.9752539 0.9752539 1.0246339 NA NA #> [29] 0.2468997 0.8394591 1.0987038 1.1110488 NA 0.8147691 1.0122889 #> [36] NA NA NA 0.4937995 NA NA 0.9875989 #> [43] NA 1.0181414 0.5555244 NA 0.8024241 1.0369788 1.0122889 #> [50] 1.0740138 NA 0.9255831 NA NA 0.9875989 NA #> [57] 1.0493238 NA NA 0.9875989 1.0403554 0.9255831 NA #> [64] 0.9875989 NA 0.9752539 1.0181414 1.2591886 1.0863588 NA #> [71] NA 0.1851748 NA 0.5925593 NA 1.0551648 1.9628528 #> [78] 1.6789181 0.9752539 0.8885598 0.9875989 NA NA NA #> [85] NA NA 0.8330248# Note that you can't overwrite grouping variables: gdf <- mtcars %>% group_by(cyl) try(mutate(gdf, cyl = cyl * 100))#> Error : Column `cyl` can't be modified because it's a grouping variable# Refer to column names stored as strings with the `.data` pronoun: vars <- c("mass", "height") mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])#> # A tibble: 87 x 14 #> name height mass hair_color skin_color eye_color birth_year gender #> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> #> 1 Luke… 172 77 blond fair blue 19 male #> 2 C-3PO 167 75 <NA> gold yellow 112 <NA> #> 3 R2-D2 96 32 <NA> white, bl… red 33 <NA> #> 4 Dart… 202 136 none white yellow 41.9 male #> 5 Leia… 150 49 brown light brown 19 female #> 6 Owen… 178 120 brown, gr… light blue 52 male #> 7 Beru… 165 75 brown light blue 47 female #> 8 R5-D4 97 32 <NA> white, red red NA <NA> #> 9 Bigg… 183 84 black light brown 24 male #> 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male #> # … with 77 more rows, and 6 more variables: homeworld <chr>, species <chr>, #> # films <list>, vehicles <list>, starships <list>, prod <dbl># For more complex cases, knowledge of tidy evaluation and the # unquote operator `!!` is required. See https://tidyeval.tidyverse.org/ # # One useful and simple tidy eval technique is to use `!!` to # bypass the data frame and its columns. Here is how to divide the # column `mass` by an object of the same name: mass <- 100 mutate(starwars, mass = mass / !!mass)#> # A tibble: 87 x 13 #> name height mass hair_color skin_color eye_color birth_year gender #> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> #> 1 Luke… 172 0.77 blond fair blue 19 male #> 2 C-3PO 167 0.75 <NA> gold yellow 112 <NA> #> 3 R2-D2 96 0.32 <NA> white, bl… red 33 <NA> #> 4 Dart… 202 1.36 none white yellow 41.9 male #> 5 Leia… 150 0.49 brown light brown 19 female #> 6 Owen… 178 1.2 brown, gr… light blue 52 male #> 7 Beru… 165 0.75 brown light blue 47 female #> 8 R5-D4 97 0.32 <NA> white, red red NA <NA> #> 9 Bigg… 183 0.84 black light brown 24 male #> 10 Obi-… 182 0.77 auburn, w… fair blue-gray 57 male #> # … with 77 more rows, and 5 more variables: homeworld <chr>, species <chr>, #> # films <list>, vehicles <list>, starships <list>