Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.
Learn more in vignette("nest").
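As a quick illustration of the round trip described above, the following minimal sketch nests a small tibble and then unnests it again. The data frame and its column names (`g`, `v`) are made up for illustration; only `nest()`, `unnest()`, `tibble()` and `%>%` come from tidyr.

```r
library(tidyr)

# Hypothetical toy data: two groups ("a" and "b") in column g
df <- tibble(g = c("a", "a", "b"), v = 1:3)

# Nesting v leaves one row per unique value of the non-nested column g;
# the nested values live in a list-column of data frames called `data`
nested <- df %>% nest(data = v)

# Unnesting the list-column flattens it back into regular columns
flat <- nested %>% unnest(data)
```

Here `nested` has one row per value of `g`, and `flat` has the same columns and rows as the original `df`.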
nest(.data, ..., .key = deprecated())

unnest(data, cols, ..., keep_empty = FALSE, ptype = NULL,
  names_sep = NULL, names_repair = "check_unique", .drop = deprecated(),
  .id = deprecated(), .sep = deprecated(), .preserve = deprecated())
.data | A data frame. |
---|---|
... | Name-variable pairs of the form
:
previously you could write If you previously created new variable in |
.key |
:
No longer needed because of the new |
data | A data frame. |
cols | Names of columns to unnest. If you |
keep_empty | By default, you get one row of output for each element
of the list you're unchopping/unnesting. This means that if there's a
size-0 element (like |
ptype | Optionally, supply a data frame prototype for the output |
names_sep | If a string, the names of the new columns will be formed by pasting
together the outer column name with the inner names, separated by
|
names_repair | Used to check that output data frame has valid names. Must be one of the following options:
See |
.drop, .preserve |
:
all list-columns are now preserved; if there are any that you
don't want in the output use |
.id |
:
convert |
.sep |
:
use |
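The effect of names_sep described in the arguments above can be sketched as follows. This is a minimal example under assumed data: the tibble and its column names (`id`, `info`, `a`, `b`) are hypothetical and chosen only for illustration.

```r
library(tidyr)

# Hypothetical data: `info` is a list-column of one-row data frames
df <- tibble(
  id = 1:2,
  info = list(tibble(a = 1, b = 2), tibble(a = 3, b = 4))
)

# With names_sep = "_", each new column name is the outer name ("info")
# pasted together with the inner name ("a" or "b")
out <- df %>% unnest(info, names_sep = "_")
names(out)
```

Leaving names_sep at its default of NULL would instead keep the inner names `a` and `b` unchanged.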
tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you'll receive) but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows:
library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy
df %>% nest(data = c(x, y)) specifies the columns to be nested; i.e. the columns that will appear in the inner data frame. Alternatively, you can nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested. The result preserves the grouping of the input.

Variables supplied to nest() will override grouping variables so that df %>% group_by(x, y) %>% nest(data = z) will be equivalent to df %>% nest(data = z).
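The grouped case can be sketched with a small made-up tibble (the columns `x`, `y` and `z` here are hypothetical). Calling nest() with no arguments on a grouped data frame keeps the grouping variable in the outer data frame and nests everything else.

```r
library(tidyr)
library(dplyr)

# Hypothetical data with one grouping variable (x) and two other columns
df <- tibble(x = c(1, 1, 2), y = c("a", "b", "b"), z = 1:3)

# nest() without arguments nests every column except the grouping variable
by_group <- df %>%
  group_by(x) %>%
  nest()

# The inner data frames hold the non-grouping columns
names(by_group$data[[1]])

# The result is still grouped by x
group_vars(by_group)
```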
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-nested variables
df %>% nest(data = c(y, z))
#> # A tibble: 3 x 2
#>       x           data
#>   <dbl> <list<df[,2]>>
#> 1     1        [3 × 2]
#> 2     2        [2 × 2]
#> 3     3        [1 × 2]
#> # A tibble: 3 x 3
#>       x y         z
#>   <dbl> <list>    <list>
#> 1     1 <int [3]> <int [3]>
#> 2     2 <int [2]> <int [2]>
#> 3     3 <int [1]> <int [1]>

# use tidyselect syntax and helpers, just like in dplyr::select()
df %>% nest(data = one_of("y", "z"))
#> # A tibble: 3 x 2
#>       x           data
#>   <dbl> <list<df[,2]>>
#> 1     1        [3 × 2]
#> 2     2        [2 × 2]
#> 3     3        [1 × 2]

iris %>% nest(data = -Species)
#> # A tibble: 3 x 2
#>   Species              data
#>   <fct>      <list<df[,4]>>
#> 1 setosa           [50 × 4]
#> 2 versicolor       [50 × 4]
#> 3 virginica        [50 × 4]
#> # A tibble: 3 x 2
#>   Species              data
#>   <fct>      <list<df[,4]>>
#> 1 setosa           [50 × 4]
#> 2 versicolor       [50 × 4]
#> 3 virginica        [50 × 4]
#> # A tibble: 3 x 3
#>   Species             petal          sepal
#>   <fct>      <list<df[,2]>> <list<df[,2]>>
#> 1 setosa           [50 × 2]       [50 × 2]
#> 2 versicolor       [50 × 2]       [50 × 2]
#> 3 virginica        [50 × 2]       [50 × 2]
#> # A tibble: 3 x 3
#>   Species             width         length
#>   <fct>      <list<df[,2]>> <list<df[,2]>>
#> 1 setosa           [50 × 2]       [50 × 2]
#> 2 versicolor       [50 × 2]       [50 × 2]
#> 3 virginica        [50 × 2]       [50 × 2]

# Nesting a grouped data frame nests all variables apart from the group vars
library(dplyr)
fish_encounters %>%
  group_by(fish) %>%
  nest()
#> # A tibble: 19 x 2
#> # Groups:   fish [19]
#>    fish            data
#>    <fct> <list<df[,2]>>
#>  1 4842        [11 × 2]
#>  2 4843        [11 × 2]
#>  3 4844        [11 × 2]
#>  4 4845         [5 × 2]
#>  5 4847         [3 × 2]
#>  6 4848         [4 × 2]
#>  7 4849         [2 × 2]
#>  8 4850         [6 × 2]
#>  9 4851         [2 × 2]
#> 10 4854         [2 × 2]
#> 11 4855         [5 × 2]
#> 12 4857         [9 × 2]
#> 13 4858        [11 × 2]
#> 14 4859         [5 × 2]
#> 15 4861        [11 × 2]
#> 16 4862         [9 × 2]
#> 17 4863         [2 × 2]
#> 18 4864         [2 × 2]
#> 19 4865         [3 × 2]

# Nesting is often useful for creating per group models
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df)))
#> # A tibble: 3 x 3
#> # Groups:   cyl [3]
#>     cyl            data models
#>   <dbl> <list<df[,10]>> <list>
#> 1     6        [7 × 10] <lm>
#> 2     4       [11 × 10] <lm>
#> 3     8       [14 × 10] <lm>

# unnest() is primarily designed to work with lists of data frames
df <- tibble(
  x = 1:3,
  y = list(
    NULL,
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1)
  )
)
df %>% unnest(y)
#> # A tibble: 4 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     2     1     2
#> 2     3     1     3
#> 3     3     2     2
#> 4     3     3     1
df %>% unnest(y, keep_empty = TRUE)
#> # A tibble: 5 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     1    NA    NA
#> 2     2     1     2
#> 3     3     1     3
#> 4     3     2     2
#> 5     3     3     1

# If you have lists of lists, or lists of atomic vectors, instead
# see hoist(), unnest_wider(), and unnest_longer()

# You can unnest multiple columns simultaneously
df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(1:2, 3),
  c = c(11, 22)
)
df %>% unnest(c(a, b))
#> # A tibble: 3 x 3
#>   a         b     c
#>   <chr> <dbl> <dbl>
#> 1 a         1    11
#> 2 b         2    11
#> 3 c         3    22

# Compare with unnesting one column at a time, which generates
# the Cartesian product
df %>%
  unnest(a) %>%
  unnest(b)
#> # A tibble: 5 x 3
#>   a         b     c
#>   <chr> <dbl> <dbl>
#> 1 a         1    11
#> 2 a         2    11
#> 3 b         1    11
#> 4 b         2    11
#> 5 c         3    22