Nesting creates a list-column of data frames; unnesting flattens it back out into regular columns. Nesting is implicitly a summarising operation: you get one row for each group defined by the non-nested columns. This is useful in conjunction with other summaries that work with whole datasets, most notably models.

Learn more in vignette("nest").
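As a rough illustration of the round trip described above, the sketch below nests a small tibble and then unnests it again. The data and the names df, g, v, nested and roundtrip are made up for illustration and do not come from the package.

```r
library(tidyr)

# Hypothetical toy data: one grouping column and one value column
df <- tibble(g = c("a", "a", "b"), v = 1:3)

# Nest v into a list-column called data; one row remains per value of g
nested <- nest(df, data = c(v))

# Unnest the list-column to flatten it back into a regular column
roundtrip <- unnest(nested, data)
```

Here nested has one row for each distinct value of g, and roundtrip has the same columns and number of rows as df.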

nest(.data, ..., .key = deprecated())

unnest(data, cols, ..., keep_empty = FALSE, ptype = NULL,
  names_sep = NULL, names_repair = "check_unique",
  .drop = deprecated(), .id = deprecated(), .sep = deprecated(),
  .preserve = deprecated())

Arguments

.data

A data frame.

...

Name-variable pairs of the form new_col = c(col1, col2, col3), that describe how you wish to nest existing columns into new columns. The right hand side can be any expression supported by tidyselect.

Deprecated: previously you could write df %>% nest(x, y, z) and df %>% unnest(x, y, z). Convert to df %>% nest(data = c(x, y, z)) and df %>% unnest(c(x, y, z)).

If you previously created a new variable in unnest(), you'll now need to do it explicitly with mutate(). Convert df %>% unnest(y = fun(x, y, z)) to df %>% mutate(y = fun(x, y, z)) %>% unnest(y).
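A minimal sketch of the mutate()-then-unnest() pattern, assuming dplyr is available. The columns id, n and s are hypothetical, and seq_len() merely stands in for whatever function you previously called inside unnest().

```r
library(tidyr)
library(dplyr)

# Hypothetical data: n says how many rows each id should expand to
df <- tibble(id = 1:2, n = c(2L, 3L))

# Create the list-column explicitly with mutate(), then unnest it
out <- df %>%
  mutate(s = lapply(n, seq_len)) %>%
  unnest(s)
```

The list-column s holds one integer vector per row, so out has one row for each element of those vectors.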

.key

Deprecated: no longer needed because of the new new_col = c(col1, col2, col3) syntax.

data

A data frame.

cols

Names of columns to unnest.

If you unnest() multiple columns, parallel entries must be of compatible sizes, i.e. they're either equal or length 1 (following the standard tidyverse recycling rules).
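The recycling rule can be sketched with a toy tibble, where all names are hypothetical. In the first row, b has length 1 and is recycled against the three elements of a; in the second row, both entries have length 2.

```r
library(tidyr)

# Hypothetical data with two parallel list-columns
df <- tibble(
  id = 1:2,
  a = list(1:3, 4:5),
  b = list("x", c("y", "z"))
)

# Unnest both columns at once; the length-1 entry in b is recycled
out <- unnest(df, c(a, b))
```

Entries whose sizes are neither equal nor 1 (for example, 2 and 3 in the same row) are not compatible and cannot be unnested together this way.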

keep_empty

By default, you get one row of output for each element of the list you're unchopping/unnesting. This means that if there's a size-0 element (like NULL or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use keep_empty = TRUE to replace size-0 elements with a single row of missing values.

ptype

Optionally, supply a data frame prototype for the output cols, overriding the default that will be guessed from the combination of individual values.

names_sep

If NULL, the default, the names of new columns will come directly from the inner data frame.

If a string, the names of the new columns will be formed by pasting together the outer column name with the inner names, separated by names_sep.
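To make the effect of names_sep concrete, here is a small sketch on made-up data; the column names id, y, a and b are illustrative only.

```r
library(tidyr)

# Hypothetical data: a list-column y of one-row data frames
df <- tibble(
  id = 1:2,
  y = list(tibble(a = 1, b = 2), tibble(a = 3, b = 4))
)

# Default: inner names are used as-is
plain <- unnest(df, y)

# With names_sep: outer name and inner names are pasted together
prefixed <- unnest(df, y, names_sep = "_")
```

In this sketch plain has the columns id, a and b, while prefixed has id, y_a and y_b.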

names_repair

Used to check that the output data frame has valid names. Must be one of the following options:

  • "minimal": no name repair or checks, beyond basic existence,

  • "unique": make sure names are unique and not empty,

  • "check_unique": (the default), no name repair, but check they are unique,

  • "universal": make the names unique and syntactic

  • a function: apply custom name repair.

  • tidyr_legacy: use the name repair from tidyr 0.8.

  • a formula: a purrr-style anonymous function (see rlang::as_function())

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
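A short sketch of when names_repair matters, using made-up data in which the inner data frames have a column with the same name as an outer column. The name a is hypothetical, and the tryCatch() wrapper is only there so the example can capture the error instead of stopping.

```r
library(tidyr)

# Hypothetical data: the outer column a clashes with the inner column a
df <- tibble(
  a = 1:2,
  y = list(tibble(a = 10), tibble(a = 20))
)

# The default "check_unique" does not repair names, so the clash is an error
err <- tryCatch(unnest(df, y), error = function(e) e)

# "unique" repairs the duplicated names so that unnesting can succeed
repaired <- unnest(df, y, names_repair = "unique")
```

An alternative that avoids repair altogether is to set names_sep so that the inner names are prefixed with the outer column name.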

.drop, .preserve

Deprecated: all list-columns are now preserved. If there are any that you don't want in the output, use select() to remove them prior to unnesting.

.id

Deprecated: convert df %>% unnest(x, .id = "id") to df %>% mutate(id = names(x)) %>% unnest(x).

.sep

Deprecated: use names_sep instead.

New syntax

tidyr 1.0.0 introduced a new syntax for nest() and unnest() that's designed to be more similar to other functions. Converting to the new syntax should be straightforward (guided by the message you'll receive), but if you just need to run an old analysis, you can easily revert to the previous behaviour using nest_legacy() and unnest_legacy() as follows:

library(tidyr)
nest <- nest_legacy
unnest <- unnest_legacy

Grouped data frames

df %>% nest(data = c(x, y)) specifies the columns to be nested; i.e. the columns that will appear in the inner data frame. Alternatively, you can nest() a grouped data frame created by dplyr::group_by(). The grouping variables remain in the outer data frame and the others are nested. The result preserves the grouping of the input.

Variables supplied to nest() will override grouping variables so that df %>% group_by(x, y) %>% nest(data = c(z)) will be equivalent to df %>% nest(data = c(z)).
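The two ways of choosing what gets nested can be compared side by side. This is a sketch on made-up data, assuming dplyr is available; the names g, x, y, by_group and explicit are hypothetical.

```r
library(tidyr)
library(dplyr)

# Hypothetical data: g is the grouping variable, x and y are measurements
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

# Nest a grouped data frame: everything except the grouping variable is nested
by_group <- df %>% group_by(g) %>% nest()

# Name the columns to nest explicitly on an ungrouped data frame
explicit <- df %>% nest(data = c(x, y))
```

Both results have one row per value of g and a data list-column containing x and y; by_group is still grouped by g, while explicit is not grouped.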

Examples

df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-nested variables
df %>% nest(data = c(y, z))
#> # A tibble: 3 x 2
#>       x           data
#>   <dbl> <list<df[,2]>>
#> 1     1        [3 × 2]
#> 2     2        [2 × 2]
#> 3     3        [1 × 2]

# chop does something similar, but retains individual columns
df %>% chop(c(y, z))
#> # A tibble: 3 x 3
#>       x y         z
#>   <dbl> <list>    <list>
#> 1     1 <int [3]> <int [3]>
#> 2     2 <int [2]> <int [2]>
#> 3     3 <int [1]> <int [1]>
# use tidyselect syntax and helpers, just like in dplyr::select()
df %>% nest(data = one_of("y", "z"))
#> # A tibble: 3 x 2
#>       x           data
#>   <dbl> <list<df[,2]>>
#> 1     1        [3 × 2]
#> 2     2        [2 × 2]
#> 3     3        [1 × 2]

iris %>% nest(data = -Species)
#> # A tibble: 3 x 2
#>   Species              data
#>   <fct>      <list<df[,4]>>
#> 1 setosa           [50 × 4]
#> 2 versicolor       [50 × 4]
#> 3 virginica        [50 × 4]

nest_vars <- names(iris)[1:4]
iris %>% nest(data = one_of(nest_vars))
#> # A tibble: 3 x 2
#>   Species              data
#>   <fct>      <list<df[,4]>>
#> 1 setosa           [50 × 4]
#> 2 versicolor       [50 × 4]
#> 3 virginica        [50 × 4]

iris %>%
  nest(petal = starts_with("Petal"), sepal = starts_with("Sepal"))
#> # A tibble: 3 x 3
#>   Species             petal          sepal
#>   <fct>      <list<df[,2]>> <list<df[,2]>>
#> 1 setosa           [50 × 2]       [50 × 2]
#> 2 versicolor       [50 × 2]       [50 × 2]
#> 3 virginica        [50 × 2]       [50 × 2]

iris %>%
  nest(width = contains("Width"), length = contains("Length"))
#> # A tibble: 3 x 3
#>   Species             width         length
#>   <fct>      <list<df[,2]>> <list<df[,2]>>
#> 1 setosa           [50 × 2]       [50 × 2]
#> 2 versicolor       [50 × 2]       [50 × 2]
#> 3 virginica        [50 × 2]       [50 × 2]
# Nesting a grouped data frame nests all variables apart from the group vars
library(dplyr)
fish_encounters %>%
  group_by(fish) %>%
  nest()
#> # A tibble: 19 x 2
#> # Groups:   fish [19]
#>    fish            data
#>    <fct> <list<df[,2]>>
#>  1 4842        [11 × 2]
#>  2 4843        [11 × 2]
#>  3 4844        [11 × 2]
#>  4 4845         [5 × 2]
#>  5 4847         [3 × 2]
#>  6 4848         [4 × 2]
#>  7 4849         [2 × 2]
#>  8 4850         [6 × 2]
#>  9 4851         [2 × 2]
#> 10 4854         [2 × 2]
#> 11 4855         [5 × 2]
#> 12 4857         [9 × 2]
#> 13 4858        [11 × 2]
#> 14 4859         [5 × 2]
#> 15 4861        [11 × 2]
#> 16 4862         [9 × 2]
#> 17 4863         [2 × 2]
#> 18 4864         [2 × 2]
#> 19 4865         [3 × 2]

# Nesting is often useful for creating per group models
mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  mutate(models = lapply(data, function(df) lm(mpg ~ wt, data = df)))
#> # A tibble: 3 x 3
#> # Groups:   cyl [3]
#>     cyl            data models
#>   <dbl> <list<df[,10]>> <list>
#> 1     6        [7 × 10] <lm>
#> 2     4       [11 × 10] <lm>
#> 3     8       [14 × 10] <lm>
# unnest() is primarily designed to work with lists of data frames
df <- tibble(
  x = 1:3,
  y = list(
    NULL,
    tibble(a = 1, b = 2),
    tibble(a = 1:3, b = 3:1)
  )
)
df %>% unnest(y)
#> # A tibble: 4 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     2     1     2
#> 2     3     1     3
#> 3     3     2     2
#> 4     3     3     1

df %>% unnest(y, keep_empty = TRUE)
#> # A tibble: 5 x 3
#>       x     a     b
#>   <int> <dbl> <dbl>
#> 1     1    NA    NA
#> 2     2     1     2
#> 3     3     1     3
#> 4     3     2     2
#> 5     3     3     1
# If you have lists of lists, or lists of atomic vectors, instead
# see hoist(), unnest_wider(), and unnest_longer()

# You can unnest multiple columns simultaneously
df <- tibble(
  a = list(c("a", "b"), "c"),
  b = list(1:2, 3),
  c = c(11, 22)
)
df %>% unnest(c(a, b))
#> # A tibble: 3 x 3
#>   a         b     c
#>   <chr> <dbl> <dbl>
#> 1 a         1    11
#> 2 b         2    11
#> 3 c         3    22

# Compare with unnesting one column at a time, which generates
# the Cartesian product
df %>% unnest(a) %>% unnest(b)
#> # A tibble: 5 x 3
#>   a         b     c
#>   <chr> <dbl> <dbl>
#> 1 a         1    11
#> 2 a         2    11
#> 3 b         1    11
#> 4 b         2    11
#> 5 c         3    22