hoist(), unnest_longer(), and unnest_wider() provide tools for
rectangling, collapsing deeply nested lists into regular columns.
hoist() allows you to selectively pull components of a list-column out
into their own top-level columns, using the same syntax as purrr::pluck().
unnest_wider() turns each element of a list-column into a column, and
unnest_longer() turns each element of a list-column into a row.
unnest_auto() picks between unnest_wider() and unnest_longer()
based on heuristics described below.
Learn more in vignette("rectangle").
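As a rough illustration of the pluck-style accessors that hoist() accepts, the sketch below extracts one nested component with hoist() and compares it with the same value pulled out by hand. The data frame and the column names (`info`, `name`, `tags`, `first_tag`) are made up for this example and are not part of the package.

```r
library(tidyr)

# Hypothetical data: one list-column whose elements are named lists
df <- tibble(
  id = 1:2,
  info = list(
    list(name = "a", tags = c("x", "y")),
    list(name = "b", tags = c("z"))
  )
)

# A string accessor picks a component by name; a list of accessors
# drills down step by step, in the style of purrr::pluck()
out <- hoist(df, info,
  name = "name",
  first_tag = list("tags", 1L)
)

# The same component extracted manually, one element at a time
manual <- vapply(df$info, function(el) el[["tags"]][[1L]], character(1))
all(out$first_tag == manual)
```

The manual version only works when every element has the component; hoist() is the more convenient choice when some components may be missing.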
    hoist(.data, .col, ..., .remove = TRUE, .simplify = TRUE, .ptype = list())

    unnest_longer(
      data,
      col,
      values_to = NULL,
      indices_to = NULL,
      indices_include = NULL,
      names_repair = "check_unique",
      simplify = TRUE,
      ptype = list()
    )

    unnest_wider(
      data,
      col,
      names_sep = NULL,
      simplify = TRUE,
      names_repair = "check_unique",
      ptype = list()
    )

    unnest_auto(data, col)
| Argument | Description |
|---|---|
| .data, data | A data frame. |
| .col, col | List-column to extract components from. |
| ... | Components of |
| .remove | If |
| .simplify | If |
| .ptype | Optionally, a named list of prototypes declaring the desired output type of each component. |
| values_to | Name of column to store vector values. Defaults to |
| indices_to | A string giving the name of column which will contain the inner names or position (if not named) of the values. Defaults to |
| indices_include | Add an index column? Defaults to |
| names_repair | Used to check that output data frame has valid names. Must be one of the following options: See |
| simplify | If |
| ptype | Optionally, supply a data frame prototype for the output |
| names_sep | If a string, the names of the new columns will be formed by pasting together the outer column name with the inner names, separated by |
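The naming arguments are easiest to see on a small case. The following is a minimal sketch, assuming a toy data frame invented for this example; the column names `value` and `key` are arbitrary choices, not defaults.

```r
library(tidyr)

# Hypothetical data: a list-column of named numeric vectors
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 3, b = 4))
)

# names_sep: new column names are the outer name, the separator,
# and the inner name pasted together
wide <- unnest_wider(df, y, names_sep = "_")
names(wide)

# values_to / indices_to: choose the names of the value column and
# of the column holding the inner names
long <- unnest_longer(df, y, values_to = "value", indices_to = "key")
names(long)
```

Setting names_sep is a simple way to avoid clashes between the new columns and columns that already exist in the data frame.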
The three unnest() functions differ in how they change the shape of the
output data frame:
unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.
These principles guide their behaviour when they are called with a
non-primary data type. For example, if you unnest_wider() a list of data
frames, the number of rows must be preserved, so each column is turned into
a list column of length one. Or if you unnest_longer() a list of data
frames, the number of columns must be preserved, so it creates a packed
column. I'm not sure if these behaviours are useful in practice, but
they are theoretically pleasing.
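To make the shape rules concrete, here is a small sketch applied to a list-column of data frames. The data are invented for illustration, and the checks only confirm the row and column counts described above.

```r
library(tidyr)

# Hypothetical data: each element of y is itself a data frame
df <- tibble(
  id = 1:2,
  y = list(
    tibble(a = 1:2, b = 3:4),
    tibble(a = 5L, b = 6L)
  )
)

# unnest_wider() keeps one row per input row, so the inner columns
# become list-columns
wide <- unnest_wider(df, y)
nrow(wide) == nrow(df)
is.list(wide$a)

# unnest_longer() keeps the number of columns, so y becomes a packed
# (data frame) column with one row per inner row
long <- unnest_longer(df, y)
ncol(long) == ncol(df)
is.data.frame(long$y)
```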
unnest_auto() heuristics

unnest_auto() inspects the inner names of the list-col:
If all elements are unnamed, it uses unnest_longer().
If all elements are named, and there's at least one name in
common across all components, it uses unnest_wider().
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
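The three rules above can be sketched in base R. `guess_direction()` is a hypothetical helper written for this illustration; it is not the package's internal implementation and it ignores edge cases such as partially named elements.

```r
# Hypothetical helper: returns which function the rules above would pick
# for a list `x` standing in for a list-column
guess_direction <- function(x) {
  inner_names <- lapply(x, names)
  unnamed <- vapply(inner_names, is.null, logical(1))

  if (all(unnamed)) {
    # Rule 1: no element has names
    return("unnest_longer()")
  }
  if (!any(unnamed)) {
    # Rule 2: every element has names; look for names shared by all
    common <- Reduce(intersect, inner_names)
    if (length(common) > 0) {
      return("unnest_wider()")
    }
  }
  # Rule 3: mixed naming, or no name in common
  "unnest_longer(indices_include = TRUE)"
}

guess_direction(list(1:3, 4:5))
guess_direction(list(c(a = 1, b = 2), c(a = 3, c = 4)))
guess_direction(list(c(a = 1), c(b = 2)))
```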
    df <- tibble(
      character = c("Toothless", "Dory"),
      metadata = list(
        list(
          species = "dragon",
          color = "black",
          films = c(
            "How to Train Your Dragon",
            "How to Train Your Dragon 2",
            "How to Train Your Dragon: The Hidden World"
          )
        ),
        list(
          species = "clownfish",
          color = "blue",
          films = c("Finding Nemo", "Finding Dory")
        )
      )
    )
    df
    #> # A tibble: 2 x 2
    #>   character metadata
    #>   <chr>     <list>
    #> 1 Toothless <named list [3]>
    #> 2 Dory      <named list [3]>

    # Turn all components of metadata into columns
    df %>% unnest_wider(metadata)
    #> # A tibble: 2 x 4
    #>   character species   color films
    #>   <chr>     <chr>     <chr> <list>
    #> 1 Toothless dragon    black <chr [3]>
    #> 2 Dory      clownfish blue  <chr [2]>

    # Extract only specified components
    df %>% hoist(metadata,
      species = "species",
      first_film = list("films", 1L),
      third_film = list("films", 3L)
    )
    #> # A tibble: 2 x 5
    #>   character species   first_film        third_film                  metadata
    #>   <chr>     <chr>     <chr>             <chr>                       <list>
    #> 1 Toothless dragon    How to Train You… How to Train Your Dragon: … <named list…
    #> 2 Dory      clownfish Finding Nemo      <NA>                        <named list…

    df %>%
      unnest_wider(metadata) %>%
      unnest_longer(films)
    #> # A tibble: 5 x 4
    #>   character species   color films
    #>   <chr>     <chr>     <chr> <chr>
    #> 1 Toothless dragon    black How to Train Your Dragon
    #> 2 Toothless dragon    black How to Train Your Dragon 2
    #> 3 Toothless dragon    black How to Train Your Dragon: The Hidden World
    #> 4 Dory      clownfish blue  Finding Nemo
    #> 5 Dory      clownfish blue  Finding Dory

    # unnest_longer() is useful when each component of the list should
    # form a row
    df <- tibble(
      x = 1:3,
      y = list(NULL, 1:3, 4:5)
    )
    df %>% unnest_longer(y)
    #> # A tibble: 6 x 2
    #>       x     y
    #>   <int> <int>
    #> 1     1    NA
    #> 2     2     1
    #> 3     2     2
    #> 4     2     3
    #> 5     3     4
    #> 6     3     5

    # Automatically creates names if widening
    df %>% unnest_wider(y)
    #> # A tibble: 3 x 4
    #>       x  ...1  ...2  ...3
    #>   <int> <int> <int> <int>
    #> 1     1    NA    NA    NA
    #> 2     2     1     2     3
    #> 3     3     4     5    NA

    # And similarly if the vectors are named
    df <- tibble(
      x = 1:2,
      y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
    )
    df %>% unnest_wider(y)
    #> # A tibble: 2 x 4
    #>       x     a     b     c
    #>   <int> <dbl> <dbl> <dbl>
    #> 1     1     1     2    NA
    #> 2     2    10    11    12

    df %>% unnest_longer(y)
    #> # A tibble: 5 x 3
    #>       x     y y_id
    #>   <int> <dbl> <chr>
    #> 1     1     1 a
    #> 2     1     2 b
    #> 3     2    10 a
    #> 4     2    11 b
    #> 5     2    12 c