Rectangle a nested list into a tidy tibble

hoist(), unnest_longer(), and unnest_wider() provide tools for rectangling: collapsing deeply nested lists into regular columns. hoist() lets you selectively pull components of a list-column out into their own top-level columns, using the same syntax as purrr::pluck(). unnest_wider() turns each element of a list-column into a column, and unnest_longer() turns each element of a list-column into a row. unnest_auto() picks between unnest_wider() and unnest_longer() based on the heuristics described below.

Learn more in vignette("rectangle").

hoist(.data, .col, ..., .remove = TRUE, .simplify = TRUE,
  .ptype = list())

unnest_longer(data, col, values_to = NULL, indices_to = NULL,
  indices_include = NULL, names_repair = "check_unique",
  simplify = TRUE, ptype = list())

unnest_wider(data, col, names_sep = NULL, simplify = TRUE,
  names_repair = "check_unique", ptype = list())

unnest_auto(data, col)

Arguments

.data, data	A data frame.
.col, col	List-column to extract components from.
...	Components of `.col` to turn into columns in the form `col_name = "pluck_specification"`. You can pluck by name with a character vector, by position with an integer vector, or with a combination of the two with a list. See `purrr::pluck()` for details.
.remove	If `TRUE`, the default, will remove extracted components from `.col`. This ensures that each value lives only in one place.
.simplify	If `TRUE`, will attempt to simplify lists of length-1 vectors to an atomic vector.
.ptype	Optionally, a named list of prototypes declaring the desired output type of each component.
values_to	Name of column to store vector values. Defaults to `col`.
indices_to	A string giving the name of the column that will contain the inner names or positions (if not named) of the values. Defaults to `col` with an `_id` suffix.
indices_include	Add an index column? Defaults to `TRUE` when `col` has inner names.
names_repair	Used to check that the output data frame has valid names. Must be one of the following options: "minimal": no name repair or checks, beyond basic existence; "unique": make sure names are unique and not empty; "check_unique" (the default): no name repair, but check they are unique; "universal": make the names unique and syntactic; a function: apply custom name repair; "tidyr_legacy": use the name repair from tidyr 0.8; a formula: a purrr-style anonymous function (see `rlang::as_function()`). See `vctrs::vec_as_names()` for more details on these terms and the strategies used to enforce them.
simplify	If `TRUE`, will attempt to simplify lists of length-1 vectors to an atomic vector.
ptype	Optionally, supply a data frame prototype for the output `cols`, overriding the default that will be guessed from the combination of individual values.
names_sep	If `NULL`, the default, the names of new columns will come directly from the inner data frame. If a string, the names of the new columns will be formed by pasting together the outer column name with the inner names, separated by `names_sep`.
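As an illustration of a few of the arguments above, here is a minimal sketch using small made-up tibbles. The objects `df` and `items` and the output names `value` and `name` are chosen for this example only, and it assumes tidyr is attached (which also makes tibble() and %>% available).

```r
library(tidyr)

df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)

# names_sep: build new column names from the outer name plus the inner names
wide <- df %>% unnest_wider(y, names_sep = "_")
names(wide)

# values_to / indices_to: choose the names of the value and index columns
long <- df %>% unnest_longer(y, values_to = "value", indices_to = "name")
names(long)

# .remove: decide whether a hoisted component also stays in the list-column
items <- tibble(
  id = 1:2,
  info = list(list(a = 1, b = "u"), list(a = 2, b = "v"))
)
kept <- items %>% hoist(info, a = "a", .remove = FALSE)
dropped <- items %>% hoist(info, a = "a")
lengths(kept$info)     # both components still present in each element
lengths(dropped$info)  # only `b` remains in each element
```

With `.remove = TRUE` (the default) each value ends up in exactly one place, which is usually what you want; `.remove = FALSE` can be handy while exploring an unfamiliar list.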

Unnest variants

The three unnest() functions differ in how they change the shape of the output data frame:

unnest_wider() preserves the rows, but changes the columns.
unnest_longer() preserves the columns, but changes the rows.
unnest() can change both rows and columns.

These principles guide their behaviour when they are called with a non-primary data type. For example, if you unnest_wider() a list of data frames, the number of rows must be preserved, so each column is turned into a list-column of length one. Or if you unnest_longer() a list of data frames, the number of columns must be preserved, so it creates a packed column. I'm not sure if these behaviours are useful in practice, but they are theoretically pleasing.
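The behaviour described above for a list of data frames can be checked with a short sketch. The tibble `df` is made up for illustration, and the exact printed column types may differ between tidyr versions; only the row and column counts are relied on here.

```r
library(tidyr)

# A list-column whose elements are data frames (hypothetical data)
df <- tibble(
  x = 1:2,
  y = list(
    tibble(a = 1:2, b = 3:4),
    tibble(a = 5L, b = 6L)
  )
)

# unnest_wider() keeps one row per original row, so `a` and `b`
# become list-columns holding the inner vectors
wide <- df %>% unnest_wider(y)
nrow(wide)

# unnest_longer() keeps the number of columns, so `y` becomes a
# packed (data frame) column with one row per inner row
long <- df %>% unnest_longer(y)
ncol(long)
is.data.frame(long$y)
```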

`unnest_auto()` heuristics

unnest_auto() inspects the inner names of the list-col:

If all elements are unnamed, it uses unnest_longer()
If all elements are named, and there's at least one name in common across all components, it uses unnest_wider()
Otherwise, it falls back to unnest_longer(indices_include = TRUE).
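The three cases listed above can be tried out with a small sketch. The tibbles `unnamed`, `shared`, and `disjoint` are hypothetical inputs built for this illustration; unnest_auto() also prints a message saying which function it chose, which is not reproduced here.

```r
library(tidyr)

# Case 1: no element has names, so the list-col is unnested into rows
unnamed <- tibble(x = 1:2, y = list(1:2, 3:5))
r1 <- unnest_auto(unnamed, y)

# Case 2: all elements are named and share `a` and `b`,
# so the list-col is unnested into columns
shared <- tibble(x = 1:2, y = list(c(a = 1, b = 2), c(a = 3, b = 4, c = 5)))
r2 <- unnest_auto(shared, y)

# Case 3: all elements are named but have no name in common,
# so the list-col is unnested into rows with an index column
disjoint <- tibble(x = 1:2, y = list(c(a = 1), c(b = 2)))
r3 <- unnest_auto(disjoint, y)

names(r1)
names(r2)
names(r3)
```

Because the choice depends on the data, unnest_auto() is best suited to interactive exploration; once you know the shape you want, calling unnest_wider() or unnest_longer() directly keeps the result predictable.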

Examples

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black",
      films = c(
        "How to Train Your Dragon",
        "How to Train Your Dragon 2",
        "How to Train Your Dragon: The Hidden World"
       )
    ),
    list(
      species = "clownfish",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
#> # A tibble: 2 x 2
#>   character metadata        
#>   <chr>     <list>          
#> 1 Toothless <named list [3]>
#> 2 Dory      <named list [3]>

# Turn all components of metadata into columns
df %>% unnest_wider(metadata)
#> # A tibble: 2 x 4
#>   character species   color films    
#>   <chr>     <chr>     <chr> <list>   
#> 1 Toothless dragon    black <chr [3]>
#> 2 Dory      clownfish blue  <chr [2]>

# Extract only specified components
df %>% hoist(metadata,
  species = "species",
  first_film = list("films", 1L),
  third_film = list("films", 3L)
)
#> # A tibble: 2 x 5
#>   character species   first_film        third_film                  metadata    
#>   <chr>     <chr>     <chr>             <chr>                       <list>      
#> 1 Toothless dragon    How to Train You… How to Train Your Dragon: … <named list…
#> 2 Dory      clownfish Finding Nemo      <NA>                        <named list…

# Widen metadata, then turn each film into its own row
df %>%
  unnest_wider(metadata) %>%
  unnest_longer(films)
#> # A tibble: 5 x 4
#>   character species   color films                                     
#>   <chr>     <chr>     <chr> <chr>                                     
#> 1 Toothless dragon    black How to Train Your Dragon                  
#> 2 Toothless dragon    black How to Train Your Dragon 2                
#> 3 Toothless dragon    black How to Train Your Dragon: The Hidden World
#> 4 Dory      clownfish blue  Finding Nemo                              
#> 5 Dory      clownfish blue  Finding Dory

# unnest_longer() is useful when each component of the list should
# form a row
df <- tibble(
  x = 1:3,
  y = list(NULL, 1:3, 4:5)
)
df %>% unnest_longer(y)
#> # A tibble: 6 x 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     2     2
#> 4     2     3
#> 5     3     4
#> 6     3     5

# Automatically creates names if widening
df %>% unnest_wider(y)
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> * `` -> ...3
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> # A tibble: 3 x 4
#>       x  ...1  ...2  ...3
#>   <int> <int> <int> <int>
#> 1     1    NA    NA    NA
#> 2     2     1     2     3
#> 3     3     4     5    NA

# And similarly if the vectors are named
df <- tibble(
  x = 1:2,
  y = list(c(a = 1, b = 2), c(a = 10, b = 11, c = 12))
)
df %>% unnest_wider(y)
#> # A tibble: 2 x 4
#>       x     a     b     c
#>   <int> <dbl> <dbl> <dbl>
#> 1     1     1     2    NA
#> 2     2    10    11    12

df %>% unnest_longer(y)
#> # A tibble: 5 x 3
#>       x     y y_id 
#>   <int> <dbl> <chr>
#> 1     1     1 a    
#> 2     1     2 b    
#> 3     2    10 a    
#> 4     2    11 b    
#> 5     2    12 c
