Chopping and unchopping preserve the width of a data frame (its number of columns) while changing its length (its number of rows). `chop()` makes `df` shorter by converting the rows within each group into list-columns. `unchop()` makes `df` longer by expanding list-columns so that each element of the list-column gets its own row in the output.
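To make the "same width, different length" behaviour concrete, here is a minimal base-R sketch. The helpers `chop_sketch()` and `unchop_sketch()` are hypothetical names introduced for illustration only. They assume atomic vector columns, order groups by the first appearance of each key, and ignore `keep_empty` and `ptype`, so they are not tidyr's implementation.

```r
# chop_sketch() and unchop_sketch() are hypothetical base-R helpers that
# mimic the shape of tidyr's chop()/unchop(); they are not tidyr's code.

chop_sketch <- function(data, cols) {
  keys <- setdiff(names(data), cols)
  # One group per unique combination of the key (non-chopped) columns,
  # in order of first appearance
  key_id <- do.call(paste, c(unname(as.list(data[keys])), sep = "\r"))
  groups <- split(seq_len(nrow(data)), factor(key_id, levels = unique(key_id)))
  first <- vapply(groups, function(i) i[1], integer(1))
  out <- data[first, keys, drop = FALSE]
  # Each chopped column becomes a list-column holding that group's values
  for (col in cols) {
    out[[col]] <- unname(lapply(groups, function(i) data[[col]][i]))
  }
  rownames(out) <- NULL
  out[names(data)]  # restore the original column order
}

unchop_sketch <- function(data, cols) {
  sizes <- lengths(data[[cols[1]]])
  # Every unchopped column must have the same element sizes row by row
  for (col in cols) stopifnot(identical(lengths(data[[col]]), sizes))
  # Repeat each row once per element; rows with size-0 elements disappear
  out <- data[rep(seq_len(nrow(data)), sizes), , drop = FALSE]
  for (col in cols) out[[col]] <- unlist(data[[col]], use.names = FALSE)
  rownames(out) <- NULL
  out
}

df <- data.frame(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
chopped <- chop_sketch(df, c("y", "z"))
dim(chopped)                              # 3 3: shorter, same width
restored <- unchop_sketch(chopped, c("y", "z"))
dim(restored)                             # 6 3: original length again
```

The round trip works here because chopping `y` and `z` together gives both list-columns identical element sizes in every row, which the sketch checks before expanding.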
```r
chop(data, cols)

unchop(data, cols, keep_empty = FALSE, ptype = NULL)
```
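Argument | Description |
---|---|
data | A data frame. |
cols | Column to chop or unchop (automatically quoted). This should be a list-column containing generalised vectors (e.g. any mix of `NULL`s, atomic vectors, S3 vectors, lists, or data frames). |
keep_empty | By default, you get one row of output for each element of the list you are unchopping/unnesting. This means that if there's a size-0 element (like `NULL` or an empty data frame), that entire row will be dropped from the output. If you want to preserve all rows, use `keep_empty = TRUE` to replace size-0 elements with a single row of missing values. |
ptype | Optionally, supply a data frame prototype for the output `cols`, overriding the default that will be guessed from the combination of individual values. |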
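The `keep_empty` rule can also be sketched in base R. `unchop_one_sketch()` is a hypothetical helper that handles a single atomic list-column: with `keep_empty = TRUE` it swaps each size-0 element for a single `NA` before repeating rows, so those rows survive. It is an illustration under those assumptions, not tidyr's code.

```r
# unchop_one_sketch() is a hypothetical helper for one atomic list-column;
# it illustrates the keep_empty rule and is not tidyr's implementation.
unchop_one_sketch <- function(data, col, keep_empty = FALSE) {
  values <- data[[col]]
  if (keep_empty) {
    # Replace each size-0 element with a single missing value
    empty <- lengths(values) == 0
    values[empty] <- list(NA)
  }
  sizes <- lengths(values)
  # Repeat each row once per element of its list-column entry
  out <- data[rep(seq_len(nrow(data)), sizes), , drop = FALSE]
  out[[col]] <- unlist(values, use.names = FALSE)
  rownames(out) <- NULL
  out
}

df <- data.frame(x = 1:4)
df$y <- list(integer(), 1L, 1:2, 1:3)

res <- unchop_one_sketch(df, "y")                        # 6 rows: x = 1 is dropped
res_kept <- unchop_one_sketch(df, "y", keep_empty = TRUE) # 7 rows: x = 1 kept with y = NA
```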
Generally, unchopping is more useful than chopping because it simplifies a complex data structure, and `nest()`ing is usually more appropriate than `chop()`ing since it better preserves the connections between observations.
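One way to see why nesting keeps observations better connected is to build both shapes for the same groups in base R. The code below is an illustration only and calls no tidyr functions: the chop-like form stores `y` and `z` in two separate list-columns that stay paired only by position, while the nest-like form stores one small data frame per group, so each `(y, z)` pair lives in a single row.

```r
# Base-R illustration only: no tidyr functions are used here.
df <- data.frame(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
groups <- split(seq_len(nrow(df)), df$x)  # row indices for each value of x

# chop()-like shape: y and z become two independent list-columns
chopped <- data.frame(x = as.numeric(names(groups)))
chopped$y <- unname(lapply(groups, function(i) df$y[i]))
chopped$z <- unname(lapply(groups, function(i) df$z[i]))

# nest()-like shape: one list-column holding a data frame per group
nested <- data.frame(x = as.numeric(names(groups)))
nested$data <- unname(lapply(groups, function(i) df[i, c("y", "z"), drop = FALSE]))

# Reordering a group's observations in the nest()-like form moves y and z
# together; with the chop()-like form you would have to apply the same
# permutation to every list-column yourself.
g <- nested$data[[1]]
g_sorted <- g[order(g$z), ]
g_sorted$y  # 3 2 1, still paired with z = 4 5 6
```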
```r
library(tidyr)

# Chop ==============================================================
df <- tibble(x = c(1, 1, 1, 2, 2, 3), y = 1:6, z = 6:1)
# Note that we get one row of output for each unique combination of
# non-chopped variables
df %>% chop(c(y, z))
#> # A tibble: 3 x 3
#>       x y         z
#>   <dbl> <list>    <list>
#> 1     1 <int [3]> <int [3]>
#> 2     2 <int [2]> <int [2]>
#> 3     3 <int [1]> <int [1]>
# Compare with nest(), which keeps y and z together in one data frame per group
df %>% nest(data = c(y, z))
#> # A tibble: 3 x 2
#>       x           data
#>   <dbl> <list<df[,2]>>
#> 1     1        [3 × 2]
#> 2     2        [2 × 2]
#> 3     3        [1 × 2]

# Unchop ============================================================
df <- tibble(x = 1:4, y = list(integer(), 1L, 1:2, 1:3))
df %>% unchop(y)
#> # A tibble: 6 x 2
#>       x     y
#>   <int> <int>
#> 1     2     1
#> 2     3     1
#> 3     3     2
#> 4     4     1
#> 5     4     2
#> 6     4     3
df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 7 x 2
#>       x     y
#>   <int> <int>
#> 1     1    NA
#> 2     2     1
#> 3     3     1
#> 4     3     2
#> 5     4     1
#> 6     4     2
#> 7     4     3

# Incompatible types -------------------------------------------------
# If the list-col contains types that cannot be natively combined,
# unchop() fails; supply `ptype` to choose the output type
df <- tibble(x = 1:2, y = list("1", 1:3))
try(df %>% unchop(y))
#> Error : No common type for `..1$y` <character> and `..2$y` <integer>.
df %>% unchop(y, ptype = tibble(y = integer()))
#> # A tibble: 4 x 2
#>       x     y
#>   <int> <int>
#> 1     1     1
#> 2     2     1
#> 3     2     2
#> 4     2     3
df %>% unchop(y, ptype = tibble(y = character()))
#> # A tibble: 4 x 2
#>       x y
#>   <int> <chr>
#> 1     1 1
#> 2     2 1
#> 3     2 2
#> 4     2 3
df %>% unchop(y, ptype = tibble(y = list()))
#> # A tibble: 4 x 2
#>       x y
#>   <int> <list>
#> 1     1 <chr [1]>
#> 2     2 <int [1]>
#> 3     2 <int [1]>
#> 4     2 <int [1]>

# Unchopping data frames ---------------------------------------------
# Unchopping a list-col of data frames must generate a df-col because
# unchop leaves the column names unchanged
df <- tibble(x = 1:3, y = list(NULL, tibble(x = 1), tibble(y = 1:2)))
df %>% unchop(y)
#> # A tibble: 3 x 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     2     1    NA
#> 2     3    NA     1
#> 3     3    NA     2
df %>% unchop(y, keep_empty = TRUE)
#> # A tibble: 4 x 2
#>       x   y$x    $y
#>   <int> <dbl> <int>
#> 1     1    NA    NA
#> 2     2     1    NA
#> 3     3    NA     1
#> 4     3    NA     2
```