Maturing lifecycle

Packing and unpacking preserve the number of rows in a data frame and change only its width. pack() makes a data frame narrower by collapsing a set of columns into a single df-column. unpack() makes it wider by expanding df-columns back out into individual columns.

pack(data, ...)

unpack(data, cols, names_sep = NULL, names_repair = "check_unique")

Arguments

data

A data frame.

...

Name-variable pairs of the form new_col = c(col1, col2, col3) that describe how you wish to pack existing columns into new columns. The right-hand side can be any expression supported by tidyselect.

cols

Name of the column that you wish to unpack. Several df-columns can be unpacked at once by combining them with c(), as in unpack(c(y, z)).

names_sep

If NULL, the default, the names of new columns will come directly from the inner data frame.

If a string, the names of the new columns will be formed by pasting together the outer column name with the inner names, separated by names_sep.

names_repair

Used to check that the output data frame has valid names. Must be one of the following options:

  • "minimal": no name repair or checks, beyond basic existence,

  • "unique": make sure names are unique and not empty,

  • "check_unique": (the default), no name repair, but check they are unique,

  • "universal": make the names unique and syntactic,

  • a function: apply custom name repair,

  • "tidyr_legacy": use the name repair from tidyr 0.8,

  • a formula: a purrr-style anonymous function (see rlang::as_function()).

See vctrs::vec_as_names() for more details on these terms and the strategies used to enforce them.
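
As a rough illustration of why names_repair matters, the sketch below unpacks a df-column whose inner name clashes with an outer column. The data frame clash and the variables default_result and repaired are made up for this example, and it assumes a version of tidyr that provides unpack() is installed.

```r
library(tidyr)

# Hypothetical data: the outer column `x` clashes with the inner column `y$x`.
clash <- tibble(x = 1:3, y = tibble(x = 3:1))

# With the default "check_unique", the duplicated name is reported as an error.
# The error is caught here so the script keeps running.
default_result <- tryCatch(
  unpack(clash, y),
  error = function(e) "error"
)

# With "unique", the clashing names are repaired instead of rejected.
repaired <- unpack(clash, y, names_repair = "unique")
names(repaired)
```

Which option is appropriate depends on whether a name clash should stop the pipeline or be resolved automatically; names_sep is another way to avoid the clash in the first place.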

Details

Generally, unpacking is more useful than packing because it simplifies a complex data structure. Few functions currently work with df-columns, so they are mostly a curiosity, but they seem worth exploring further because they mimic the nested column headers that are so popular in Excel.
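
To make the idea of a df-column more concrete, here is a small round-trip sketch: it packs three columns, reads one inner column through nested $ access, and unpacks again. The variable names (wide, packed, inner_x2, restored) are illustrative only, and tidyr is assumed to be installed.

```r
library(tidyr)

wide <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)

# Collapse x1, x2 and x3 into a single df-column named `x`.
packed <- pack(wide, x = c(x1, x2, x3))

# A df-column is itself a data frame, so inner columns are reached with `$`.
inner_x2 <- packed$x$x2

# Unpacking expands the df-column back into individual columns.
# The number of rows is the same at every step.
restored <- unpack(packed, x)
```

The column order after the round trip need not match the original, so comparing by name is safer than comparing by position.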

Examples

# Packing =============================================================
# It's not currently clear why you would ever want to pack columns
# since few functions work with this sort of data.
df <- tibble(x1 = 1:3, x2 = 4:6, x3 = 7:9, y = 1:3)
df
#> # A tibble: 3 x 4
#>      x1    x2    x3     y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3
df %>% pack(x = starts_with("x"))
#> # A tibble: 3 x 2
#>       y  x$x1   $x2   $x3
#>   <int> <int> <int> <int>
#> 1     1     1     4     7
#> 2     2     2     5     8
#> 3     3     3     6     9
df %>% pack(x = c(x1, x2, x3), y = y)
#> # A tibble: 3 x 2
#>    x$x1   $x2   $x3   y$y
#>   <int> <int> <int> <int>
#> 1     1     4     7     1
#> 2     2     5     8     2
#> 3     3     6     9     3

# Unpacking ===========================================================
df <- tibble(
  x = 1:3,
  y = tibble(a = 1:3, b = 3:1),
  z = tibble(X = c("a", "b", "c"), Y = runif(3), Z = c(TRUE, FALSE, NA))
)
df
#> # A tibble: 3 x 3
#>       x   y$a    $b z$X      $Y $Z
#>   <int> <int> <int> <chr> <dbl> <lgl>
#> 1     1     1     3 a     0.401 TRUE
#> 2     2     2     2 b     0.213 FALSE
#> 3     3     3     1 c     0.672 NA
df %>% unpack(y)
#> # A tibble: 3 x 4
#>       x     a     b z$X      $Y $Z
#>   <int> <int> <int> <chr> <dbl> <lgl>
#> 1     1     1     3 a     0.401 TRUE
#> 2     2     2     2 b     0.213 FALSE
#> 3     3     3     1 c     0.672 NA
df %>% unpack(c(y, z))
#> # A tibble: 3 x 6
#>       x     a     b X         Y Z
#>   <int> <int> <int> <chr> <dbl> <lgl>
#> 1     1     1     3 a     0.401 TRUE
#> 2     2     2     2 b     0.213 FALSE
#> 3     3     3     1 c     0.672 NA
df %>% unpack(c(y, z), names_sep = "_")
#> # A tibble: 3 x 6
#>       x   y_a   y_b z_X     z_Y z_Z
#>   <int> <int> <int> <chr> <dbl> <lgl>
#> 1     1     1     3 a     0.401 TRUE
#> 2     2     2     2 b     0.213 FALSE
#> 3     3     3     1 c     0.672 NA