Expand data frame to include all combinations of values

expand() generates all combinations of the variables in a data frame. It is often useful in conjunction with joins: use it with left_join() to convert implicit missing values to explicit missing values, or with anti_join() to figure out which combinations are missing.

expand(data, ...)

crossing(...)

nesting(...)

Arguments

data

A data frame.

...

Specification of columns to expand. Columns can be atomic vectors or lists.

To find all unique combinations of x, y and z, including those not found in the data, supply each variable as a separate argument. To find only the combinations that occur in the data, use nesting(): expand(df, nesting(x, y, z)).

You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for every student for each date.

For factors, the full set of levels (not just those that appear in the data) is used. For continuous variables, you may need to fill in values that don't appear in the data: to do so, use expressions like year = 2010:2020 or year = full_seq(year, 1).

Length-zero (empty) elements are automatically dropped.
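
The factor behaviour described above can be illustrated with a small sketch. The `sizes` data frame and its columns are hypothetical, made up for this illustration; it assumes only that tidyr is installed.

```r
library(tidyr)

# Hypothetical data: "medium" is a declared factor level that never occurs.
sizes <- tibble(
  size   = factor(c("small", "large"), levels = c("small", "medium", "large")),
  colour = c("red", "blue")
)

# Separate arguments: every factor level is crossed with every colour,
# so "medium" appears even though it is absent from the data.
by_levels <- expand(sizes, size, colour)
nrow(by_levels)  # 3 levels x 2 colours

# nesting(): only the (size, colour) pairs that occur in the data are kept.
observed <- expand(sizes, nesting(size, colour))
nrow(observed)
```

Supplying `size` and `colour` separately is the form to use when unused levels should show up as explicit rows; wrapping them in nesting() restricts the result to the observed pairs.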

Details

crossing() is a wrapper around expand_grid() that deduplicates and sorts each input. nesting() is the complement to crossing(): it only keeps combinations of values that appear in the data.
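
As a rough illustration of the difference between these helpers, the sketch below calls them directly on vectors. The inputs `x` and `y` are arbitrary values chosen for this example.

```r
library(tidyr)

x <- c(2, 1, 1)
y <- c("b", "a")

# expand_grid() keeps duplicates and preserves the input order.
grid <- expand_grid(x = x, y = y)
nrow(grid)

# crossing() de-duplicates and sorts each input before combining.
crossed <- crossing(x = x, y = y)
crossed

# nesting() keeps only the combinations that occur row-wise in its inputs.
nested <- nesting(x = c(1, 1, 2), y = c("a", "a", "b"))
nested
```

Here `grid` has one row per pair of input elements, `crossed` has one row per pair of distinct values, and `nested` has one row per distinct observed pair.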

Examples

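library(tidyr)  # expand(), crossing(), nesting(), complete(), full_seq()
library(dplyr)  # anti_join(), right_join()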
# All possible combinations of vs & cyl, even those that aren't
# present in the data
expand(mtcars, vs, cyl)
#> # A tibble: 6 x 2
#>      vs   cyl
#>   <dbl> <dbl>
#> 1     0     4
#> 2     0     6
#> 3     0     8
#> 4     1     4
#> 5     1     6
#> 6     1     8

# Only combinations of vs and cyl that appear in the data
expand(mtcars, nesting(vs, cyl))
#> # A tibble: 5 x 2
#>      vs   cyl
#>   <dbl> <dbl>
#> 1     0     4
#> 2     0     6
#> 3     0     8
#> 4     1     4
#> 5     1     6

# Implicit missings ---------------------------------------------------------
df <- tibble(
  year   = c(2010, 2010, 2010, 2010, 2012, 2012, 2012),
  qtr    = c(   1,    2,    3,    4,    1,    2,    3),
  return = rnorm(7)
)
df %>% expand(year, qtr)
#> # A tibble: 8 x 2
#>    year   qtr
#>   <dbl> <dbl>
#> 1  2010     1
#> 2  2010     2
#> 3  2010     3
#> 4  2010     4
#> 5  2012     1
#> 6  2012     2
#> 7  2012     3
#> 8  2012     4
df %>% expand(year = 2010:2012, qtr)
#> # A tibble: 12 x 2
#>     year   qtr
#>    <int> <dbl>
#>  1  2010     1
#>  2  2010     2
#>  3  2010     3
#>  4  2010     4
#>  5  2011     1
#>  6  2011     2
#>  7  2011     3
#>  8  2011     4
#>  9  2012     1
#> 10  2012     2
#> 11  2012     3
#> 12  2012     4
df %>% expand(year = full_seq(year, 1), qtr)
#> # A tibble: 12 x 2
#>     year   qtr
#>    <dbl> <dbl>
#>  1  2010     1
#>  2  2010     2
#>  3  2010     3
#>  4  2010     4
#>  5  2011     1
#>  6  2011     2
#>  7  2011     3
#>  8  2011     4
#>  9  2012     1
#> 10  2012     2
#> 11  2012     3
#> 12  2012     4
df %>% complete(year = full_seq(year, 1), qtr)
#> # A tibble: 12 x 3
#>     year   qtr   return
#>    <dbl> <dbl>    <dbl>
#>  1  2010     1 -1.40   
#>  2  2010     2  0.255  
#>  3  2010     3 -2.44   
#>  4  2010     4 -0.00557
#>  5  2011     1 NA      
#>  6  2011     2 NA      
#>  7  2011     3 NA      
#>  8  2011     4 NA      
#>  9  2012     1  0.622  
#> 10  2012     2  1.15   
#> 11  2012     3 -1.82   
#> 12  2012     4 NA      

# Nesting -------------------------------------------------------------------
# Each person was given one of two treatments, repeated three times
# But some of the replications haven't happened yet, so we have
# incomplete data:
experiment <- tibble(
  name = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
  trt  = rep(c("a", "b", "a"), c(3, 2, 1)),
  rep = c(1, 2, 3, 1, 2, 1),
  measurement_1 = runif(6),
  measurement_2 = runif(6)
)

# We can figure out the complete set of data with expand()
# Each person only gets one treatment, so we nest name and trt together:
all <- experiment %>% expand(nesting(name, trt), rep)
all
#> # A tibble: 9 x 3
#>   name   trt     rep
#>   <chr>  <chr> <dbl>
#> 1 Alex   a         1
#> 2 Alex   a         2
#> 3 Alex   a         3
#> 4 Robert b         1
#> 5 Robert b         2
#> 6 Robert b         3
#> 7 Sam    a         1
#> 8 Sam    a         2
#> 9 Sam    a         3

# We can use anti_join to figure out which observations are missing
all %>% anti_join(experiment)
#> Joining, by = c("name", "trt", "rep")
#> # A tibble: 3 x 3
#>   name   trt     rep
#>   <chr>  <chr> <dbl>
#> 1 Robert b         3
#> 2 Sam    a         2
#> 3 Sam    a         3

# And use right_join to add in the appropriate missing values to the
# original data
experiment %>% right_join(all)
#> Joining, by = c("name", "trt", "rep")
#> # A tibble: 9 x 5
#>   name   trt     rep measurement_1 measurement_2
#>   <chr>  <chr> <dbl>         <dbl>         <dbl>
#> 1 Alex   a         1        0.402          0.290
#> 2 Alex   a         2        0.196          0.678
#> 3 Alex   a         3        0.404          0.735
#> 4 Robert b         1        0.0637         0.196
#> 5 Robert b         2        0.389          0.981
#> 6 Robert b         3       NA             NA    
#> 7 Sam    a         1        0.976          0.742
#> 8 Sam    a         2       NA             NA    
#> 9 Sam    a         3       NA             NA    
# Or use the complete() short-hand
experiment %>% complete(nesting(name, trt), rep)
#> # A tibble: 9 x 5
#>   name   trt     rep measurement_1 measurement_2
#>   <chr>  <chr> <dbl>         <dbl>         <dbl>
#> 1 Alex   a         1        0.402          0.290
#> 2 Alex   a         2        0.196          0.678
#> 3 Alex   a         3        0.404          0.735
#> 4 Robert b         1        0.0637         0.196
#> 5 Robert b         2        0.389          0.981
#> 6 Robert b         3       NA             NA    
#> 7 Sam    a         1        0.976          0.742
#> 8 Sam    a         2       NA             NA    
#> 9 Sam    a         3       NA             NA    

# Generate all combinations of list elements with crossing():
formulas <- list(
  formula1 = Sepal.Length ~ Sepal.Width,
  formula2 = Sepal.Length ~ Sepal.Width + Petal.Width,
  formula3 = Sepal.Length ~ Sepal.Width + Petal.Width + Petal.Length
)
data <- split(iris, iris$Species)
crossing(formula = formulas, data)
#> # A tibble: 9 x 2
#>   formula      data             
#>   <named list> <named list>     
#> 1 <formula>    <df[,5] [50 × 5]>
#> 2 <formula>    <df[,5] [50 × 5]>
#> 3 <formula>    <df[,5] [50 × 5]>
#> 4 <formula>    <df[,5] [50 × 5]>
#> 5 <formula>    <df[,5] [50 × 5]>
#> 6 <formula>    <df[,5] [50 × 5]>
#> 7 <formula>    <df[,5] [50 × 5]>
#> 8 <formula>    <df[,5] [50 × 5]>
#> 9 <formula>    <df[,5] [50 × 5]>
