expand() is often useful in conjunction with left_join() if you want to convert implicit missing values to explicit missing values. Or you can use it in conjunction with anti_join() to figure out which combinations are missing.
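
The two join patterns described above can be sketched as follows. The `sales` data frame and its columns are hypothetical and exist only for illustration; the sketch assumes tidyr and dplyr are installed.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: there is no row for ("pear", 2021)
sales <- tibble(
  item  = c("apple", "apple", "pear"),
  year  = c(2020, 2021, 2020),
  units = c(10, 12, 7)
)

# Every item-year combination, whether observed or not
grid <- sales %>% expand(item, year)

# Implicit -> explicit: left_join() keeps every row of the grid and
# fills units with NA where the original data had no matching row
explicit <- grid %>% left_join(sales, by = c("item", "year"))

# Combinations that are absent from the original data
absent <- grid %>% anti_join(sales, by = c("item", "year"))
```

Here `explicit` has one row per combination, and `absent` lists only the combinations with no match in `sales`.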

    expand(data, ...)

    crossing(...)

    nesting(...)

| Argument | Description |
|---|---|
| data | A data frame. |
| ... | Specification of columns to expand. Columns can be atomic vectors or lists. To find all unique combinations of x, y and z, including those not found in the data, supply each variable as a separate argument: expand(df, x, y, z). To find only the combinations that occur in the data, use nesting: expand(df, nesting(x, y, z)). You can combine the two forms. For example, expand(df, nesting(school_id, student_id), date) would produce a row for each school-student combination present in the data, for every date. For factors, the full set of levels (not just those that appear in the data) is used. For continuous variables, you may need to fill in values that don't appear in the data: to do so use expressions like year = 2010:2020 or year = full_seq(year, 1). Length-zero (empty) elements are automatically dropped. |
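
As a rough illustration of how factors and continuous variables are treated, here is a minimal sketch. The data frame `df` and its columns `size` and `year` are made up for this example, and `qtr = 1:4` is simply a supplied vector, not a column of the data.

```r
library(tidyr)

# Hypothetical data: the level "large" never occurs and 2011 is absent
df <- tibble(
  size = factor(c("small", "medium", "small"),
                levels = c("small", "medium", "large")),
  year = c(2010, 2012, 2012)
)

# Factors: every level is used, including the unobserved "large"
by_level <- expand(df, size)

# Continuous: only observed values appear unless gaps are filled explicitly
observed_years <- expand(df, year)
filled_years   <- expand(df, year = full_seq(year, 1))

# Combining forms: observed (size, year) pairs crossed with a supplied vector
combined <- expand(df, nesting(size, year), qtr = 1:4)
```

`filled_years` includes 2011 even though it never appears in `df`, while `observed_years` does not.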
crossing() is a wrapper around expand_grid() that deduplicates and sorts each input. nesting() is the complement to crossing(): it only keeps combinations of values that appear in the data.
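
The difference between the three helpers is easiest to see on small inputs with duplicates. The vectors `xs` and `ys` below are hypothetical; this is a sketch, not part of the original examples.

```r
library(tidyr)

xs <- c(2, 1, 2)
ys <- c("b", "a", "b")

# expand_grid(): every pairing of the inputs as given, duplicates included
grid <- expand_grid(x = xs, y = ys)

# crossing(): each input is deduplicated and sorted before pairing
crossed <- crossing(x = xs, y = ys)

# nesting(): only the (x, y) pairs that occur together in the inputs
nested <- nesting(x = xs, y = ys)

nrow(grid)     # 3 * 3 pairings
nrow(crossed)  # 2 unique x values * 2 unique y values
nrow(nested)   # unique observed pairs only
```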
See complete() for a common application of expand(): completing a data frame with missing combinations. expand_grid() is a low-level function that doesn't deduplicate or sort values.

    library(dplyr)
    library(tidyr)

    # All possible combinations of vs & cyl, even those that aren't
    # present in the data
    expand(mtcars, vs, cyl)
    #> # A tibble: 6 x 2
    #>      vs   cyl
    #>   <dbl> <dbl>
    #> 1     0     4
    #> 2     0     6
    #> 3     0     8
    #> 4     1     4
    #> 5     1     6
    #> 6     1     8

    # Only combinations of vs and cyl that appear in the data
    expand(mtcars, nesting(vs, cyl))
    #> # A tibble: 5 x 2
    #>      vs   cyl
    #>   <dbl> <dbl>
    #> 1     0     4
    #> 2     0     6
    #> 3     0     8
    #> 4     1     4
    #> 5     1     6

    # Implicit missings ---------------------------------------------------------
    df <- tibble(
      year   = c(2010, 2010, 2010, 2010, 2012, 2012, 2012),
      qtr    = c(   1,    2,    3,    4,    1,    2,    3),
      return = rnorm(7)
    )
    df %>% expand(year, qtr)
    #> # A tibble: 8 x 2
    #>    year   qtr
    #>   <dbl> <dbl>
    #> 1  2010     1
    #> 2  2010     2
    #> 3  2010     3
    #> 4  2010     4
    #> 5  2012     1
    #> 6  2012     2
    #> 7  2012     3
    #> 8  2012     4

    df %>% expand(year = 2010:2012, qtr)
    #> # A tibble: 12 x 2
    #>     year   qtr
    #>    <int> <dbl>
    #>  1  2010     1
    #>  2  2010     2
    #>  3  2010     3
    #>  4  2010     4
    #>  5  2011     1
    #>  6  2011     2
    #>  7  2011     3
    #>  8  2011     4
    #>  9  2012     1
    #> 10  2012     2
    #> 11  2012     3
    #> 12  2012     4

    df %>% expand(year = full_seq(year, 1), qtr)
    #> # A tibble: 12 x 2
    #>     year   qtr
    #>    <dbl> <dbl>
    #>  1  2010     1
    #>  2  2010     2
    #>  3  2010     3
    #>  4  2010     4
    #>  5  2011     1
    #>  6  2011     2
    #>  7  2011     3
    #>  8  2011     4
    #>  9  2012     1
    #> 10  2012     2
    #> 11  2012     3
    #> 12  2012     4

    df %>% complete(year = full_seq(year, 1), qtr)
    #> # A tibble: 12 x 3
    #>     year   qtr   return
    #>    <dbl> <dbl>    <dbl>
    #>  1  2010     1 -1.40
    #>  2  2010     2  0.255
    #>  3  2010     3 -2.44
    #>  4  2010     4 -0.00557
    #>  5  2011     1 NA
    #>  6  2011     2 NA
    #>  7  2011     3 NA
    #>  8  2011     4 NA
    #>  9  2012     1  0.622
    #> 10  2012     2  1.15
    #> 11  2012     3 -1.82
    #> 12  2012     4 NA

    # Nesting -------------------------------------------------------------------
    # Each person was given one of two treatments, repeated three times
    # But some of the replications haven't happened yet, so we have
    # incomplete data:
    experiment <- tibble(
      name = rep(c("Alex", "Robert", "Sam"), c(3, 2, 1)),
      trt  = rep(c("a", "b", "a"), c(3, 2, 1)),
      rep = c(1, 2, 3, 1, 2, 1),
      measurement_1 = runif(6),
      measurement_2 = runif(6)
    )

    # We can figure out the complete set of data with expand()
    # Each person only gets one treatment, so we nest name and trt together:
    all <- experiment %>% expand(nesting(name, trt), rep)
    all
    #> # A tibble: 9 x 3
    #>   name   trt     rep
    #>   <chr>  <chr> <dbl>
    #> 1 Alex   a         1
    #> 2 Alex   a         2
    #> 3 Alex   a         3
    #> 4 Robert b         1
    #> 5 Robert b         2
    #> 6 Robert b         3
    #> 7 Sam    a         1
    #> 8 Sam    a         2
    #> 9 Sam    a         3

    # We can use anti_join to figure out which observations are missing
    all %>% anti_join(experiment)
    #> # A tibble: 3 x 3
    #>   name   trt     rep
    #>   <chr>  <chr> <dbl>
    #> 1 Robert b         3
    #> 2 Sam    a         2
    #> 3 Sam    a         3

    # And use right_join to add in the appropriate missing values to the
    # original data
    experiment %>% right_join(all)
    #> # A tibble: 9 x 5
    #>   name   trt     rep measurement_1 measurement_2
    #>   <chr>  <chr> <dbl>         <dbl>         <dbl>
    #> 1 Alex   a         1        0.402          0.290
    #> 2 Alex   a         2        0.196          0.678
    #> 3 Alex   a         3        0.404          0.735
    #> 4 Robert b         1        0.0637         0.196
    #> 5 Robert b         2        0.389          0.981
    #> 6 Robert b         3       NA             NA
    #> 7 Sam    a         1        0.976          0.742
    #> 8 Sam    a         2       NA             NA
    #> 9 Sam    a         3       NA             NA

    # Or use the complete() short-hand
    experiment %>% complete(nesting(name, trt), rep)
    #> # A tibble: 9 x 5
    #>   name   trt     rep measurement_1 measurement_2
    #>   <chr>  <chr> <dbl>         <dbl>         <dbl>
    #> 1 Alex   a         1        0.402          0.290
    #> 2 Alex   a         2        0.196          0.678
    #> 3 Alex   a         3        0.404          0.735
    #> 4 Robert b         1        0.0637         0.196
    #> 5 Robert b         2        0.389          0.981
    #> 6 Robert b         3       NA             NA
    #> 7 Sam    a         1        0.976          0.742
    #> 8 Sam    a         2       NA             NA
    #> 9 Sam    a         3       NA             NA

    # Generate all combinations with expand():
    formulas <- list(
      formula1 = Sepal.Length ~ Sepal.Width,
      formula2 = Sepal.Length ~ Sepal.Width + Petal.Width,
      formula3 = Sepal.Length ~ Sepal.Width + Petal.Width + Petal.Length
    )
    data <- split(iris, iris$Species)
    crossing(formula = formulas, data)
    #> # A tibble: 9 x 2
    #>   formula      data
    #>   <named list> <named list>
    #> 1 <formula>    <df[,5] [50 × 5]>
    #> 2 <formula>    <df[,5] [50 × 5]>
    #> 3 <formula>    <df[,5] [50 × 5]>
    #> 4 <formula>    <df[,5] [50 × 5]>
    #> 5 <formula>    <df[,5] [50 × 5]>
    #> 6 <formula>    <df[,5] [50 × 5]>
    #> 7 <formula>    <df[,5] [50 × 5]>
    #> 8 <formula>    <df[,5] [50 × 5]>
    #> 9 <formula>    <df[,5] [50 × 5]>