Unite multiple columns into one by pasting strings together

Convenience function to paste together multiple columns into one.

unite(data, col, ..., sep = "_", remove = TRUE, na.rm = FALSE)
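To make the semantics of the usage line concrete, here is a minimal base-R sketch of what `unite()` computes. `unite_sketch()` is a hypothetical helper written for illustration, not tidyr's implementation. It takes the columns to combine as a plain character vector `cols` (no tidyselect support) and assumes they can be coerced with `as.character()`. It also places the new column where the leftmost selected column was; that placement rule is an assumption chosen to match the Examples section.

```r
# Hypothetical helper: a base-R approximation of unite(), for illustration only.
# `cols` is a plain character vector of column names (no tidyselect support).
unite_sketch <- function(data, col, cols, sep = "_", remove = TRUE, na.rm = FALSE) {
  pieces <- lapply(data[cols], as.character)
  if (na.rm) {
    # Drop missing values row by row, then paste what is left
    united <- vapply(seq_len(nrow(data)), function(i) {
      vals <- vapply(pieces, function(p) p[i], character(1))
      paste(vals[!is.na(vals)], collapse = sep)
    }, character(1))
  } else {
    # paste() turns NA into the string "NA"
    united <- do.call(paste, c(unname(pieces), sep = sep))
  }
  # Assumption: the new column takes the position of the leftmost selected column
  first_pos <- which(names(data) %in% cols)[1]
  out <- if (remove) data[setdiff(names(data), cols)] else data
  out[[col]] <- united
  out[append(setdiff(names(out), col), col, after = first_pos - 1)]
}

df <- data.frame(
  id = 1:4,
  x = c("a", "a", NA, NA),
  y = c("b", NA, "b", NA),
  stringsAsFactors = FALSE
)
res <- unite_sketch(df, "z", c("x", "y"))
res_na <- unite_sketch(df, "z", c("x", "y"), na.rm = TRUE, remove = FALSE)
```

With `na.rm = TRUE`, a row whose selected values are all missing becomes the empty string `""`, because `paste()` of a zero-length vector with `collapse` returns `""`.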

Arguments

data	A data frame.
col	The name of the new column, as a string or symbol. This argument is passed by expression and supports quasiquotation (you can unquote strings and symbols). The name is captured from the expression with `rlang::ensym()` (note that this kind of interface where symbols do not represent actual objects is now discouraged in the tidyverse; we support it here for backward compatibility).
...	A selection of columns. If empty, all variables are selected. You can supply bare variable names, select all variables between x and z with `x:z`, and exclude y with `-y`. For more options, see the `dplyr::select()` documentation. See also the section on selection rules below.
sep	Separator to use between values.
remove	If `TRUE`, remove input columns from output data frame.
na.rm	If `TRUE`, missing values will be removed before uniting each value.
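The arguments above can be combined as in the following sketch. The `people` data, the `new_col` variable and the column names are made up for illustration. It shows `col` given as a bare symbol, as a string, and unquoted with `!!`, plus selection by name, by exclusion (`-id`) and by range (`first:last`), a custom `sep`, and `remove = FALSE`.

```r
library(tidyr)

# Illustrative data (hypothetical)
people <- tibble(
  id = 1:2,
  first = c("Ada", "Alan"),
  last = c("Lovelace", "Turing")
)

# `col` as a bare symbol; columns listed by name; custom separator
by_name <- unite(people, full_name, first, last, sep = " ")

# `col` as a string; select every column except `id`
by_exclusion <- unite(people, "full_name", -id, sep = " ")

# Keep the input columns next to the new one
kept <- unite(people, "full_name", first:last, sep = " ", remove = FALSE)

# `col` supports quasiquotation: the new column is named after the string in `new_col`
new_col <- "display_name"
unquoted <- unite(people, !!new_col, first, last, sep = " ")
```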

Rules for selection

Arguments for selecting columns are passed to tidyselect::vars_select() and are treated specially. Unlike other verbs, selecting functions make a strict distinction between data expressions and context expressions.

A data expression is either a bare name like x or an expression like x:y or c(x, y). In a data expression, you can only refer to columns from the data frame.
Everything else is a context expression in which you can only refer to objects that you have defined with <-.

For instance, col1:col3 is a data expression that refers to data columns, while seq(start, end) is a context expression that refers to objects from the context.
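The distinction can be seen in a short sketch. The data frame and the `start` and `end` objects are illustrative; `seq(start, end)` is evaluated in the calling environment and yields column positions.

```r
library(tidyr)

df <- tibble(col1 = c("a", "b"), col2 = c("c", "d"), col3 = c("e", "f"))

# Data expression: col1:col3 names columns of df
all_three <- unite(df, "joined", col1:col3)

# Context expression: `start` and `end` are ordinary R objects,
# so seq(start, end) evaluates to the positions 1 and 2
start <- 1
end <- 2
first_two <- unite(df, "joined", seq(start, end))
```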

If you really need to refer to contextual objects from a data expression, you can unquote them with the tidy eval operator !!. This operator evaluates its argument in the context and inlines the result in the surrounding function call. For instance, c(x, !! x) selects the x column within the data frame and the column referred to by the object x defined in the context (which can contain either a column name as string or a column position).
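A sketch of the `!!` behaviour described above, using an illustrative data frame. The context object `x` deliberately shares its name with a column: the bare `x` still selects the column, while `!!x` inlines the string stored in the object. A column position can be inlined the same way.

```r
library(tidyr)

df <- tibble(x = c("a", "b"), y = c("c", "d"), z = c("e", "f"))

# Context objects: a column name as a string and a column position
x <- "z"
pos <- 2

# Bare `x` is the column x; `!!x` inlines the string "z"
xz <- unite(df, "united", c(x, !!x))

# `!!pos` inlines the number 2, i.e. the column y
xy <- unite(df, "united", x, !!pos)
```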

Examples

library(tidyr)

df <- expand_grid(x = c("a", NA), y = c("b", NA))
df
#> # A tibble: 4 x 2
#>   x     y    
#>   <chr> <chr>
#> 1 a     b    
#> 2 a     <NA> 
#> 3 <NA>  b    
#> 4 <NA>  <NA> 

df %>% unite("z", x:y, remove = FALSE)
#> # A tibble: 4 x 3
#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 a_b   a     b    
#> 2 a_NA  a     <NA> 
#> 3 NA_b  <NA>  b    
#> 4 NA_NA <NA>  <NA> 

# To remove missing values:
df %>% unite("z", x:y, na.rm = TRUE, remove = FALSE)
#> # A tibble: 4 x 3
#>   z     x     y    
#>   <chr> <chr> <chr>
#> 1 "a_b" a     b    
#> 2 "a"   a     <NA> 
#> 3 "b"   <NA>  b    
#> 4 ""    <NA>  <NA> 

# Separate is almost the complement of unite
df %>%
  unite("xy", x:y) %>%
  separate(xy, c("x", "y"))
#> # A tibble: 4 x 2
#>   x     y    
#>   <chr> <chr>
#> 1 a     b    
#> 2 a     NA   
#> 3 NA    b    
#> 4 NA    NA   
# (but note that `x` and `y` now contain the string "NA", not a missing value NA)
