Separate a character column into multiple columns using a regular expression separator

Given either a regular expression or a vector of character positions, separate() turns a single character column into multiple columns.

separate(data, col, into, sep = "[^[:alnum:]]+", remove = TRUE,
  convert = FALSE, extra = "warn", fill = "warn", ...)

Arguments

data	A data frame.
col	Column name or position. This is passed to `tidyselect::vars_pull()`. This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
into	Names of new variables to create, as a character vector. Use `NA` to omit the variable in the output.
sep	Separator between columns. If character, it is interpreted as a regular expression. The default value is a regular expression that matches any sequence of non-alphanumeric characters. If numeric, it is interpreted as positions to split at. Positive values start at 1 at the far-left of the string; negative values start at -1 at the far-right of the string. The length of `sep` should be one less than the length of `into`.
remove	If `TRUE`, remove input column from output data frame.
convert	If `TRUE`, will run `type.convert()` with `as.is = TRUE` on new columns. This is useful if the component columns are integer, numeric or logical. NB: this will cause string `"NA"`s to be converted to `NA`s.
extra	If `sep` is a character vector, this controls what happens when there are too many pieces. There are three valid options: "warn" (the default): emit a warning and drop extra values; "drop": drop any extra values without a warning; "merge": split into at most `length(into)` pieces, keeping any remaining separators in the last piece.
fill	If `sep` is a character vector, this controls what happens when there are not enough pieces. There are three valid options: "warn" (the default): emit a warning and fill with missing values on the right; "right": fill with missing values on the right; "left": fill with missing values on the left.
...	Additional arguments passed on to methods.

Examples

library(tidyr)  # provides separate()
library(dplyr)
df <- data.frame(x = c(NA, "a.b", "a.d", "b.c"))
df %>% separate(x, c("A", "B"))
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c

# If you just want the second variable:
df %>% separate(x, c(NA, "B"))
#>      B
#> 1 <NA>
#> 2    b
#> 3    d
#> 4    c

# If not every row splits into the same number of pieces, use
# the extra and fill arguments to control what happens
df <- data.frame(x = c("a", "a b", "a b c", NA))
df %>% separate(x, c("a", "b"))
#> Warning: Expected 2 pieces. Additional pieces discarded in 1 rows [3].
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [1].
#>      a    b
#> 1    a <NA>
#> 2    a    b
#> 3    a    b
#> 4 <NA> <NA>
# The same behaviour as above (the c is dropped), but without warnings
df %>% separate(x, c("a", "b"), extra = "drop", fill = "right")
#>      a    b
#> 1    a <NA>
#> 2    a    b
#> 3    a    b
#> 4 <NA> <NA>
# Another option:
df %>% separate(x, c("a", "b"), extra = "merge", fill = "left")
#>      a    b
#> 1 <NA>    a
#> 2    a    b
#> 3    a  b c
#> 4 <NA> <NA>
# Or you can keep all three
df %>% separate(x, c("a", "b", "c"))
#> Warning: Expected 3 pieces. Missing pieces filled with `NA` in 2 rows [1, 2].
#>      a    b    c
#> 1    a <NA> <NA>
#> 2    a    b <NA>
#> 3    a    b    c
#> 4 <NA> <NA> <NA>

# If you only want to split a specified number of times, use extra = "merge"
df <- data.frame(x = c("x: 123", "y: error: 7"))
df %>% separate(x, c("key", "value"), ": ", extra = "merge")
#>   key    value
#> 1   x      123
#> 2   y error: 7

# Use regular expressions to separate on multiple characters:
df <- data.frame(x = c(NA, "a?b", "a.d", "b:c"))
df %>% separate(x, c("A","B"), sep = "([\\.\\?\\:])")
#>      A    B
#> 1 <NA> <NA>
#> 2    a    b
#> 3    a    d
#> 4    b    c

# convert = TRUE detects column classes
df <- data.frame(x = c("a:1", "a:2", "c:4", "d", NA))
df %>% separate(x, c("key","value"), ":") %>% str
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4].
#> 'data.frame':	5 obs. of  2 variables:
#>  $ key  : chr  "a" "a" "c" "d" ...
#>  $ value: chr  "1" "2" "4" NA ...
df %>% separate(x, c("key","value"), ":", convert = TRUE) %>% str
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4].
#> 'data.frame':	5 obs. of  2 variables:
#>  $ key  : chr  "a" "a" "c" "d" ...
#>  $ value: int  1 2 4 NA NA
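
The note on `convert` about the string `"NA"` can be illustrated with a small sketch. The data frame `na_df` and the result names below are made up for this illustration; the behaviour shown relies on `type.convert()` treating `"NA"` as missing.

```r
library(tidyr)

# Hypothetical data: the second value is the literal string "NA"
na_df <- data.frame(x = c("a:1", "b:NA"), stringsAsFactors = FALSE)

# Without conversion, "NA" stays an ordinary string
as_text <- separate(na_df, x, c("key", "value"), ":")

# With convert = TRUE, the column becomes integer and "NA" becomes a missing value
as_int <- separate(na_df, x, c("key", "value"), ":", convert = TRUE)

is.na(as_text$value)
is.na(as_int$value)
```

If the literal string `"NA"` is meaningful in your data, it is probably safer to leave `convert = FALSE` and convert the columns yourself.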

# Argument col can take quasiquotation to work with strings
var <- "x"
df %>% separate(!!var, c("key","value"), ":")
#> Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [4].
#>    key value
#> 1    a     1
#> 2    a     2
#> 3    c     4
#> 4    d  <NA>
#> 5 <NA>  <NA>
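
The examples above all use a character `sep`. As a minimal sketch of the numeric form described under `sep`, the following splits at fixed character positions. The data frame `pos_df` and the new column names are invented for this illustration.

```r
library(tidyr)

# Hypothetical data: fixed-width year-month strings
pos_df <- data.frame(x = c("2021-03", "1999-12"), stringsAsFactors = FALSE)

# A positive position counts from the left: split after the 4th character
by_left <- separate(pos_df, x, c("year", "rest"), sep = 4)
# year is "2021"/"1999", rest is "-03"/"-12"

# A negative position counts from the right: split off the last 2 characters
by_right <- separate(pos_df, x, c("head", "month"), sep = -2)
# head is "2021-"/"1999-", month is "03"/"12"

# Two positions give three columns (length(sep) is one less than length(into))
three <- separate(pos_df, x, c("year", "dash", "month"), sep = c(4, 5))
```

Because the positions are fixed, this form only suits strings whose fields always have the same width.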
