melt.data.table.Rd
An S3 method for melting data.table
s written in C for speed and memory
efficiency. Since v1.9.6
, melt.data.table
allows melting into
multiple columns simultaneously.
## fast melt a data.table # S3 method for data.table melt(data, id.vars, measure.vars, variable.name = "variable", value.name = "value", ..., na.rm = FALSE, variable.factor = TRUE, value.factor = FALSE, verbose = getOption("datatable.verbose"))
data | A |
---|---|
id.vars | vector of id variables. Can be integer (corresponding id column numbers) or character (id column names) vector. If missing, all non-measure columns will be assigned to it. If integer, must be positive; see Details. |
measure.vars | Measure variables for
For convenience/clarity in the case of multiple |
variable.name | name for the measured variable names column. The default name is |
value.name | name for the molten data values column(s). The default name is |
na.rm | If |
variable.factor | If |
value.factor | If |
verbose |
|
... | any other arguments to be passed to/from other methods. |
If id.vars
and measure.vars
are both missing, all
non-numeric/integer/logical
columns are assigned as id variables and
the rest as measure variables. If only one of id.vars
or
measure.vars
is supplied, the rest of the columns will be assigned to
the other. Both id.vars
and measure.vars
can have the same column
more than once and the same column can be both as id and measure variables.
melt.data.table
also accepts list
columns for both id and measure
variables.
When all measure.vars
are not of the same type, they'll be coerced
according to the hierarchy list
> character
> numeric >
integer > logical
. For example, if any of the measure variables is a
list
, then entire value column will be coerced to a list. Note that,
if the type of value
column is a list, na.rm = TRUE
will have no
effect.
From version 1.9.6
, melt
gains a feature with measure.vars
accepting a list of character
or integer
vectors as well to melt
into multiple columns in a single function call efficiently. The function
patterns
can be used to provide regular expression patterns. When
used along with melt
, if cols
argument is not provided, the
patterns will be matched against names(data)
, for convenience.
Attributes are preserved if all value
columns are of the same type. By
default, if any of the columns to be melted are of type factor
, it'll
be coerced to character
type. To get a factor
column, set
value.factor = TRUE
. melt.data.table
also preserves
ordered
factors.
Historical note: melt.data.table
was originally designed as an enhancement to reshape2::melt
in terms of computing and memory efficiency. reshape2
has since been deprecated, and melt
has had a generic defined within data.table
since v1.9.6
in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the reshape2
authors for the inspiration.
An unkeyed data.table
containing the molten data.
set.seed(45) require(data.table) DT <- data.table( i_1 = c(1:5, NA), i_2 = c(NA,6,7,8,9,10), f_1 = factor(sample(c(letters[1:3], NA), 6, TRUE)), f_2 = factor(c("z", "a", "x", "c", "x", "x"), ordered=TRUE), c_1 = sample(c(letters[1:3], NA), 6, TRUE), d_1 = as.Date(c(1:3,NA,4:5), origin="2013-09-01"), d_2 = as.Date(6:1, origin="2012-01-01")) # add a couple of list cols DT[, l_1 := DT[, list(c=list(rep(i_1, sample(5,1)))), by = i_1]$c]#> i_1 i_2 f_1 f_2 c_1 d_1 d_2 l_1 #> 1: 1 NA a z c 2013-09-02 2012-01-07 1,1,1,1,1 #> 2: 2 6 c a c 2013-09-03 2012-01-06 2,2,2,2 #> 3: 3 7 b x <NA> 2013-09-04 2012-01-05 3,3,3,3 #> 4: 4 8 <NA> c c <NA> 2012-01-04 4,4,4,4 #> 5: 5 9 c x <NA> 2013-09-05 2012-01-03 5 #> 6: NA 10 b x c 2013-09-06 2012-01-02 NA,NA,NA,NA,NA#> i_1 i_2 f_1 f_2 c_1 d_1 d_2 l_1 l_2 #> 1: 1 NA a z c 2013-09-02 2012-01-07 1,1,1,1,1 c,c,c,c #> 2: 2 6 c a c 2013-09-03 2012-01-06 2,2,2,2 c,c,c,c #> 3: 3 7 b x <NA> 2013-09-04 2012-01-05 3,3,3,3 NA,NA,NA #> 4: 4 8 <NA> c c <NA> 2012-01-04 4,4,4,4 c,c,c,c,c #> 5: 5 9 c x <NA> 2013-09-05 2012-01-03 5 NA #> 6: NA 10 b x c 2013-09-06 2012-01-02 NA,NA,NA,NA,NA c,c# id, measure as character/integer/numeric vectors melt(DT, id=1:2, measure="f_1")#> i_1 i_2 variable value #> 1: 1 NA f_1 a #> 2: 2 6 f_1 c #> 3: 3 7 f_1 b #> 4: 4 8 f_1 <NA> #> 5: 5 9 f_1 c #> 6: NA 10 f_1 b#> i_1 i_2 variable value #> 1: 1 NA f_1 a #> 2: 2 6 f_1 c #> 3: 3 7 f_1 b #> 4: 4 8 f_1 <NA> #> 5: 5 9 f_1 c #> 6: NA 10 f_1 bmelt(DT, id=1:2, measure=3L, value.factor=TRUE) # same, but 'value' is factor#> i_1 i_2 variable value #> 1: 1 NA f_1 a #> 2: 2 6 f_1 c #> 3: 3 7 f_1 b #> 4: 4 8 f_1 <NA> #> 5: 5 9 f_1 c #> 6: NA 10 f_1 bmelt(DT, id=1:2, measure=3:4, value.factor=TRUE) # 'value' is *ordered* factor#> i_1 i_2 variable value #> 1: 1 NA f_1 a #> 2: 2 6 f_1 c #> 3: 3 7 f_1 b #> 4: 4 8 f_1 <NA> #> 5: 5 9 f_1 c #> 6: NA 10 f_1 b #> 7: 1 NA f_2 z #> 8: 2 6 f_2 a #> 9: 3 7 f_2 x #> 10: 4 8 f_2 c #> 11: 5 9 f_2 x #> 12: NA 10 f_2 x#> f_1 f_2 variable value #> 1: a z d_1 2013-09-02 #> 2: c a d_1 2013-09-03 #> 3: b x d_1 2013-09-04 #> 4: <NA> c d_1 <NA> #> 5: c x d_1 2013-09-05 #> 6: b x d_1 2013-09-06 #> 7: a z d_2 2012-01-07 #> 8: c a d_2 2012-01-06 #> 9: b x d_2 2012-01-05 #> 10: <NA> c d_2 2012-01-04 #> 11: c x d_2 2012-01-03 #> 12: b x d_2 2012-01-02#> Warning: 'measure.vars' [i_1, d_1] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.#> f_1 f_2 variable value #> 1: a z i_1 1 #> 2: c a i_1 2 #> 3: b x i_1 3 #> 4: <NA> c i_1 4 #> 5: c x i_1 5 #> 6: b x i_1 NA #> 7: a z d_1 15950 #> 8: c a d_1 15951 #> 9: b x d_1 15952 #> 10: <NA> c d_1 NA #> 11: c x d_1 15953 #> 12: b x d_1 15954#> i_1 variable value #> 1: 1 l_1 1,1,1,1,1 #> 2: 2 l_1 2,2,2,2 #> 3: 3 l_1 3,3,3,3 #> 4: 4 l_1 4,4,4,4 #> 5: 5 l_1 5 #> 6: NA l_1 NA,NA,NA,NA,NA #> 7: 1 l_2 c,c,c,c #> 8: 2 l_2 c,c,c,c #> 9: 3 l_2 NA,NA,NA #> 10: 4 l_2 c,c,c,c,c #> 11: 5 l_2 NA #> 12: NA l_2 c,c#> Warning: 'measure.vars' [c_1, l_1] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'list'. All measure variables not of type 'list' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.#> i_1 variable value #> 1: 1 c_1 c #> 2: 2 c_1 c #> 3: 3 c_1 NA #> 4: 4 c_1 c #> 5: 5 c_1 NA #> 6: NA c_1 c #> 7: 1 l_1 1,1,1,1,1 #> 8: 2 l_1 2,2,2,2 #> 9: 3 l_1 3,3,3,3 #> 10: 4 l_1 4,4,4,4 #> 11: 5 l_1 5 #> 12: NA l_1 NA,NA,NA,NA,NA#> i_1 variable value #> 1: 1 c_1 c #> 2: 2 c_1 c #> 3: 3 c_1 <NA> #> 4: 4 c_1 c #> 5: 5 c_1 <NA> #> 6: NA c_1 c #> 7: 1 f_1 a #> 8: 2 f_1 c #> 9: 3 f_1 b #> 10: 4 f_1 <NA> #> 11: 5 f_1 c #> 12: NA f_1 b#> Warning: 'measure.vars' [c_1, i_2] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'character'. All measure variables not of type 'character' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.#> i_1 variable value #> 1: 1 c_1 c #> 2: 2 c_1 c #> 3: 3 c_1 <NA> #> 4: 4 c_1 c #> 5: 5 c_1 <NA> #> 6: NA c_1 c #> 7: 1 i_2 <NA> #> 8: 2 i_2 6 #> 9: 3 i_2 7 #> 10: 4 i_2 8 #> 11: 5 i_2 9 #> 12: NA i_2 10# on na.rm=TRUE. NAs are removed efficiently, from within C melt(DT, id=1, measure=c("c_1", "i_2"), na.rm=TRUE) # remove NA#> Warning: 'measure.vars' [c_1, i_2] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'character'. All measure variables not of type 'character' will be coerced too. Check DETAILS in ?melt.data.table for more on coercion.#> i_1 variable value #> 1: 1 c_1 c #> 2: 2 c_1 c #> 3: 4 c_1 c #> 4: NA c_1 c #> 5: 2 i_2 6 #> 6: 3 i_2 7 #> 7: 4 i_2 8 #> 8: 5 i_2 9 #> 9: NA i_2 10# measure.vars can be also a list # melt "f_1,f_2" and "d_1,d_2" simultaneously, retain 'factor' attribute # convenient way using internal function patterns() melt(DT, id=1:2, measure=patterns("^f_", "^d_"), value.factor=TRUE)#> i_1 i_2 variable value1 value2 #> 1: 1 NA 1 a 2013-09-02 #> 2: 2 6 1 c 2013-09-03 #> 3: 3 7 1 b 2013-09-04 #> 4: 4 8 1 <NA> <NA> #> 5: 5 9 1 c 2013-09-05 #> 6: NA 10 1 b 2013-09-06 #> 7: 1 NA 2 z 2012-01-07 #> 8: 2 6 2 a 2012-01-06 #> 9: 3 7 2 x 2012-01-05 #> 10: 4 8 2 c 2012-01-04 #> 11: 5 9 2 x 2012-01-03 #> 12: NA 10 2 x 2012-01-02# same as above, but provide list of columns directly by column names or indices melt(DT, id=1:2, measure=list(3:4, c("d_1", "d_2")), value.factor=TRUE)#> i_1 i_2 variable value1 value2 #> 1: 1 NA 1 a 2013-09-02 #> 2: 2 6 1 c 2013-09-03 #> 3: 3 7 1 b 2013-09-04 #> 4: 4 8 1 <NA> <NA> #> 5: 5 9 1 c 2013-09-05 #> 6: NA 10 1 b 2013-09-06 #> 7: 1 NA 2 z 2012-01-07 #> 8: 2 6 2 a 2012-01-06 #> 9: 3 7 2 x 2012-01-05 #> 10: 4 8 2 c 2012-01-04 #> 11: 5 9 2 x 2012-01-03 #> 12: NA 10 2 x 2012-01-02# same as above, but provide names directly: melt(DT, id=1:2, measure=patterns(f="^f_", d="^d_"), value.factor=TRUE)#> i_1 i_2 variable f d #> 1: 1 NA 1 a 2013-09-02 #> 2: 2 6 1 c 2013-09-03 #> 3: 3 7 1 b 2013-09-04 #> 4: 4 8 1 <NA> <NA> #> 5: 5 9 1 c 2013-09-05 #> 6: NA 10 1 b 2013-09-06 #> 7: 1 NA 2 z 2012-01-07 #> 8: 2 6 2 a 2012-01-06 #> 9: 3 7 2 x 2012-01-05 #> 10: 4 8 2 c 2012-01-04 #> 11: 5 9 2 x 2012-01-03 #> 12: NA 10 2 x 2012-01-02# na.rm=TRUE removes rows with NAs in any 'value' columns melt(DT, id=1:2, measure=patterns("f_", "d_"), value.factor=TRUE, na.rm=TRUE)#> i_1 i_2 variable value1 value2 #> 1: 1 NA 1 a 2013-09-02 #> 2: 2 6 1 c 2013-09-03 #> 3: 3 7 1 b 2013-09-04 #> 4: 5 9 1 c 2013-09-05 #> 5: NA 10 1 b 2013-09-06 #> 6: 1 NA 2 z 2012-01-07 #> 7: 2 6 2 a 2012-01-06 #> 8: 3 7 2 x 2012-01-05 #> 9: 4 8 2 c 2012-01-04 #> 10: 5 9 2 x 2012-01-03 #> 11: NA 10 2 x 2012-01-02# return 'NA' for missing columns, 'na.rm=TRUE' ignored due to list column melt(DT, id=1:2, measure=patterns("l_", "c_"), na.rm=TRUE)#> i_1 i_2 variable value1 value2 #> 1: 1 NA 1 1,1,1,1,1 c #> 2: 2 6 1 2,2,2,2 c #> 3: 3 7 1 3,3,3,3 <NA> #> 4: 4 8 1 4,4,4,4 c #> 5: 5 9 1 5 <NA> #> 6: NA 10 1 NA,NA,NA,NA,NA c #> 7: 1 NA 2 c,c,c,c <NA> #> 8: 2 6 2 c,c,c,c <NA> #> 9: 3 7 2 NA,NA,NA <NA> #> 10: 4 8 2 c,c,c,c,c <NA> #> 11: 5 9 2 NA <NA> #> 12: NA 10 2 c,c <NA>