Fast melt for data.table
melt.data.table.Rd
melt
is data.table
's wide-to-long reshaping tool.
We provide an S3 method for melting data.table
s. It is written in C for speed and memory
efficiency. Since v1.9.6
, melt.data.table
allows melting into
multiple columns simultaneously.
Usage
## fast melt a data.table
# S3 method for data.table
melt(data, id.vars, measure.vars,
variable.name = "variable", value.name = "value",
..., na.rm = FALSE, variable.factor = TRUE,
value.factor = FALSE,
verbose = getOption("datatable.verbose"))
Arguments
- data
A
data.table
object to melt.- id.vars
vector of id variables. Can be integer (corresponding id column numbers) or character (id column names) vector. If missing, all non-measure columns will be assigned to it. If integer, must be positive; see Details.
- measure.vars
Measure variables for
melt
ing. Can be missing, vector, list, or pattern-based.When missing,
measure.vars
will become all columns outsideid.vars
.Vector can be
integer
(implying column numbers) orcharacter
(column names).list
is a generalization of the vector version -- each element of the list (which should beinteger
orcharacter
as above) will become amelt
ed column.Pattern-based column matching can be achieved with the regular expression-based
patterns
(regex without capture groups; matching column names are used in thevariable.name
output column), ormeasure
(regex with capture groups; each capture group becomes an output column).
For convenience/clarity in the case of multiple
melt
ed columns, resulting column names can be supplied as names to the elementsmeasure.vars
(in thelist
andpatterns
usages). See alsoExamples
.- variable.name
name (default
'variable'
) of output column containing information about which input column(s) were melted. Ifmeasure.vars
is an integer/character vector, then each entry of this column contains the name of a melted column fromdata
. Ifmeasure.vars
is a list of integer/character vectors, then each entry of this column contains an integer indicating an index/position in each of those vectors. Ifmeasure.vars
has attributevariable_table
then it must be a data table with nrow = length ofmeasure.vars
vector(s), each row describing the corresponding measured variables(s), (typically created viameasure
) and its columns will be output instead of thevariable.name
column.- value.name
name for the molten data values column(s). The default name is
'value'
. Multiple names can be provided here for the case whenmeasure.vars
is alist
, though note well that the names provided inmeasure.vars
take precedence.- na.rm
If
TRUE
,NA
values will be removed from the molten data.- variable.factor
If
TRUE
, thevariable
column will be converted tofactor
, else it will be acharacter
column.- value.factor
If
TRUE
, thevalue
column will be converted tofactor
, else the molten value type is left unchanged.- verbose
TRUE
turns on status and information messages to the console. Turn this on by default usingoptions(datatable.verbose=TRUE)
. The quantity and types of verbosity may be expanded in future.- ...
any other arguments to be passed to/from other methods.
Details
If id.vars
and measure.vars
are both missing, all
non-numeric/integer/logical
columns are assigned as id variables and
the rest as measure variables. If only one of id.vars
or
measure.vars
is supplied, the rest of the columns will be assigned to
the other. Both id.vars
and measure.vars
can have the same column
more than once and the same column can be both as id and measure variables.
melt.data.table
also accepts list
columns for both id and measure
variables.
When all measure.vars
are not of the same type, they'll be coerced
according to the hierarchy list
> character
> numeric >
integer > logical
. For example, if any of the measure variables is a
list
, then entire value column will be coerced to a list.
From version 1.9.6
, melt
gains a feature with measure.vars
accepting a list of character
or integer
vectors as well to melt
into multiple columns in a single function call efficiently.
If a vector in the list contains missing values, or is shorter than the
max length of the list elements, then the output will include runs of
missing values at the specified position, or at the end.
The functions
patterns
and measure
can be used to provide regular expression patterns. When
used along with melt
, if cols
argument is not provided, the
patterns will be matched against names(data)
, for convenience.
Attributes are preserved if all value
columns are of the same type. By
default, if any of the columns to be melted are of type factor
, it'll
be coerced to character
type. To get a factor
column, set
value.factor = TRUE
. melt.data.table
also preserves
ordered
factors.
Historical note: melt.data.table
was originally designed as an enhancement to reshape2::melt
in terms of computing and memory efficiency. reshape2
has since been superseded in favour of tidyr
, and melt
has had a generic defined within data.table
since v1.9.6
in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the reshape2
authors for the inspiration.
Examples
set.seed(45)
require(data.table)
DT <- data.table(
i_1 = c(1:5, NA),
n_1 = c(NA, 6, 7, 8, 9, 10),
f_1 = factor(sample(c(letters[1:3], NA), 6L, TRUE)),
f_2 = factor(c("z", "a", "x", "c", "x", "x"), ordered=TRUE),
c_1 = sample(c(letters[1:3], NA), 6L, TRUE),
c_2 = sample(c(LETTERS[1:2], NA), 6L, TRUE),
d_1 = as.Date(c(1:3,NA,4:5), origin="2013-09-01"),
d_2 = as.Date(6:1, origin="2012-01-01")
)
# add a couple of list cols
DT[, l_1 := DT[, list(c=list(rep(i_1, sample(5, 1L)))), by = i_1]$c]
#> i_1 n_1 f_1 f_2 c_1 c_2 d_1 d_2 l_1
#> <int> <num> <fctr> <ord> <char> <char> <Date> <Date> <list>
#> 1: 1 NA a z c A 2013-09-02 2012-01-07 1,1,1,1,1
#> 2: 2 6 c a c B 2013-09-03 2012-01-06 2
#> 3: 3 7 b x <NA> A 2013-09-04 2012-01-05 3,3
#> 4: 4 8 <NA> c c A <NA> 2012-01-04 4,4,4,4,4
#> 5: 5 9 c x <NA> <NA> 2013-09-05 2012-01-03 5,5,5
#> 6: NA 10 b x c <NA> 2013-09-06 2012-01-02 NA,NA
DT[, l_2 := DT[, list(c=list(rep(c_1, sample(5, 1L)))), by = i_1]$c]
#> i_1 n_1 f_1 f_2 c_1 c_2 d_1 d_2 l_1
#> <int> <num> <fctr> <ord> <char> <char> <Date> <Date> <list>
#> 1: 1 NA a z c A 2013-09-02 2012-01-07 1,1,1,1,1
#> 2: 2 6 c a c B 2013-09-03 2012-01-06 2
#> 3: 3 7 b x <NA> A 2013-09-04 2012-01-05 3,3
#> 4: 4 8 <NA> c c A <NA> 2012-01-04 4,4,4,4,4
#> 5: 5 9 c x <NA> <NA> 2013-09-05 2012-01-03 5,5,5
#> 6: NA 10 b x c <NA> 2013-09-06 2012-01-02 NA,NA
#> l_2
#> <list>
#> 1: c,c
#> 2: c,c,c
#> 3: NA,NA,NA,NA,NA
#> 4: c,c
#> 5: NA,NA,NA,NA,NA
#> 6: c
# id.vars, measure.vars as character/integer/numeric vectors
melt(DT, id.vars=1:2, measure.vars="f_1")
#> i_1 n_1 variable value
#> <int> <num> <fctr> <char>
#> 1: 1 NA f_1 a
#> 2: 2 6 f_1 c
#> 3: 3 7 f_1 b
#> 4: 4 8 f_1 <NA>
#> 5: 5 9 f_1 c
#> 6: NA 10 f_1 b
melt(DT, id.vars=c("i_1", "n_1"), measure.vars=3) # same as above
#> i_1 n_1 variable value
#> <int> <num> <fctr> <char>
#> 1: 1 NA f_1 a
#> 2: 2 6 f_1 c
#> 3: 3 7 f_1 b
#> 4: 4 8 f_1 <NA>
#> 5: 5 9 f_1 c
#> 6: NA 10 f_1 b
melt(DT, id.vars=1:2, measure.vars=3L, value.factor=TRUE) # same, but 'value' is factor
#> i_1 n_1 variable value
#> <int> <num> <fctr> <fctr>
#> 1: 1 NA f_1 a
#> 2: 2 6 f_1 c
#> 3: 3 7 f_1 b
#> 4: 4 8 f_1 <NA>
#> 5: 5 9 f_1 c
#> 6: NA 10 f_1 b
melt(DT, id.vars=1:2, measure.vars=3:4, value.factor=TRUE) # 'value' is *ordered* factor
#> i_1 n_1 variable value
#> <int> <num> <fctr> <ord>
#> 1: 1 NA f_1 a
#> 2: 2 6 f_1 c
#> 3: 3 7 f_1 b
#> 4: 4 8 f_1 <NA>
#> 5: 5 9 f_1 c
#> 6: NA 10 f_1 b
#> 7: 1 NA f_2 z
#> 8: 2 6 f_2 a
#> 9: 3 7 f_2 x
#> 10: 4 8 f_2 c
#> 11: 5 9 f_2 x
#> 12: NA 10 f_2 x
# preserves attribute when types are identical, ex: Date
melt(DT, id.vars=3:4, measure.vars=c("d_1", "d_2"))
#> f_1 f_2 variable value
#> <fctr> <ord> <fctr> <Date>
#> 1: a z d_1 2013-09-02
#> 2: c a d_1 2013-09-03
#> 3: b x d_1 2013-09-04
#> 4: <NA> c d_1 <NA>
#> 5: c x d_1 2013-09-05
#> 6: b x d_1 2013-09-06
#> 7: a z d_2 2012-01-07
#> 8: c a d_2 2012-01-06
#> 9: b x d_2 2012-01-05
#> 10: <NA> c d_2 2012-01-04
#> 11: c x d_2 2012-01-03
#> 12: b x d_2 2012-01-02
melt(DT, id.vars=3:4, measure.vars=c("n_1", "d_1")) # attribute not preserved
#> f_1 f_2 variable value
#> <fctr> <ord> <fctr> <num>
#> 1: a z n_1 NA
#> 2: c a n_1 6
#> 3: b x n_1 7
#> 4: <NA> c n_1 8
#> 5: c x n_1 9
#> 6: b x n_1 10
#> 7: a z d_1 15950
#> 8: c a d_1 15951
#> 9: b x d_1 15952
#> 10: <NA> c d_1 NA
#> 11: c x d_1 15953
#> 12: b x d_1 15954
# on list
melt(DT, id.vars=1, measure.vars=c("l_1", "l_2")) # value is a list
#> i_1 variable value
#> <int> <fctr> <list>
#> 1: 1 l_1 1,1,1,1,1
#> 2: 2 l_1 2
#> 3: 3 l_1 3,3
#> 4: 4 l_1 4,4,4,4,4
#> 5: 5 l_1 5,5,5
#> 6: NA l_1 NA,NA
#> 7: 1 l_2 c,c
#> 8: 2 l_2 c,c,c
#> 9: 3 l_2 NA,NA,NA,NA,NA
#> 10: 4 l_2 c,c
#> 11: 5 l_2 NA,NA,NA,NA,NA
#> 12: NA l_2 c
suppressWarnings(
melt(DT, id.vars=1, measure.vars=c("c_1", "l_1")) # c1 coerced to list, with warning
)
#> i_1 variable value
#> <int> <fctr> <list>
#> 1: 1 c_1 c
#> 2: 2 c_1 c
#> 3: 3 c_1 NA
#> 4: 4 c_1 c
#> 5: 5 c_1 NA
#> 6: NA c_1 c
#> 7: 1 l_1 1,1,1,1,1
#> 8: 2 l_1 2
#> 9: 3 l_1 3,3
#> 10: 4 l_1 4,4,4,4,4
#> 11: 5 l_1 5,5,5
#> 12: NA l_1 NA,NA
# on character
melt(DT, id.vars=1, measure.vars=c("c_1", "f_1")) # value is char
#> i_1 variable value
#> <int> <fctr> <char>
#> 1: 1 c_1 c
#> 2: 2 c_1 c
#> 3: 3 c_1 <NA>
#> 4: 4 c_1 c
#> 5: 5 c_1 <NA>
#> 6: NA c_1 c
#> 7: 1 f_1 a
#> 8: 2 f_1 c
#> 9: 3 f_1 b
#> 10: 4 f_1 <NA>
#> 11: 5 f_1 c
#> 12: NA f_1 b
suppressWarnings(
melt(DT, id.vars=1, measure.vars=c("c_1", "n_1")) # n_1 coerced to char, with warning
)
#> i_1 variable value
#> <int> <fctr> <char>
#> 1: 1 c_1 c
#> 2: 2 c_1 c
#> 3: 3 c_1 <NA>
#> 4: 4 c_1 c
#> 5: 5 c_1 <NA>
#> 6: NA c_1 c
#> 7: 1 n_1 <NA>
#> 8: 2 n_1 6
#> 9: 3 n_1 7
#> 10: 4 n_1 8
#> 11: 5 n_1 9
#> 12: NA n_1 10
# on na.rm=TRUE. NAs are removed efficiently, from within C
melt(DT, id.vars=1, measure.vars=c("c_1", "c_2"), na.rm=TRUE) # remove NA
#> i_1 variable value
#> <int> <fctr> <char>
#> 1: 1 c_1 c
#> 2: 2 c_1 c
#> 3: 4 c_1 c
#> 4: NA c_1 c
#> 5: 1 c_2 A
#> 6: 2 c_2 B
#> 7: 3 c_2 A
#> 8: 4 c_2 A
# measure.vars can be also a list
# melt "f_1,f_2" and "d_1,d_2" simultaneously, retain 'factor' attribute
# convenient way using internal function patterns()
melt(DT, id.vars=1:2, measure.vars=patterns("^f_", "^d_"), value.factor=TRUE)
#> i_1 n_1 variable value1 value2
#> <int> <num> <fctr> <ord> <Date>
#> 1: 1 NA 1 a 2013-09-02
#> 2: 2 6 1 c 2013-09-03
#> 3: 3 7 1 b 2013-09-04
#> 4: 4 8 1 <NA> <NA>
#> 5: 5 9 1 c 2013-09-05
#> 6: NA 10 1 b 2013-09-06
#> 7: 1 NA 2 z 2012-01-07
#> 8: 2 6 2 a 2012-01-06
#> 9: 3 7 2 x 2012-01-05
#> 10: 4 8 2 c 2012-01-04
#> 11: 5 9 2 x 2012-01-03
#> 12: NA 10 2 x 2012-01-02
# same as above, but provide list of columns directly by column names or indices
melt(DT, id.vars=1:2, measure.vars=list(3:4, c("d_1", "d_2")), value.factor=TRUE)
#> i_1 n_1 variable value1 value2
#> <int> <num> <fctr> <ord> <Date>
#> 1: 1 NA 1 a 2013-09-02
#> 2: 2 6 1 c 2013-09-03
#> 3: 3 7 1 b 2013-09-04
#> 4: 4 8 1 <NA> <NA>
#> 5: 5 9 1 c 2013-09-05
#> 6: NA 10 1 b 2013-09-06
#> 7: 1 NA 2 z 2012-01-07
#> 8: 2 6 2 a 2012-01-06
#> 9: 3 7 2 x 2012-01-05
#> 10: 4 8 2 c 2012-01-04
#> 11: 5 9 2 x 2012-01-03
#> 12: NA 10 2 x 2012-01-02
# same as above, but provide names directly:
melt(DT, id.vars=1:2, measure.vars=patterns(f="^f_", d="^d_"), value.factor=TRUE)
#> i_1 n_1 variable f d
#> <int> <num> <fctr> <ord> <Date>
#> 1: 1 NA 1 a 2013-09-02
#> 2: 2 6 1 c 2013-09-03
#> 3: 3 7 1 b 2013-09-04
#> 4: 4 8 1 <NA> <NA>
#> 5: 5 9 1 c 2013-09-05
#> 6: NA 10 1 b 2013-09-06
#> 7: 1 NA 2 z 2012-01-07
#> 8: 2 6 2 a 2012-01-06
#> 9: 3 7 2 x 2012-01-05
#> 10: 4 8 2 c 2012-01-04
#> 11: 5 9 2 x 2012-01-03
#> 12: NA 10 2 x 2012-01-02
# na.rm=TRUE removes rows with NAs in any 'value' columns
melt(DT, id.vars=1:2, measure.vars=patterns("f_", "d_"), value.factor=TRUE, na.rm=TRUE)
#> i_1 n_1 variable value1 value2
#> <int> <num> <fctr> <ord> <Date>
#> 1: 1 NA 1 a 2013-09-02
#> 2: 2 6 1 c 2013-09-03
#> 3: 3 7 1 b 2013-09-04
#> 4: 5 9 1 c 2013-09-05
#> 5: NA 10 1 b 2013-09-06
#> 6: 1 NA 2 z 2012-01-07
#> 7: 2 6 2 a 2012-01-06
#> 8: 3 7 2 x 2012-01-05
#> 9: 4 8 2 c 2012-01-04
#> 10: 5 9 2 x 2012-01-03
#> 11: NA 10 2 x 2012-01-02
# 'na.rm=TRUE' also works with list column, but note that is.na only
# returns TRUE if the list element is a length=1 vector with an NA.
is.na(list(one.NA=NA, two.NA=c(NA,NA)))
#> one.NA two.NA
#> TRUE FALSE
melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=FALSE)
#> i_1 n_1 variable value1 value2
#> <int> <num> <fctr> <list> <Date>
#> 1: 1 NA 1 1,1,1,1,1 2013-09-02
#> 2: 2 6 1 2 2013-09-03
#> 3: 3 7 1 3,3 2013-09-04
#> 4: 4 8 1 4,4,4,4,4 <NA>
#> 5: 5 9 1 5,5,5 2013-09-05
#> 6: NA 10 1 NA,NA 2013-09-06
#> 7: 1 NA 2 c,c 2012-01-07
#> 8: 2 6 2 c,c,c 2012-01-06
#> 9: 3 7 2 NA,NA,NA,NA,NA 2012-01-05
#> 10: 4 8 2 c,c 2012-01-04
#> 11: 5 9 2 NA,NA,NA,NA,NA 2012-01-03
#> 12: NA 10 2 c 2012-01-02
melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=TRUE)
#> i_1 n_1 variable value1 value2
#> <int> <num> <fctr> <list> <Date>
#> 1: 1 NA 1 1,1,1,1,1 2013-09-02
#> 2: 2 6 1 2 2013-09-03
#> 3: 3 7 1 3,3 2013-09-04
#> 4: 5 9 1 5,5,5 2013-09-05
#> 5: NA 10 1 NA,NA 2013-09-06
#> 6: 1 NA 2 c,c 2012-01-07
#> 7: 2 6 2 c,c,c 2012-01-06
#> 8: 3 7 2 NA,NA,NA,NA,NA 2012-01-05
#> 9: 4 8 2 c,c 2012-01-04
#> 10: 5 9 2 NA,NA,NA,NA,NA 2012-01-03
#> 11: NA 10 2 c 2012-01-02
# measure list with missing/short entries results in output with runs of NA
DT.missing.cols <- DT[, .(d_1, d_2, c_1, f_2)]
melt(DT.missing.cols, measure.vars=list(d=1:2, c="c_1", f=c(NA, "f_2")))
#> variable d c f
#> <fctr> <Date> <char> <char>
#> 1: 1 2013-09-02 c <NA>
#> 2: 1 2013-09-03 c <NA>
#> 3: 1 2013-09-04 <NA> <NA>
#> 4: 1 <NA> c <NA>
#> 5: 1 2013-09-05 <NA> <NA>
#> 6: 1 2013-09-06 c <NA>
#> 7: 2 2012-01-07 <NA> z
#> 8: 2 2012-01-06 <NA> a
#> 9: 2 2012-01-05 <NA> x
#> 10: 2 2012-01-04 <NA> c
#> 11: 2 2012-01-03 <NA> x
#> 12: 2 2012-01-02 <NA> x
# specifying columns to melt via separator.
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, sep="_"))
#> number d c f
#> <int> <Date> <char> <char>
#> 1: 1 2013-09-02 c <NA>
#> 2: 1 2013-09-03 c <NA>
#> 3: 1 2013-09-04 <NA> <NA>
#> 4: 1 <NA> c <NA>
#> 5: 1 2013-09-05 <NA> <NA>
#> 6: 1 2013-09-06 c <NA>
#> 7: 2 2012-01-07 <NA> z
#> 8: 2 2012-01-06 <NA> a
#> 9: 2 2012-01-05 <NA> x
#> 10: 2 2012-01-04 <NA> c
#> 11: 2 2012-01-03 <NA> x
#> 12: 2 2012-01-02 <NA> x
# specifying columns to melt via regex.
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, pattern="(.)_(.)"))
#> number d c f
#> <int> <Date> <char> <char>
#> 1: 1 2013-09-02 c <NA>
#> 2: 1 2013-09-03 c <NA>
#> 3: 1 2013-09-04 <NA> <NA>
#> 4: 1 <NA> c <NA>
#> 5: 1 2013-09-05 <NA> <NA>
#> 6: 1 2013-09-06 c <NA>
#> 7: 2 2012-01-07 <NA> z
#> 8: 2 2012-01-06 <NA> a
#> 9: 2 2012-01-05 <NA> x
#> 10: 2 2012-01-04 <NA> c
#> 11: 2 2012-01-03 <NA> x
#> 12: 2 2012-01-02 <NA> x
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, pattern="([dc])_(.)"))
#> f_2 number d c
#> <ord> <int> <Date> <char>
#> 1: z 1 2013-09-02 c
#> 2: a 1 2013-09-03 c
#> 3: x 1 2013-09-04 <NA>
#> 4: c 1 <NA> c
#> 5: x 1 2013-09-05 <NA>
#> 6: x 1 2013-09-06 c
#> 7: z 2 2012-01-07 <NA>
#> 8: a 2 2012-01-06 <NA>
#> 9: x 2 2012-01-05 <NA>
#> 10: c 2 2012-01-04 <NA>
#> 11: x 2 2012-01-03 <NA>
#> 12: x 2 2012-01-02 <NA>
# cols arg of measure can be used if you do not want to use regex
melt(DT.missing.cols, measure.vars=measure(
value.name, number=as.integer, sep="_", cols=c("d_1","d_2","c_1")))
#> f_2 number d c
#> <ord> <int> <Date> <char>
#> 1: z 1 2013-09-02 c
#> 2: a 1 2013-09-03 c
#> 3: x 1 2013-09-04 <NA>
#> 4: c 1 <NA> c
#> 5: x 1 2013-09-05 <NA>
#> 6: x 1 2013-09-06 c
#> 7: z 2 2012-01-07 <NA>
#> 8: a 2 2012-01-06 <NA>
#> 9: x 2 2012-01-05 <NA>
#> 10: c 2 2012-01-04 <NA>
#> 11: x 2 2012-01-03 <NA>
#> 12: x 2 2012-01-02 <NA>