Skip to contents

melt is data.table's wide-to-long reshaping tool. We provide an S3 method for melting data.tables. It is written in C for speed and memory efficiency. Since v1.9.6, melt.data.table allows melting into multiple columns simultaneously.

Usage

## fast melt a data.table
# S3 method for data.table
melt(data, id.vars, measure.vars,
    variable.name = "variable", value.name = "value",
    ..., na.rm = FALSE, variable.factor = TRUE,
    value.factor = FALSE,
    verbose = getOption("datatable.verbose"))

Arguments

data

A data.table object to melt.

id.vars

vector of id variables. Can be integer (corresponding id column numbers) or character (id column names) vector. If missing, all non-measure columns will be assigned to it. If integer, must be positive; see Details.

measure.vars

Measure variables for melting. Can be missing, vector, list, or pattern-based.

  • When missing, measure.vars will become all columns outside id.vars.

  • Vector can be integer (implying column numbers) or character (column names).

  • list is a generalization of the vector version -- each element of the list (which should be integer or character as above) will become a melted column.

  • Pattern-based column matching can be achieved with the regular expression-based patterns (regex without capture groups; matching column names are used in the variable.name output column), or measure (regex with capture groups; each capture group becomes an output column).

For convenience/clarity in the case of multiple melted columns, resulting column names can be supplied as names to the elements measure.vars (in the list and patterns usages). See also Examples.

variable.name

name (default 'variable') of output column containing information about which input column(s) were melted. If measure.vars is an integer/character vector, then each entry of this column contains the name of a melted column from data. If measure.vars is a list of integer/character vectors, then each entry of this column contains an integer indicating an index/position in each of those vectors. If measure.vars has attribute variable_table then it must be a data table with nrow = length of measure.vars vector(s), each row describing the corresponding measured variables(s), (typically created via measure) and its columns will be output instead of the variable.name column.

value.name

name for the molten data values column(s). The default name is 'value'. Multiple names can be provided here for the case when measure.vars is a list, though note well that the names provided in measure.vars take precedence.

na.rm

If TRUE, NA values will be removed from the molten data.

variable.factor

If TRUE, the variable column will be converted to factor, else it will be a character column.

value.factor

If TRUE, the value column will be converted to factor, else the molten value type is left unchanged.

verbose

TRUE turns on status and information messages to the console. Turn this on by default using options(datatable.verbose=TRUE). The quantity and types of verbosity may be expanded in future.

...

any other arguments to be passed to/from other methods.

Details

If id.vars and measure.vars are both missing, all non-numeric/integer/logical columns are assigned as id variables and the rest as measure variables. If only one of id.vars or measure.vars is supplied, the rest of the columns will be assigned to the other. Both id.vars and measure.vars can have the same column more than once and the same column can be both as id and measure variables.

melt.data.table also accepts list columns for both id and measure variables.

When all measure.vars are not of the same type, they'll be coerced according to the hierarchy list > character > numeric > integer > logical. For example, if any of the measure variables is a list, then entire value column will be coerced to a list.

From version 1.9.6, melt gains a feature with measure.vars accepting a list of character or integer vectors as well to melt into multiple columns in a single function call efficiently. If a vector in the list contains missing values, or is shorter than the max length of the list elements, then the output will include runs of missing values at the specified position, or at the end. The function patterns can be used to provide regular expression patterns. When used along with melt, if cols argument is not provided, the patterns will be matched against names(data), for convenience.

Attributes are preserved if all value columns are of the same type. By default, if any of the columns to be melted are of type factor, it'll be coerced to character type. To get a factor column, set value.factor = TRUE. melt.data.table also preserves ordered factors.

Historical note: melt.data.table was originally designed as an enhancement to reshape2::melt in terms of computing and memory efficiency. reshape2 has since been superseded in favour of tidyr, and melt has had a generic defined within data.table since v1.9.6 in 2015, at which point the dependency between the packages became more etymological than programmatic. We thank the reshape2 authors for the inspiration.

Value

An unkeyed data.table containing the molten data.

Examples

set.seed(45)
require(data.table)
DT <- data.table(
  i_1 = c(1:5, NA),
  n_1 = c(NA, 6, 7, 8, 9, 10),
  f_1 = factor(sample(c(letters[1:3], NA), 6L, TRUE)),
  f_2 = factor(c("z", "a", "x", "c", "x", "x"), ordered=TRUE),
  c_1 = sample(c(letters[1:3], NA), 6L, TRUE),
  c_2 = sample(c(LETTERS[1:2], NA), 6L, TRUE),
  d_1 = as.Date(c(1:3,NA,4:5), origin="2013-09-01"),
  d_2 = as.Date(6:1, origin="2012-01-01")
)
# add a couple of list cols
DT[, l_1 := DT[, list(c=list(rep(i_1, sample(5, 1L)))), by = i_1]$c]
#>      i_1   n_1    f_1   f_2    c_1    c_2        d_1        d_2       l_1
#>    <int> <num> <fctr> <ord> <char> <char>     <Date>     <Date>    <list>
#> 1:     1    NA      a     z      c      A 2013-09-02 2012-01-07 1,1,1,1,1
#> 2:     2     6      c     a      c      B 2013-09-03 2012-01-06         2
#> 3:     3     7      b     x   <NA>      A 2013-09-04 2012-01-05       3,3
#> 4:     4     8   <NA>     c      c      A       <NA> 2012-01-04 4,4,4,4,4
#> 5:     5     9      c     x   <NA>   <NA> 2013-09-05 2012-01-03     5,5,5
#> 6:    NA    10      b     x      c   <NA> 2013-09-06 2012-01-02     NA,NA
DT[, l_2 := DT[, list(c=list(rep(c_1, sample(5, 1L)))), by = i_1]$c]
#>      i_1   n_1    f_1   f_2    c_1    c_2        d_1        d_2       l_1
#>    <int> <num> <fctr> <ord> <char> <char>     <Date>     <Date>    <list>
#> 1:     1    NA      a     z      c      A 2013-09-02 2012-01-07 1,1,1,1,1
#> 2:     2     6      c     a      c      B 2013-09-03 2012-01-06         2
#> 3:     3     7      b     x   <NA>      A 2013-09-04 2012-01-05       3,3
#> 4:     4     8   <NA>     c      c      A       <NA> 2012-01-04 4,4,4,4,4
#> 5:     5     9      c     x   <NA>   <NA> 2013-09-05 2012-01-03     5,5,5
#> 6:    NA    10      b     x      c   <NA> 2013-09-06 2012-01-02     NA,NA
#>               l_2
#>            <list>
#> 1:            c,c
#> 2:          c,c,c
#> 3: NA,NA,NA,NA,NA
#> 4:            c,c
#> 5: NA,NA,NA,NA,NA
#> 6:              c

# id.vars, measure.vars as character/integer/numeric vectors
melt(DT, id.vars=1:2, measure.vars="f_1")
#>      i_1   n_1 variable  value
#>    <int> <num>   <fctr> <char>
#> 1:     1    NA      f_1      a
#> 2:     2     6      f_1      c
#> 3:     3     7      f_1      b
#> 4:     4     8      f_1   <NA>
#> 5:     5     9      f_1      c
#> 6:    NA    10      f_1      b
melt(DT, id.vars=c("i_1", "n_1"), measure.vars=3) # same as above
#>      i_1   n_1 variable  value
#>    <int> <num>   <fctr> <char>
#> 1:     1    NA      f_1      a
#> 2:     2     6      f_1      c
#> 3:     3     7      f_1      b
#> 4:     4     8      f_1   <NA>
#> 5:     5     9      f_1      c
#> 6:    NA    10      f_1      b
melt(DT, id.vars=1:2, measure.vars=3L, value.factor=TRUE) # same, but 'value' is factor
#>      i_1   n_1 variable  value
#>    <int> <num>   <fctr> <fctr>
#> 1:     1    NA      f_1      a
#> 2:     2     6      f_1      c
#> 3:     3     7      f_1      b
#> 4:     4     8      f_1   <NA>
#> 5:     5     9      f_1      c
#> 6:    NA    10      f_1      b
melt(DT, id.vars=1:2, measure.vars=3:4, value.factor=TRUE) # 'value' is *ordered* factor
#>       i_1   n_1 variable value
#>     <int> <num>   <fctr> <ord>
#>  1:     1    NA      f_1     a
#>  2:     2     6      f_1     c
#>  3:     3     7      f_1     b
#>  4:     4     8      f_1  <NA>
#>  5:     5     9      f_1     c
#>  6:    NA    10      f_1     b
#>  7:     1    NA      f_2     z
#>  8:     2     6      f_2     a
#>  9:     3     7      f_2     x
#> 10:     4     8      f_2     c
#> 11:     5     9      f_2     x
#> 12:    NA    10      f_2     x

# preserves attribute when types are identical, ex: Date
melt(DT, id.vars=3:4, measure.vars=c("d_1", "d_2"))
#>        f_1   f_2 variable      value
#>     <fctr> <ord>   <fctr>     <Date>
#>  1:      a     z      d_1 2013-09-02
#>  2:      c     a      d_1 2013-09-03
#>  3:      b     x      d_1 2013-09-04
#>  4:   <NA>     c      d_1       <NA>
#>  5:      c     x      d_1 2013-09-05
#>  6:      b     x      d_1 2013-09-06
#>  7:      a     z      d_2 2012-01-07
#>  8:      c     a      d_2 2012-01-06
#>  9:      b     x      d_2 2012-01-05
#> 10:   <NA>     c      d_2 2012-01-04
#> 11:      c     x      d_2 2012-01-03
#> 12:      b     x      d_2 2012-01-02
melt(DT, id.vars=3:4, measure.vars=c("n_1", "d_1")) # attribute not preserved
#>        f_1   f_2 variable value
#>     <fctr> <ord>   <fctr> <num>
#>  1:      a     z      n_1    NA
#>  2:      c     a      n_1     6
#>  3:      b     x      n_1     7
#>  4:   <NA>     c      n_1     8
#>  5:      c     x      n_1     9
#>  6:      b     x      n_1    10
#>  7:      a     z      d_1 15950
#>  8:      c     a      d_1 15951
#>  9:      b     x      d_1 15952
#> 10:   <NA>     c      d_1    NA
#> 11:      c     x      d_1 15953
#> 12:      b     x      d_1 15954

# on list
melt(DT, id.vars=1, measure.vars=c("l_1", "l_2")) # value is a list
#>       i_1 variable          value
#>     <int>   <fctr>         <list>
#>  1:     1      l_1      1,1,1,1,1
#>  2:     2      l_1              2
#>  3:     3      l_1            3,3
#>  4:     4      l_1      4,4,4,4,4
#>  5:     5      l_1          5,5,5
#>  6:    NA      l_1          NA,NA
#>  7:     1      l_2            c,c
#>  8:     2      l_2          c,c,c
#>  9:     3      l_2 NA,NA,NA,NA,NA
#> 10:     4      l_2            c,c
#> 11:     5      l_2 NA,NA,NA,NA,NA
#> 12:    NA      l_2              c
suppressWarnings(
  melt(DT, id.vars=1, measure.vars=c("c_1", "l_1")) # c1 coerced to list, with warning
)
#>       i_1 variable     value
#>     <int>   <fctr>    <list>
#>  1:     1      c_1         c
#>  2:     2      c_1         c
#>  3:     3      c_1        NA
#>  4:     4      c_1         c
#>  5:     5      c_1        NA
#>  6:    NA      c_1         c
#>  7:     1      l_1 1,1,1,1,1
#>  8:     2      l_1         2
#>  9:     3      l_1       3,3
#> 10:     4      l_1 4,4,4,4,4
#> 11:     5      l_1     5,5,5
#> 12:    NA      l_1     NA,NA

# on character
melt(DT, id.vars=1, measure.vars=c("c_1", "f_1")) # value is char
#>       i_1 variable  value
#>     <int>   <fctr> <char>
#>  1:     1      c_1      c
#>  2:     2      c_1      c
#>  3:     3      c_1   <NA>
#>  4:     4      c_1      c
#>  5:     5      c_1   <NA>
#>  6:    NA      c_1      c
#>  7:     1      f_1      a
#>  8:     2      f_1      c
#>  9:     3      f_1      b
#> 10:     4      f_1   <NA>
#> 11:     5      f_1      c
#> 12:    NA      f_1      b
suppressWarnings(
  melt(DT, id.vars=1, measure.vars=c("c_1", "n_1")) # n_1 coerced to char, with warning
)
#>       i_1 variable  value
#>     <int>   <fctr> <char>
#>  1:     1      c_1      c
#>  2:     2      c_1      c
#>  3:     3      c_1   <NA>
#>  4:     4      c_1      c
#>  5:     5      c_1   <NA>
#>  6:    NA      c_1      c
#>  7:     1      n_1   <NA>
#>  8:     2      n_1      6
#>  9:     3      n_1      7
#> 10:     4      n_1      8
#> 11:     5      n_1      9
#> 12:    NA      n_1     10

# on na.rm=TRUE. NAs are removed efficiently, from within C
melt(DT, id.vars=1, measure.vars=c("c_1", "c_2"), na.rm=TRUE) # remove NA
#>      i_1 variable  value
#>    <int>   <fctr> <char>
#> 1:     1      c_1      c
#> 2:     2      c_1      c
#> 3:     4      c_1      c
#> 4:    NA      c_1      c
#> 5:     1      c_2      A
#> 6:     2      c_2      B
#> 7:     3      c_2      A
#> 8:     4      c_2      A

# measure.vars can be also a list
# melt "f_1,f_2" and "d_1,d_2" simultaneously, retain 'factor' attribute
# convenient way using internal function patterns()
melt(DT, id.vars=1:2, measure.vars=patterns("^f_", "^d_"), value.factor=TRUE)
#>       i_1   n_1 variable value1     value2
#>     <int> <num>   <fctr>  <ord>     <Date>
#>  1:     1    NA        1      a 2013-09-02
#>  2:     2     6        1      c 2013-09-03
#>  3:     3     7        1      b 2013-09-04
#>  4:     4     8        1   <NA>       <NA>
#>  5:     5     9        1      c 2013-09-05
#>  6:    NA    10        1      b 2013-09-06
#>  7:     1    NA        2      z 2012-01-07
#>  8:     2     6        2      a 2012-01-06
#>  9:     3     7        2      x 2012-01-05
#> 10:     4     8        2      c 2012-01-04
#> 11:     5     9        2      x 2012-01-03
#> 12:    NA    10        2      x 2012-01-02
# same as above, but provide list of columns directly by column names or indices
melt(DT, id.vars=1:2, measure.vars=list(3:4, c("d_1", "d_2")), value.factor=TRUE)
#>       i_1   n_1 variable value1     value2
#>     <int> <num>   <fctr>  <ord>     <Date>
#>  1:     1    NA        1      a 2013-09-02
#>  2:     2     6        1      c 2013-09-03
#>  3:     3     7        1      b 2013-09-04
#>  4:     4     8        1   <NA>       <NA>
#>  5:     5     9        1      c 2013-09-05
#>  6:    NA    10        1      b 2013-09-06
#>  7:     1    NA        2      z 2012-01-07
#>  8:     2     6        2      a 2012-01-06
#>  9:     3     7        2      x 2012-01-05
#> 10:     4     8        2      c 2012-01-04
#> 11:     5     9        2      x 2012-01-03
#> 12:    NA    10        2      x 2012-01-02
# same as above, but provide names directly:
melt(DT, id.vars=1:2, measure.vars=patterns(f="^f_", d="^d_"), value.factor=TRUE)
#>       i_1   n_1 variable     f          d
#>     <int> <num>   <fctr> <ord>     <Date>
#>  1:     1    NA        1     a 2013-09-02
#>  2:     2     6        1     c 2013-09-03
#>  3:     3     7        1     b 2013-09-04
#>  4:     4     8        1  <NA>       <NA>
#>  5:     5     9        1     c 2013-09-05
#>  6:    NA    10        1     b 2013-09-06
#>  7:     1    NA        2     z 2012-01-07
#>  8:     2     6        2     a 2012-01-06
#>  9:     3     7        2     x 2012-01-05
#> 10:     4     8        2     c 2012-01-04
#> 11:     5     9        2     x 2012-01-03
#> 12:    NA    10        2     x 2012-01-02

# na.rm=TRUE removes rows with NAs in any 'value' columns
melt(DT, id.vars=1:2, measure.vars=patterns("f_", "d_"), value.factor=TRUE, na.rm=TRUE)
#>       i_1   n_1 variable value1     value2
#>     <int> <num>   <fctr>  <ord>     <Date>
#>  1:     1    NA        1      a 2013-09-02
#>  2:     2     6        1      c 2013-09-03
#>  3:     3     7        1      b 2013-09-04
#>  4:     5     9        1      c 2013-09-05
#>  5:    NA    10        1      b 2013-09-06
#>  6:     1    NA        2      z 2012-01-07
#>  7:     2     6        2      a 2012-01-06
#>  8:     3     7        2      x 2012-01-05
#>  9:     4     8        2      c 2012-01-04
#> 10:     5     9        2      x 2012-01-03
#> 11:    NA    10        2      x 2012-01-02

# 'na.rm=TRUE' also works with list column, but note that is.na only
# returns TRUE if the list element is a length=1 vector with an NA.
is.na(list(one.NA=NA, two.NA=c(NA,NA)))
#> one.NA two.NA 
#>   TRUE  FALSE 
melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=FALSE)
#>       i_1   n_1 variable         value1     value2
#>     <int> <num>   <fctr>         <list>     <Date>
#>  1:     1    NA        1      1,1,1,1,1 2013-09-02
#>  2:     2     6        1              2 2013-09-03
#>  3:     3     7        1            3,3 2013-09-04
#>  4:     4     8        1      4,4,4,4,4       <NA>
#>  5:     5     9        1          5,5,5 2013-09-05
#>  6:    NA    10        1          NA,NA 2013-09-06
#>  7:     1    NA        2            c,c 2012-01-07
#>  8:     2     6        2          c,c,c 2012-01-06
#>  9:     3     7        2 NA,NA,NA,NA,NA 2012-01-05
#> 10:     4     8        2            c,c 2012-01-04
#> 11:     5     9        2 NA,NA,NA,NA,NA 2012-01-03
#> 12:    NA    10        2              c 2012-01-02
melt(DT, id.vars=1:2, measure.vars=patterns("l_", "d_"), na.rm=TRUE)
#>       i_1   n_1 variable         value1     value2
#>     <int> <num>   <fctr>         <list>     <Date>
#>  1:     1    NA        1      1,1,1,1,1 2013-09-02
#>  2:     2     6        1              2 2013-09-03
#>  3:     3     7        1            3,3 2013-09-04
#>  4:     5     9        1          5,5,5 2013-09-05
#>  5:    NA    10        1          NA,NA 2013-09-06
#>  6:     1    NA        2            c,c 2012-01-07
#>  7:     2     6        2          c,c,c 2012-01-06
#>  8:     3     7        2 NA,NA,NA,NA,NA 2012-01-05
#>  9:     4     8        2            c,c 2012-01-04
#> 10:     5     9        2 NA,NA,NA,NA,NA 2012-01-03
#> 11:    NA    10        2              c 2012-01-02

# measure list with missing/short entries results in output with runs of NA
DT.missing.cols <- DT[, .(d_1, d_2, c_1, f_2)]
melt(DT.missing.cols, measure.vars=list(d=1:2, c="c_1", f=c(NA, "f_2")))
#>     variable          d      c      f
#>       <fctr>     <Date> <char> <char>
#>  1:        1 2013-09-02      c   <NA>
#>  2:        1 2013-09-03      c   <NA>
#>  3:        1 2013-09-04   <NA>   <NA>
#>  4:        1       <NA>      c   <NA>
#>  5:        1 2013-09-05   <NA>   <NA>
#>  6:        1 2013-09-06      c   <NA>
#>  7:        2 2012-01-07   <NA>      z
#>  8:        2 2012-01-06   <NA>      a
#>  9:        2 2012-01-05   <NA>      x
#> 10:        2 2012-01-04   <NA>      c
#> 11:        2 2012-01-03   <NA>      x
#> 12:        2 2012-01-02   <NA>      x

# specifying columns to melt via separator.
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, sep="_"))
#>     number          d      c      f
#>      <int>     <Date> <char> <char>
#>  1:      1 2013-09-02      c   <NA>
#>  2:      1 2013-09-03      c   <NA>
#>  3:      1 2013-09-04   <NA>   <NA>
#>  4:      1       <NA>      c   <NA>
#>  5:      1 2013-09-05   <NA>   <NA>
#>  6:      1 2013-09-06      c   <NA>
#>  7:      2 2012-01-07   <NA>      z
#>  8:      2 2012-01-06   <NA>      a
#>  9:      2 2012-01-05   <NA>      x
#> 10:      2 2012-01-04   <NA>      c
#> 11:      2 2012-01-03   <NA>      x
#> 12:      2 2012-01-02   <NA>      x

# specifying columns to melt via regex.
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, pattern="(.)_(.)"))
#>     number          d      c      f
#>      <int>     <Date> <char> <char>
#>  1:      1 2013-09-02      c   <NA>
#>  2:      1 2013-09-03      c   <NA>
#>  3:      1 2013-09-04   <NA>   <NA>
#>  4:      1       <NA>      c   <NA>
#>  5:      1 2013-09-05   <NA>   <NA>
#>  6:      1 2013-09-06      c   <NA>
#>  7:      2 2012-01-07   <NA>      z
#>  8:      2 2012-01-06   <NA>      a
#>  9:      2 2012-01-05   <NA>      x
#> 10:      2 2012-01-04   <NA>      c
#> 11:      2 2012-01-03   <NA>      x
#> 12:      2 2012-01-02   <NA>      x
melt(DT.missing.cols, measure.vars=measure(value.name, number=as.integer, pattern="([dc])_(.)"))
#>       f_2 number          d      c
#>     <ord>  <int>     <Date> <char>
#>  1:     z      1 2013-09-02      c
#>  2:     a      1 2013-09-03      c
#>  3:     x      1 2013-09-04   <NA>
#>  4:     c      1       <NA>      c
#>  5:     x      1 2013-09-05   <NA>
#>  6:     x      1 2013-09-06      c
#>  7:     z      2 2012-01-07   <NA>
#>  8:     a      2 2012-01-06   <NA>
#>  9:     x      2 2012-01-05   <NA>
#> 10:     c      2 2012-01-04   <NA>
#> 11:     x      2 2012-01-03   <NA>
#> 12:     x      2 2012-01-02   <NA>

# cols arg of measure can be used if you do not want to use regex
melt(DT.missing.cols, measure.vars=measure(
  value.name, number=as.integer, sep="_", cols=c("d_1","d_2","c_1")))
#>       f_2 number          d      c
#>     <ord>  <int>     <Date> <char>
#>  1:     z      1 2013-09-02      c
#>  2:     a      1 2013-09-03      c
#>  3:     x      1 2013-09-04   <NA>
#>  4:     c      1       <NA>      c
#>  5:     x      1 2013-09-05   <NA>
#>  6:     x      1 2013-09-06      c
#>  7:     z      2 2012-01-07   <NA>
#>  8:     a      2 2012-01-06   <NA>
#>  9:     x      2 2012-01-05   <NA>
#> 10:     c      2 2012-01-04   <NA>
#> 11:     x      2 2012-01-03   <NA>
#> 12:     x      2 2012-01-02   <NA>