tstrsplit {data.table}    R Documentation

strsplit and transpose the resulting list efficiently

Description

A convenience wrapper equivalent to transpose(strsplit(...)). It splits a column using strsplit and transposes the result, so that each piece can be assigned to its own column. See examples.
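The equivalence stated above can be checked directly. The following is a minimal sketch; the input vector and the variable names via_wrapper and via_manual are made up for illustration.

```r
library(data.table)

# Hypothetical input: three strings with 3, 2 and 1 pieces respectively
x = c("a-b-c", "d-e", "f")

# The wrapper, and the two-step form it is documented to be equivalent to
via_wrapper = tstrsplit(x, "-", fixed=TRUE)
via_manual  = transpose(strsplit(x, "-", fixed=TRUE))

# Both are lists of length 3; element i holds the i-th piece of every string
identical(via_wrapper, via_manual)
```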

Usage

tstrsplit(x, ..., fill=NA, type.convert=FALSE, keep, names=FALSE)

Arguments

x

The vector to split (and transpose).

...

Further arguments passed on to strsplit, for example split, fixed or perl.

fill

Value used to pad shorter list elements so that every element of the transposed result has the same length. Default is NA.
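As a small illustration of the padding behaviour, assuming a made-up input whose elements split into different numbers of pieces:

```r
library(data.table)

# Hypothetical input: the strings yield 3, 2 and 1 pieces
x = c("a-b-c", "d-e", "f")

# Default: missing pieces are padded with NA
padded_na = tstrsplit(x, "-", fixed=TRUE)
padded_na[[3]]

# Pad with an empty string instead
padded_empty = tstrsplit(x, "-", fixed=TRUE, fill="")
padded_empty[[3]]
```

In both calls every element of the result has length 3, one entry per element of x.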

type.convert

If TRUE, type.convert is called with as.is=TRUE on each column. May also be a function, a list of functions, or a named list of functions to apply to each part; see examples. Default (FALSE) performs no conversion.
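A short sketch of what type.convert=TRUE does to each part; the input strings are invented for illustration.

```r
library(data.table)

# Hypothetical input: an integer-like, a text and a decimal part per string
x = c("1/a/2.5", "2/b/3.5")

# Without conversion every part stays character
raw = tstrsplit(x, "/", fixed=TRUE)
sapply(raw, class)

# With type.convert=TRUE each part is converted independently
res = tstrsplit(x, "/", fixed=TRUE, type.convert=TRUE)
sapply(res, class)
```

Because as.is=TRUE is used, the text part remains character rather than becoming a factor.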

keep

Indices of the list elements to retain in the transposed result. Default is to return all.
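For instance, keep can be used to discard a part that is not needed. This sketch assumes made-up date-like strings and keeps only the first and third pieces.

```r
library(data.table)

# Hypothetical input: year-month-day strings
x = c("2024-01-15", "2023-12-31")

# Retain only pieces 1 (year) and 3 (day); the month is dropped
res = tstrsplit(x, "-", fixed=TRUE, keep=c(1L, 3L))
length(res)
res[[2]]
```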

names

TRUE auto-names the list V1, V2, etc. A character vector of names may also be supplied; see examples. Default (FALSE) returns an unnamed list.
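The following sketch contrasts the three forms of names on a made-up input. The character-vector form mirrors the names=LETTERS[1:3] call in the Examples section.

```r
library(data.table)

# Hypothetical input with three pieces per string
x = c("a-b-c", "d-e-f")

# Default: unnamed list
unnamed = tstrsplit(x, "-", fixed=TRUE)
names(unnamed)

# names=TRUE: automatic V1, V2, ... names
auto = tstrsplit(x, "-", fixed=TRUE, names=TRUE)
names(auto)

# Character vector: one name per returned element
custom = tstrsplit(x, "-", fixed=TRUE, names=c("first", "second", "third"))
names(custom)
```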

Details

It internally calls strsplit first, and then transpose on the result.

The names argument can be used to return a named list. It has no effect when the result is assigned with :=, which requires the column names to be provided explicitly on the left-hand side. It may be useful in other scenarios, for example when the list is returned directly from j.
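A minimal sketch of the point about :=, using a made-up table and column names. With :=, the names on the left-hand side determine the new columns; when the list is instead returned from j, its own names become the column names.

```r
library(data.table)

# Hypothetical table with a two-part column
DT = data.table(x = c("A/B", "C/D"))

# With := the left-hand side names win; names=TRUE is ignored here
DT[, c("left", "right") := tstrsplit(x, "/", fixed=TRUE, names=TRUE)]
names(DT)

# Returned from j, the names of the list become the column names
out = DT[, tstrsplit(x, "/", fixed=TRUE, names=c("first", "second"))]
names(out)
```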

Value

A list, transposed after splitting by the supplied pattern: element i holds the i-th piece of every element of x.

See Also

data.table, transpose, type.convert

Examples

x = c("abcde", "ghij", "klmnopq")
strsplit(x, "", fixed=TRUE)
tstrsplit(x, "", fixed=TRUE)
tstrsplit(x, "", fixed=TRUE, fill="<NA>")

# use keep to return only elements 1, 3 and 5
tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5))

# names argument: supply a character vector of names
tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5), names=LETTERS[1:3])

DT = data.table(x=c("A/B", "A", "B"), y=1:3)
DT[, c("c1") := tstrsplit(x, "/", fixed=TRUE, keep=1L)][]
DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]

# type.convert argument
DT = data.table(
  w = c("Yes/F", "No/M"),
  x = c("Yes 2000-03-01 A/T", "No 2000-04-01 E/R"),
  y = c("1/1/2", "2/5/2.5"),
  z = c("Yes/1/2", "No/5/3.5"),
  v = c("Yes 10 30.5 2000-03-01 A/T", "No 20 10.2 2000-04-01 E/R"))

# convert each element of the transposed list to a factor
DT[, tstrsplit(w, "/", type.convert=as.factor)]

# convert parts 2 and 3 to numeric and leave the others unchanged
DT[, tstrsplit(z, "/", type.convert=list(as.numeric=2:3))]

# convert part 1 with one function and all others with another
DT[, tstrsplit(z, "/", type.convert=list(as.factor=1L, as.numeric))]

# convert part 4 with as.IDate and the remaining parts using 'type.convert(x, as.is=TRUE)' (i.e. what type.convert=TRUE does)
DT[, tstrsplit(v, " ", type.convert=list(as.IDate=4L, function(x) type.convert(x, as.is=TRUE)))]

[Package data.table version 1.16.99 Index]