id_tbl.Rd
In order to simplify handling of tabular ICU data, ricu provides the S3
classes id_tbl, ts_tbl, and win_tbl. These classes essentially consist of a
data.table object alongside some meta data, and S3 dispatch is used to
enable more natural behavior for some data manipulation tasks. For example,
when merging two tables, a default for the by argument can be chosen more
sensibly if columns representing patient ID and timestamp information can
be identified.
id_tbl(..., id_vars = 1L)
is_id_tbl(x)
as_id_tbl(x, id_vars = NULL, by_ref = FALSE)
ts_tbl(..., id_vars = 1L, index_var = NULL, interval = NULL)
is_ts_tbl(x)
as_ts_tbl(x, id_vars = NULL, index_var = NULL, interval = NULL, by_ref = FALSE)
win_tbl(..., id_vars = NULL, index_var = NULL, interval = NULL, dur_var = NULL)
is_win_tbl(x)
as_win_tbl(
x,
id_vars = NULL,
index_var = NULL,
interval = NULL,
dur_var = NULL,
by_ref = FALSE
)
# S3 method for id_tbl
as.data.table(x, keep.rownames = FALSE, by_ref = FALSE, ...)
# S3 method for id_tbl
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
validate_tbl(x)
...: forwarded to data.table::data.table() or generic consistency
id_vars: Column name(s) to be used as id column(s)
x: Object to query/operate on
by_ref: Logical flag indicating whether to perform the operation by reference
index_var: Column name of the index column
interval: Time series interval length specified as scalar-valued difftime object
dur_var: Column name of the duration column
keep.rownames: Default is FALSE. If TRUE, adds the input object's names as a separate column named "rn". keep.rownames = "id" names the column "id" instead.
row.names: NULL or a character vector giving the row names for the data frame. Missing values are not allowed.
optional: logical. If TRUE, setting row names and converting column names (to syntactic names: see make.names) is optional. Note that all of R's base package as.data.frame() methods use optional only for column names treatment, basically with the meaning of data.frame(*, check.names = !optional). See also the make.names argument of the matrix method.
Constructors id_tbl()/ts_tbl()/win_tbl(), as well as coercion functions
as_id_tbl()/as_ts_tbl()/as_win_tbl(), return id_tbl/ts_tbl/win_tbl objects
respectively, while inheritance testers is_id_tbl()/is_ts_tbl()/is_win_tbl()
return logical flags and validate_tbl() returns either TRUE or a string
describing the validation failure.
The introduced classes are designed for several often encountered data scenarios:
id_tbl: objects of this class can be used to represent static (with respect to relevant time scales) patient data such as patient age. Such an object is simply a data.table combined with a non-zero length character vector valued attribute marking the columns tracking patient ID information (id_vars). All further columns are considered as data_vars.
ts_tbl: objects of this class are used for grouped time series data. A data.table object again is augmented by attributes, including a non-zero length character vector identifying patient ID columns (id_vars), a string tracking the column holding time-stamps (index_var) and a scalar difftime object determining the time-series step size (interval). Again, all further columns are treated as data_vars.
win_tbl: in addition to representing grouped time-series data as does a ts_tbl, win_tbl objects also encode a validity interval for each time-stamped measurement (as dur_var). This can for example be useful when a drug is administered at a certain infusion rate for a given time period.
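As a minimal sketch of the three scenarios above, the following constructs one object of each class and queries its meta data. The column names and values are made up for illustration, and the accessor functions id_vars(), index_var(), interval() and dur_var() are assumed to be exported by ricu alongside the constructors.

```r
library(ricu)

# Static data: one row per patient, `patient` marks the ID column
static <- id_tbl(patient = 1:3, age = c(65, 72, 58), id_vars = "patient")

# Grouped time series: `time` is the index column, sampled hourly
series <- ts_tbl(
  patient = rep(1:3, each = 2),
  time = hours(rep(0:1, 3)),
  hr = c(80, 82, 90, 91, 70, 75),
  id_vars = "patient", index_var = "time", interval = hours(1L)
)

# Windowed data: each measurement is valid for `dur`, starting at `time`
infusion <- win_tbl(
  patient = rep(1:3, each = 2),
  time = hours(rep(0:1, 3)),
  dur = hours(rep(1L, 6)),
  rate = c(5, 5, 10, 8, 2, 2),
  id_vars = "patient", index_var = "time", interval = hours(1L),
  dur_var = "dur"
)

# Query the meta data attached to each object
id_vars(static)
index_var(series)
interval(series)
dur_var(infusion)
```

All meta data arguments are passed explicitly here so that the sketch does not rely on any automatic inference.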
Owing to the nested structure of required meta data, ts_tbl inherits from
id_tbl and win_tbl from ts_tbl. Furthermore, all three classes inherit from
data.table. As such, data.table reference semantics are available for some
operations, indicated by the presence of a by_ref argument. By default,
by_ref is set to FALSE, as this is in line with base R behavior, at the cost
of potentially incurring unnecessary data copies. Some care has to be taken
when passing by_ref = TRUE and enabling by-reference operations, as this can
have side effects (see examples).
For instantiating ts_tbl objects, both index_var and interval can be
automatically determined if not specified. For the index column, the only
requirement is that a single difftime column is present, while for the time
step, the minimal difference between two consecutive observations is chosen
(and all differences are therefore required to be multiples of the minimum
difference). Similarly, for a win_tbl, exactly two difftime columns are
required, where the first is assumed to correspond to the index_var and the
second to the dur_var.
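The automatic determination described above can be sketched as follows. Only id_vars is specified; the data are made up, and the accessors index_var() and interval() are assumed to be exported by ricu.

```r
library(ricu)

# `time` is the only difftime column, so it is the sole candidate for the
# index; consecutive observations are spaced 2 hours apart
tbl <- ts_tbl(
  patient = rep(1:2, each = 3),
  time = hours(rep(c(0, 2, 4), 2)),
  value = c(1.2, 1.5, 1.1, 0.9, 1.0, 1.3),
  id_vars = "patient"
)

index_var(tbl)  # name of the column inferred as index
interval(tbl)   # step size inferred from the spacing of `time`
```

Passing index_var and interval explicitly remains the safer choice whenever the data might contain more than one difftime column or irregular spacing.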
Upon instantiation, the data might be rearranged: columns are reordered
such that ID columns are moved to the front, followed by the index column,
and a data.table::key() is set on meta columns, causing rows to be sorted
accordingly. Moving meta columns to the front is done for convenience when
printing, while setting a key on meta columns is done to improve efficiency
of subsequent transformations such as merging or grouped operations.
Furthermore, NA values in either ID or index columns are not allowed and
therefore corresponding rows are silently removed.
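A small sketch of this rearrangement, using made-up data in which the ID column is deliberately placed last and contains a missing value. The comments restate the behavior described above rather than verified output.

```r
library(ricu)

tbl <- id_tbl(
  value = c(0.3, 0.1, 0.7),
  patient = c(2L, NA, 1L),
  id_vars = "patient"
)

colnames(tbl)         # the ID column is expected to come first
data.table::key(tbl)  # a key is expected to be set on the ID column
nrow(tbl)             # the row with a missing ID is expected to be dropped
```

Because rows with missing meta column values are removed silently, comparing nrow() before and after construction is a cheap way to detect such losses.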
Coercion between id_tbl and ts_tbl (and win_tbl) by default keeps
intersecting attributes fixed, and new attributes are by default inferred as
for class instantiation. Each class comes with a class-specific
implementation of the S3 generic function validate_tbl(), which returns TRUE
if the object is considered valid or a string outlining the type of
validation failure that was encountered. Validity requires
inheriting from data.table and unique column names
for id_tbl, that all columns specified by the non-zero length character vector holding the id_vars specification are available
for ts_tbl, that the string-valued index_var column is available and does not intersect with id_vars, and that the index column obeys the specified interval
for win_tbl, that the string-valued dur_var corresponds to a difftime vector and is not among the columns marked as index or ID variables
Finally, inheritance can be checked by calling is_id_tbl() and is_ts_tbl().
Note that due to ts_tbl inheriting from id_tbl, is_id_tbl() returns TRUE for
both id_tbl and ts_tbl objects (and similarly for win_tbl), while
is_ts_tbl() only returns TRUE for ts_tbl objects.
data.table
Both id_tbl and ts_tbl inherit from data.table and as such, functions
intended for use with data.table objects can be applied to id_tbl and ts_tbl
as well. But there are some caveats: many functions introduced by data.table
are not S3 generic and therefore they would have to be masked in order to
retain control over how they operate on objects inheriting from data.table.
Take for example the function data.table::setnames(), which changes column
names by reference. Using this function, the name of an ID column of an
id_tbl object can be changed without updating the attribute marking the
column as such, thus leaving the object in an inconsistent state. Instead of
masking the function setnames(), an alternative is provided as
rename_cols(). Where possible, the appropriate function is inserted
seamlessly (such as for base::names<-() or base::colnames<-()), and the
responsibility for not using data.table::setnames() in a way that breaks the
id_tbl object is left to the user.
Owing to data.table heritage, one of the functions that is often called on
id_tbl and ts_tbl objects is the base S3 generic base::`[`(). As this
function is capable of modifying the object in a way that makes it
incompatible with attached meta data, an attempt is made at preserving as
much as possible and, if all fails, a data.table object is returned instead
of an object inheriting from id_tbl. If for example the index column is
removed from a ts_tbl (or modified in a way that makes it incompatible with
the interval specification), an id_tbl is returned. If however the ID column
is removed, the only sensible thing to return is a data.table (see
examples).
tbl <- id_tbl(a = 1:10, b = rnorm(10))
is_id_tbl(tbl)
#> [1] TRUE
is_ts_tbl(tbl)
#> [1] FALSE
dat <- data.frame(a = 1:10, b = hours(1:10), c = rnorm(10))
tbl <- as_ts_tbl(dat, "a")
is_id_tbl(tbl)
#> [1] TRUE
is_ts_tbl(tbl)
#> [1] TRUE
tmp <- as_id_tbl(tbl)
is_ts_tbl(tbl)
#> [1] TRUE
is_ts_tbl(tmp)
#> [1] FALSE
# with by_ref = TRUE, tbl itself is modified as a side effect
tmp <- as_id_tbl(tbl, by_ref = TRUE)
is_ts_tbl(tbl)
#> [1] FALSE
is_ts_tbl(tmp)
#> [1] FALSE
tbl <- id_tbl(a = 1:10, b = rnorm(10))
names(tbl) <- c("c", "b")
tbl
#> c b
#> 1: 1 0.65792122
#> 2: 2 -1.97460795
#> 3: 3 0.91742095
#> 4: 4 0.45130698
#> 5: 5 -0.69770467
#> 6: 6 1.23500108
#> 7: 7 0.25311800
#> 8: 8 0.01198755
#> 9: 9 -0.34960946
#> 10: 10 1.34992290
tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(data.table::setnames(tbl, c("c", "b")))
#> [1] "x does not contain column `a`"
#> attr(,"assert_class")
#> [1] "has_cols_assert"
tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(rename_cols(tbl, c("c", "b")))
#> [1] TRUE
tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
# removing the index column from a ts_tbl yields an id_tbl
tbl[, c("a", "c"), with = FALSE]
#> a c
#> 1: 1 -0.07805869
#> 2: 1 0.38505883
#> 3: 1 0.11761105
#> 4: 1 -1.19667589
#> 5: 1 -0.41364533
#> 6: 2 -0.12231509
#> 7: 2 -1.44613744
#> 8: 2 -0.68789827
#> 9: 2 -0.71442763
#> 10: 2 -0.04419048
# removing the ID column leaves a plain data.table
tbl[, c("b", "c"), with = FALSE]
#> b c
#> 1: 1 hours -0.07805869
#> 2: 2 hours 0.38505883
#> 3: 3 hours 0.11761105
#> 4: 4 hours -1.19667589
#> 5: 5 hours -0.41364533
#> 6: 1 hours -0.12231509
#> 7: 2 hours -1.44613744
#> 8: 3 hours -0.68789827
#> 9: 4 hours -0.71442763
#> 10: 5 hours -0.04419048
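# an index column that is no longer a difftime is incompatible with the
# interval specification, so an id_tbl is returned
tbl[, list(a, b = as.double(b), c)]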
#> a b c
#> 1: 1 1 -0.07805869
#> 2: 1 2 0.38505883
#> 3: 1 3 0.11761105
#> 4: 1 4 -1.19667589
#> 5: 1 5 -0.41364533
#> 6: 2 1 -0.12231509
#> 7: 2 2 -1.44613744
#> 8: 2 3 -0.68789827
#> 9: 2 4 -0.71442763
#> 10: 2 5 -0.04419048