tbl_meta.Rd
The two data classes id_tbl and ts_tbl, used by ricu to represent ICU
patient data, consist of a data.table alongside some meta data. This
includes marking columns that have special meaning and, for data
representing measurements ordered in time, the step size. The following
utility functions can be used to extract columns and column names with
special meaning, as well as query a ts_tbl object regarding its time
series related meta data.
id_vars(x)
id_var(x)
id_col(x)
index_var(x)
index_col(x)
dur_var(x)
dur_col(x)
dur_unit(x)
meta_vars(x)
data_vars(x)
data_var(x)
data_col(x)
interval(x)
time_unit(x)
time_step(x)
time_vars(x)
x: Object to query
Mostly column names as character vectors, in case of id_var(), index_var(),
data_var() and time_unit() of length 1, else of variable length. Functions
id_col(), index_col() and data_col() return table columns as vectors, while
interval() returns a scalar valued difftime object and time_step() a number.
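The return types described above can be checked directly. The following is a minimal sketch, assuming the ricu package is installed and attached; it reuses the ts_tbl construction from the examples on this page and only asserts properties stated in the text.

```r
library(ricu)

# Same table as in the examples: `a` is the ID, `b` the index, `c` the data
tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))

# Singular *_var() accessors return a single string
stopifnot(is.character(id_var(tbl)), length(id_var(tbl)) == 1L)
stopifnot(is.character(index_var(tbl)), length(index_var(tbl)) == 1L)

# *_col() accessors return the column itself, one entry per row
stopifnot(length(index_col(tbl)) == nrow(tbl))

# interval() is a scalar difftime, time_step() a plain number
stopifnot(inherits(interval(tbl), "difftime"), length(interval(tbl)) == 1L)
stopifnot(is.numeric(time_step(tbl)))
```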
The following functions can be used to query an object for columns or column names that represent a distinct aspect of the data:
id_vars(): ID variables are one or more column names with the
interaction of corresponding columns identifying a grouping of the data.
Most commonly this is some sort of patient identifier.
id_var(): This function either fails or returns a string and can
therefore be used in case only a single column provides grouping
information.
id_col(): Again, in case only a single column provides grouping
information, this column can be extracted using this function.
index_var(): A column suitable for use as index variable encodes a
temporal ordering of observations as a difftime vector. Only a single
column can be marked as index variable and this function queries a ts_tbl
object for its name.
index_col(): Similarly to id_col(), this function extracts the column
with the given designation. As a ts_tbl object is required to have
exactly one column marked as index, this function always succeeds for
ts_tbl objects (and fails for id_tbl objects).
dur_var(): For win_tbl objects, this returns the name of the column
encoding the data validity interval.
dur_col(): Similarly to index_col(), this returns the difftime
vector corresponding to the dur_var().
meta_vars(): For ts_tbl objects, meta variables represent the union
of ID and index variables (for win_tbl, this also includes the
dur_var()), while for id_tbl objects meta variables consist of ID
variables.
data_vars(): Data variables, on the other hand, are all columns that are
not meta variables.
data_var(): Similarly to id_var(), this function either returns the
name of a single data variable or fails.
data_col(): Building on data_var(), in situations where only a single
data variable is present, it is returned; if multiple data columns
exist, an error is thrown.
time_vars(): Time variables are all columns in an object inheriting
from data.frame that are of type difftime. Therefore, in a ts_tbl object
the index column is one of (potentially) several time variables. For a
win_tbl, however, the dur_var() is not among the time_vars().
interval(): The time series interval length is represented as a scalar
valued difftime object.
time_unit(): The time unit of the time series interval, represented by
a string such as "hours" or "mins" (see difftime).
time_step(): The time series step size represented by a numeric value
in the unit as returned by time_unit().
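The relationships between these accessors can be expressed as a few assertions. This is a minimal sketch, assuming ricu is attached and reusing the ts_tbl from the examples on this page; the comparisons against `as.numeric()` and `units()` are an illustration of the descriptions above and are not taken from the package source.

```r
library(ricu)

tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))

# For a ts_tbl, meta variables are the ID variables plus the index variable
stopifnot(setequal(meta_vars(tbl), c(id_vars(tbl), index_var(tbl))))

# Data variables are all remaining columns
stopifnot(setequal(data_vars(tbl), setdiff(colnames(tbl), meta_vars(tbl))))

# interval() carries both the step size and its unit
stopifnot(time_step(tbl) == as.numeric(interval(tbl)))
stopifnot(time_unit(tbl) == units(interval(tbl)))
```

Using `setequal()` keeps the checks independent of the order in which column names are returned.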
tbl <- id_tbl(a = rep(1:2, each = 5), b = rep(1:5, 2), c = rnorm(10),
              id_vars = c("a", "b"))
id_vars(tbl)
#> [1] "a" "b"
tryCatch(id_col(tbl), error = function(...) "no luck")
#> [1] "no luck"
data_vars(tbl)
#> [1] "c"
data_col(tbl)
#> [1] -1.67729715 3.20206629 1.19129155 0.82700078 0.67559457 -0.50015815
#> [7] -1.01176763 2.06611666 0.92338063 0.01363021
tmp <- as_id_tbl(tbl, id_vars = "a")
id_vars(tmp)
#> [1] "a"
id_col(tmp)
#> [1] 1 1 1 1 1 2 2 2 2 2
tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
index_var(tbl)
#> [1] "b"
index_col(tbl)
#> Time differences in hours
#> [1] 1 2 3 4 5 1 2 3 4 5
identical(index_var(tbl), time_vars(tbl))
#> [1] TRUE
interval(tbl)
#> Time difference of 1 hours
time_unit(tbl)
#> [1] "hours"
time_step(tbl)
#> [1] 1
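The description of time_vars() notes that the index column may be one of several time variables. The sketch below illustrates this with a hypothetical second difftime column `c`; it assumes ricu is attached and that `ts_tbl()` accepts an `index_var` argument to mark `b` as the index explicitly.

```r
library(ricu)

# Hypothetical table: `b` is the index, `c` is a further difftime column,
# `d` holds ordinary numeric data
tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)),
              c = hours(rep(6:10, 2)), d = rnorm(10), index_var = "b")

index_var(tbl)
time_vars(tbl)

# The index is a time variable, but not every time variable is the index
stopifnot(index_var(tbl) %in% time_vars(tbl))
stopifnot("c" %in% time_vars(tbl))

# `c` is of type difftime yet is not a meta variable
stopifnot(!"c" %in% meta_vars(tbl))
```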