load_src.Rd
Data loading involves a cascade of S3 generic functions, each of which can
be adapted to the specifics of an individual data source. At the
lowest level, load_src() is called, followed by load_difftime().
Functions further up the chain are described in load_id().
load_src(x, ...)
# S3 method for src_tbl
load_src(x, rows, cols = colnames(x), ...)
# S3 method for character
load_src(x, src, ...)
load_difftime(x, ...)
# S3 method for mimic_tbl
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)
# S3 method for eicu_tbl
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)
# S3 method for hirid_tbl
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)
# S3 method for aumc_tbl
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)
# S3 method for miiv_tbl
load_difftime(
  x,
  rows,
  cols = colnames(x),
  id_hint = id_vars(x),
  time_vars = ricu::time_vars(x),
  ...
)
# S3 method for character
load_difftime(x, src, ...)
x: Object for which to load data
...: Generic consistency
rows: Expression used for row subsetting (NSE)
cols: Character vector of column names
src: Passed to as_src_tbl() in order to determine the data source
id_hint: String-valued ID column selection (not necessarily honored)
time_vars: Character vector enumerating the columns to be treated as timestamps and thus returned as base::difftime() vectors
A data.table object.
A function extending the S3 generic load_src() is expected to load a
subset of rows/columns from a tabular data source. While the column
specification is provided as a character vector of column names, the row
subsetting involves non-standard evaluation (NSE). Data sets that are
included with ricu are represented by prt objects, which use
rlang::eval_tidy() to evaluate NSE expressions. Furthermore, prt objects
potentially represent tabular data split into partitions, and
row-subsetting expressions are evaluated per partition (see the part_safe
flag in prt::subset.prt()). The return value of load_src() is expected
to be of type data.table.
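The contract described above (an NSE row expression plus a character vector of column names) can be sketched in base R. This is only an illustration: the generic load_rows() and the class toy_tbl are hypothetical stand-ins and are not part of ricu, base eval()/substitute() is used where prt objects use rlang::eval_tidy(), and a data.frame is returned instead of a data.table to keep the sketch free of dependencies.

```r
# Hypothetical generic standing in for load_src(); not part of ricu
load_rows <- function(x, ...) UseMethod("load_rows")

# Method for a hypothetical class "toy_tbl" that wraps a data.frame
load_rows.toy_tbl <- function(x, rows, cols = colnames(x), ...) {
  dat <- x
  class(dat) <- "data.frame"
  # Capture the row expression unevaluated and evaluate it with the
  # table's columns in scope (base R NSE)
  keep <- eval(substitute(rows), dat, parent.frame())
  # Columns are selected by name, rows by the evaluated logical vector
  dat[keep, cols, drop = FALSE]
}

# Made-up data for illustration only
toy <- structure(
  data.frame(itemid = c(50809, 50810, 50809), value = c(1.2, 3.4, 5.6)),
  class = c("toy_tbl", "data.frame")
)

res <- load_rows(toy, itemid == 50809, cols = "value")
```

As with load_src(), the row filter is written as a bare expression, while the columns are passed as strings.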
Timestamps are represented differently among the included data sources:
while MIMIC-III and HiRID use absolute date/times, eICU provides temporal
information as minutes relative to ICU admission. Other data sources, such
as the ICU dataset provided by Amsterdam UMC, also opt for relative times,
albeit in milliseconds rather than minutes since admission. In order to
smooth out such discrepancies, the next function in the data loading
hierarchy is load_difftime(). This function is expected to call
load_src() in order to load a subset of rows/columns from a table stored
on disk and convert all columns that represent timestamps (as specified by
the argument time_vars) into base::difftime() vectors using mins as the
time unit.
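The kind of conversion load_difftime() is expected to perform can be illustrated with base R alone. The following is a minimal sketch using made-up values. It does not reproduce how the ricu methods are implemented, and all variable names are hypothetical.

```r
# Relative time in milliseconds since admission (Amsterdam UMC style)
ms_since_adm <- c(0, 90000, 3600000)
rel_ms <- as.difftime(ms_since_adm / 60000, units = "mins")

# Relative time in minutes since admission (eICU style)
min_since_adm <- c(0, 15, 120)
rel_min <- as.difftime(min_since_adm, units = "mins")

# Absolute date/times (MIMIC-III or HiRID style), made relative to a
# hypothetical admission time
admit <- as.POSIXct("2020-01-01 08:00:00", tz = "UTC")
chart <- as.POSIXct(c("2020-01-01 08:30:00", "2020-01-01 10:00:00"),
                    tz = "UTC")
rel_abs <- difftime(chart, admit, units = "mins")
```

All three results are difftime vectors in minutes, so downstream code does not need to know how the original source encoded time.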
The returned object should be of type id_tbl, with the ID vars
identifying the ID system the times are relative to. If, for example, all
times are relative to ICU admission, the ICU stay ID should be returned as
the ID column. The argument id_hint may suggest an ID type, but if this ID
is not available in the raw data, load_difftime() may return data using a
different ID system. In MIMIC-III, for example, data in the labevents
table is available for subject_id (patient ID) or hadm_id (hospital
admission ID). If data is requested for icustay_id (ICU stay ID), this
request cannot be fulfilled and data is returned using the ID system with
the highest cardinality (among the available ones). Utilities such as
change_id() can then later be used to resolve data to icustay_id.
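The fallback rule for id_hint can be sketched as a small helper. This is only an illustration of the rule as stated above: the function resolve_id() and the vector id_order are hypothetical and not part of ricu, and the cardinality ordering subject_id < hadm_id < icustay_id is an assumption made for this MIMIC-III example.

```r
# Assumed ordering from lowest to highest cardinality (hypothetical)
id_order <- c("subject_id", "hadm_id", "icustay_id")

# Hypothetical helper: honor the hint if possible, otherwise fall back to
# the available ID system with the highest cardinality
resolve_id <- function(id_hint, available) {
  if (id_hint %in% available) {
    return(id_hint)
  }
  candidates <- intersect(id_order, available)
  candidates[length(candidates)]
}

# labevents offers subject_id and hadm_id, but not icustay_id
lab_ids <- c("subject_id", "hadm_id")

fallback <- resolve_id("icustay_id", lab_ids)
honored  <- resolve_id("subject_id", lab_ids)
```

Here fallback is "hadm_id", because the requested icustay_id is unavailable, while honored is "subject_id", because that hint can be satisfied directly.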
if (require(mimic.demo)) {
  tbl <- mimic_demo$labevents
  col <- c("charttime", "value")

  load_src(tbl, itemid == 50809)
  colnames(
    load_src("labevents", "mimic_demo", itemid == 50809, cols = col)
  )

  load_difftime(tbl, itemid == 50809)
  colnames(
    load_difftime(tbl, itemid == 50809, col)
  )

  id_vars(
    load_difftime(tbl, itemid == 50809, id_hint = "icustay_id")
  )
  id_vars(
    load_difftime(tbl, itemid == 50809, id_hint = "subject_id")
  )
}
#> [1] "subject_id"