ICU datasets such as MIMIC-III or eICU typically identify patients using several ID systems, such as patient IDs, hospital stay IDs and ICU admission IDs. Even if the raw data is available in only one such ID system, a mapping between IDs together with start and end times makes it possible to convert data from one ID system to another. The function change_id() provides this conversion utility, internally calling either upgrade_id() when moving to an ID system of higher cardinality or downgrade_id() when the target ID system is of lower cardinality.

change_id(x, target_id, src, ..., keep_old_id = TRUE, id_type = FALSE)

upgrade_id(x, target_id, src, cols = time_vars(x), ...)

downgrade_id(x, target_id, src, cols = time_vars(x), ...)

# S3 method for ts_tbl
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

# S3 method for id_tbl
upgrade_id(x, target_id, src, cols = time_vars(x), ...)

# S3 method for ts_tbl
downgrade_id(x, target_id, src, cols = time_vars(x), ...)

# S3 method for id_tbl
downgrade_id(x, target_id, src, cols = time_vars(x), ...)



x: icu_tbl object for which to make the id change

target_id: The destination id name

src: Passed to as_id_cfg() and as_src_env()

...: Passed to upgrade_id()/downgrade_id()

keep_old_id: Logical flag indicating whether to keep the previous ID column

id_type: Logical flag indicating whether target_id is specified as ID name (FALSE, the default; e.g. icustay_id on MIMIC) or ID type (TRUE; e.g. icustay)

cols: Column names that require time-adjustment


An object of the same type as x with modified IDs.


In order to provide ID system conversion for a data source, the (internal) function id_map() must be able to construct an ID mapping for that data source. Constructing such a mapping can be expensive relative to how often it is reused, which is why id_map() caches the result. The mapping itself is constructed by the (internal) function id_map_helper(), which is expected to provide source and destination ID columns as well as start and end columns corresponding to the destination ID, relative to the source ID system. In the following example, we request the mapping for mimic_demo, with ICU stay IDs as source and hospital admission IDs as destination.

id_map_helper(mimic_demo, "icustay_id", "hadm_id")

## # An `id_tbl`: 136 x 4
## # Id var:      `icustay_id`
##     icustay_id hadm_id hadm_id_start hadm_id_end
##          <int>   <int> <drtn>        <drtn>
##   1     201006  198503 ~17h          ~6d
##   2     201204  114648 ~23h          ~4d
##   3     203766  126949 ~1h           ~6d
##   4     204132  157609 ~23h          ~7d
##   5     204201  177678 ~17h          ~6d
## ...
## 132     295043  170883 ~18h          ~21d
## 133     295741  176805 ~23h          ~2d
## 134     296804  110244 ~2h           ~3d
## 135     297782  167612 ~23h          ~3h
## 136     298685  151323 ~23h          ~13d
## # ... with 126 more rows

Both start and end columns encode the hospital admission windows relative to each corresponding ICU stay start time. It therefore comes as no surprise that most start times are negative (hospital admission typically occurs before ICU stay start time), while end times are often days in the future (as hospital discharge typically occurs several days after ICU admission).
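As a minimal sketch of what such relative windows look like, the following base R snippet derives start and end offsets from absolute timestamps. The table stays and all of its values are made up for illustration; the column names follow MIMIC-III conventions. This is not the code used by id_map_helper(), only an illustration of the arithmetic.

```r
# Toy table (hypothetical values): one hospital admission with two ICU stays
stays <- data.frame(
  icustay_id = c(1L, 2L),
  hadm_id    = c(10L, 10L),
  admittime  = as.POSIXct(c("2100-01-01 08:00:00", "2100-01-01 08:00:00"),
                          tz = "UTC"),
  dischtime  = as.POSIXct(c("2100-01-10 12:00:00", "2100-01-10 12:00:00"),
                          tz = "UTC"),
  intime     = as.POSIXct(c("2100-01-01 15:00:00", "2100-01-05 09:00:00"),
                          tz = "UTC")
)

# Hospital admission window expressed relative to each ICU stay start
stays$hadm_id_start <- difftime(stays$admittime, stays$intime, units = "mins")
stays$hadm_id_end   <- difftime(stays$dischtime, stays$intime, units = "mins")

stays[, c("icustay_id", "hadm_id", "hadm_id_start", "hadm_id_end")]
```

In this toy data both start offsets are negative (the patient was admitted to hospital before entering the ICU) and both end offsets are several days in the future.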

To use the ID conversion infrastructure offered by ricu for a new dataset, it typically suffices to provide an id_cfg entry in the source configuration (see load_src_cfg()) that lists the available ID systems together with their ordering. Where needed, a class-specific implementation of id_map_helper() for the given source class can additionally specify the corresponding time windows at 1-minute resolution (for every possible pair of IDs).

While both up- and downgrades for id_tbl objects, as well as downgrades for ts_tbl objects, are simple merge operations based on the ID mapping provided by id_map(), ID upgrades for ts_tbl objects are slightly more involved. As an example, consider the following setting: we have data associated with hadm_id IDs and times relative to hospital admission:

               1      2       3        4       5       6        7      8
data        ---*------*-------*--------*-------*-------*--------*------*---
               3h    10h     18h      27h     35h     43h      52h    59h

            0h     7h                26h        37h             53h      62h
hadm_id     |-------------------------------------------------------------|
icustay_id         |------------------|          |---------------|
                   0h                19h         0h             16h
                           ICU_1                       ICU_2

The mapping of data points from hadm_id to icustay_id is created as follows: ICU stay end times mark boundaries, so each data point is assigned to the ICU stay with the next end time, and all data recorded after the last ICU stay ended is assigned to the last ICU stay. Therefore data points 1-3 are assigned to ICU_1, while 4-8 are assigned to ICU_2. Times have to be shifted as well, as timestamps are expected to be relative to the current ID system. Data points 1-3 are therefore assigned to timestamps -4h, 3h and 11h, while data points 4-8 are assigned to -10h, -2h, 6h, 15h and 22h. Implementation-wise, the mapping is computed using an efficient data.table rolling join.
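The worked example above can be reproduced with a small data.table rolling join. This is a sketch under stated assumptions, not the implementation used by upgrade_id(): the tables dat and map, the column names icu_start, icu_end and icu_time, and the plain numeric hours are all chosen for illustration only.

```r
library(data.table)

# Data points 1-8, in hours relative to hospital admission (from the diagram)
dat <- data.table(
  hadm_id = 1L,
  time    = c(3, 10, 18, 27, 35, 43, 52, 59),
  point   = 1:8
)

# Hypothetical ID map: ICU stay windows relative to hospital admission
map <- data.table(
  hadm_id    = 1L,
  icustay_id = c("ICU_1", "ICU_2"),
  icu_start  = c(7, 37),
  icu_end    = c(26, 53)
)

# Join on the ICU end time: roll = -Inf matches each data point to the next
# ICU end (next observation carried backward); rollends = TRUE additionally
# assigns points after the last ICU end to the last ICU stay.
map[, time := icu_end]
res <- map[dat, on = c("hadm_id", "time"), roll = -Inf, rollends = TRUE]

# Shift times so that they are relative to the start of the assigned ICU stay
res[, icu_time := time - icu_start]

res[, .(point, icustay_id, icu_time)]
```

In the join result, the time column carries the values from dat, so subtracting icu_start yields the shifted timestamps listed above (-4h, 3h, 11h for ICU_1 and -10h, -2h, 6h, 15h, 22h for ICU_2).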


if (require(mimic.demo)) {
  tbl <- mimic_demo$labevents
  dat <- load_difftime(tbl, itemid == 50809, c("charttime", "valuenum"))
  dat

  change_id(dat, "icustay_id", tbl, keep_old_id = FALSE)
}
#> Loading required package: mimic.demo
#> # An `id_tbl`: 284 ✖ 3
#> # Id var:      `icustay_id`
#>     icustay_id charttime  valuenum
#>          <int> <drtn>        <dbl>
#>   1     201006   689 mins      129
#>   2     201006   877 mins      144
#>   3     203766   726 mins      164
#>   4     203766   766 mins      185
#>   5     203766   833 mins      181
#> ...
#> 280     295043 13746 mins      156
#> 281     295741  -125 mins      122
#> 282     296804   638 mins      105
#> 283     298685  4472 mins       88
#> 284     298685 13861 mins      138
#> # … with 274 more rows