Time series utility functions

ICU data as handled by ricu consists mostly of time series data, and as such, several utility functions are available for working with time series data, in addition to a class dedicated to representing it (see ts_tbl()). Some terminology to begin with: a time series is considered to have gaps if, per (combination of) ID variable value(s), some time steps are missing. Expanding and collapsing mean switching between a representation where time steps are explicit and one where they are encoded as intervals with start and end times. For sliding-window type operations, slide() iterates over time windows spanned around every available time step, slide_index() iterates over time windows spanned around selected time points (relative to the index), and hop() iterates over time windows specified in absolute terms.
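To make the notion of a gap concrete, the following is a minimal base-R sketch on plain vectors, not ricu's implementation. The helper name sketch_has_gaps() and its step argument are hypothetical. It checks, per ID, whether every step of the complete grid between that ID's first and last time point is present; duplicated time steps therefore do not count as gaps.

```r
# Hypothetical helper (not part of ricu): TRUE if, for at least one ID,
# some step of seq(min(time), max(time), by = step) is absent.
sketch_has_gaps <- function(id, time, step = 1) {
  gap_per_id <- tapply(time, id, function(t) {
    expected <- seq(min(t), max(t), by = step)  # complete grid for this ID
    !all(expected %in% t)                       # TRUE if any step is missing
  })
  any(gap_per_id)
}

id   <- c(1, 1, 1, 2, 2)
time <- c(1, 2, 3, 1, 3)  # ID 2 lacks time step 2

sketch_has_gaps(id, time)            # TRUE, due to ID 2
sketch_has_gaps(id[1:3], time[1:3])  # FALSE, ID 1 alone is complete
```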

expand(
  x,
  start_var = index_var(x),
  end_var = NULL,
  step_size = time_step(x),
  new_index = start_var,
  keep_vars = NULL,
  aggregate = FALSE
)

collapse(
  x,
  id_vars = NULL,
  index_var = NULL,
  start_var = "start",
  end_var = "end",
  env = NULL,
  as_win_tbl = TRUE,
  ...
)

has_no_gaps(x)

has_gaps(...)

is_regular(x)

fill_gaps(x, limits = collapse(x), start_var = "start", end_var = "end")

remove_gaps(x)

slide(x, expr, before, after = hours(0L), ...)

slide_index(x, expr, index, before, after = hours(0L), ...)

hop(
  x,
  expr,
  windows,
  full_window = FALSE,
  lwr_col = "min_time",
  upr_col = "max_time",
  left_closed = TRUE,
  right_closed = TRUE,
  eval_env = NULL,
  ...
)

Arguments

x: ts_tbl object to use
start_var, end_var: Names of the columns that represent lower and upper window bounds
step_size: Controls the step size used to interpolate between start_var and end_var
new_index: Name of the new index column
keep_vars: Names of the columns to hold onto
aggregate: Function for aggregating values in overlapping intervals
id_vars, index_var: ID and index variables
env: Environment used as parent to the environment used to evaluate expressions passed as ...
as_win_tbl: Logical flag indicating whether to return a win_tbl or an id_tbl
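...: For collapse(), further expressions used to summarize data per group (see Details); for the sliding-window functions, passed to hop_quo() and ultimately to data.table::[()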
limits: A table with columns for lower and upper window bounds or a length 2 difftime vector
expr: Expression (quoted for *_quo and unquoted otherwise) to be evaluated over each window
before, after: Time span to look back/forward
index: A vector of times around which windows are spanned (relative to the index)
windows: An icu_tbl defining the windows to span
full_window: Logical flag controlling how the situation is handled where the sliding window extends beyond available data
lwr_col, upr_col: Names of columns (in windows) of lower/upper window bounds
left_closed, right_closed: Logical flags indicating whether the lower and upper window bounds, respectively, are closed (default) or open.
eval_env: Environment in which expr is substituted; NULL resolves to the environment in which expr was created
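The effect of left_closed and right_closed can be illustrated with a small base-R sketch. It assumes the usual meaning of closed and open bounds (a closed bound includes the endpoint, an open one excludes it); the helper sketch_in_window() is hypothetical and not part of ricu.

```r
# Hypothetical helper: does each time point lie in the window [lwr, upr]?
# Assumption: a closed bound includes the endpoint, an open bound excludes it.
sketch_in_window <- function(times, lwr, upr,
                             left_closed = TRUE, right_closed = TRUE) {
  above <- if (left_closed) times >= lwr else times > lwr
  below <- if (right_closed) times <= upr else times < upr
  above & below
}

times <- 1:5
times[sketch_in_window(times, 2, 4)]                        # 2 3 4
times[sketch_in_window(times, 2, 4, left_closed = FALSE)]   # 3 4
times[sketch_in_window(times, 2, 4, right_closed = FALSE)]  # 2 3
```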

Value

Most functions return ts_tbl objects, with the exception of has_gaps()/has_no_gaps()/is_regular(), which return logical flags, and collapse(), which returns a win_tbl or an id_tbl depending on as_win_tbl.

Details

A gap in a ts_tbl object is a missing time step, i.e. a missing entry in the sequence seq(min(index), max(index), by = interval) in at least one group (as defined by id_vars()), where the extrema are calculated per group. In this case, has_gaps() returns TRUE. The function is_regular() checks whether the time series has no gaps and, in addition, whether the object is sorted and unique (see is_sorted() and is_unique()). To transform a time series containing gaps into a regular time series, fill_gaps() fills missing time steps with NA values in all data_vars() columns, while remove_gaps() provides the inverse operation, removing time steps that consist of NA values in data_vars() columns.
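The fill/remove round trip can be sketched in base R on a plain data frame. This is an illustration under stated assumptions, not ricu's code: the column names id, time and val, the step argument, and the helpers sketch_fill_gaps() and sketch_remove_gaps() are all hypothetical, and rows are treated as gaps to remove only when every data column is NA.

```r
# Hypothetical helpers (not ricu's implementation), assuming a data frame
# with an ID column `id`, a numeric time column `time` and data columns.
sketch_fill_gaps <- function(df, step = 1) {
  parts <- lapply(split(df, df$id), function(d) {
    # complete time grid for this ID, from its own min to its own max
    grid <- data.frame(id = d$id[1],
                       time = seq(min(d$time), max(d$time), by = step))
    # left join: time steps absent from `d` get NA in all data columns
    merge(grid, d, by = c("id", "time"), all.x = TRUE)
  })
  res <- do.call(rbind, parts)
  rownames(res) <- NULL
  res
}

sketch_remove_gaps <- function(df, data_cols) {
  # drop rows in which every data column is NA
  all_na <- rowSums(!is.na(df[data_cols])) == 0
  df[!all_na, , drop = FALSE]
}

df <- data.frame(id = c(1, 1, 2, 2), time = c(1, 3, 1, 2),
                 val = c(10, 30, 5, 6))
filled   <- sketch_fill_gaps(df)               # 5 rows; ID 1 gains time 2
restored <- sketch_remove_gaps(filled, "val")  # back to the 4 original rows
```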

Applying expand() to an object inheriting from data.table yields a ts_tbl in which the time steps encoded by columns start_var and end_var are made explicit, with values in keep_vars repeated for each step. The inverse operation is available as collapse(), which groups by id_vars, represents index_var as group-wise extrema in two new columns start_var and end_var, and allows for further data summary using .... When applying expand() to a win_tbl object, keep in mind that values are simply repeated for all time steps that fall into a given validity interval. This gives correct results when, for example, a win_tbl contains infusion data as rates, but may lead to incorrect results when infusions are represented as drug amounts administered over a given time span. In such a scenario it might be desirable to evenly distribute the total amount over the corresponding time steps (currently not implemented).
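The following base-R sketch illustrates the expand/collapse relationship on a plain data frame; it is not ricu's implementation. The helpers sketch_expand() and sketch_collapse(), the fixed output column name time, and the inclusion of both interval bounds are assumptions made for illustration. Note how the rate column is repeated for every step, which is appropriate for rates but would overcount if the column held amounts.

```r
# Hypothetical helpers (not ricu's implementation).
sketch_expand <- function(df, start = "start", end = "end", step = 1,
                          keep = character(0)) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # explicit time steps covering [start, end], both bounds included
    out <- data.frame(time = seq(df[[start]][i], df[[end]][i], by = step))
    for (v in keep) out[[v]] <- df[[v]][i]  # repeat kept value for every step
    out
  })
  do.call(rbind, rows)
}

sketch_collapse <- function(df, id = "id", time = "time") {
  groups <- split(df[[time]], df[[id]])  # list ordered by sorted ID
  data.frame(id    = sort(unique(df[[id]])),
             start = unname(vapply(groups, min, numeric(1))),
             end   = unname(vapply(groups, max, numeric(1))))
}

win  <- data.frame(id = c(1, 2), start = c(1, 2), end = c(2, 4),
                   rate = c(5, 7))
long <- sketch_expand(win, keep = c("id", "rate"))  # 2 + 3 = 5 rows
back <- sketch_collapse(long)                      # id, start, end as in win
```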

Sliding-window type operations are available as slide(), slide_index() and hop() (function naming is inspired by the CRAN package slider). The most flexible of the three, hop(), takes as input a ts_tbl object x containing the data, an id_tbl object windows containing, for each ID, the desired windows represented by two columns lwr_col and upr_col, as well as an expression expr to be evaluated per window. At the other end of the spectrum, slide() spans windows for every ID and available time step using the arguments before and after, while slide_index() can be seen as a compromise between the two, where windows are spanned for certain time points, specified by index.
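The relationship between the three functions can be sketched in base R: a hop-like core evaluates a summary over explicitly given windows, and the slide-like and slide_index-like variants merely construct those windows differently. This is a simplified illustration, not ricu's implementation: the helper names are hypothetical, the summary is fixed to summing a column val, windows are closed on both sides, and windows without data yield NA. On the data from the Examples section, it reproduces the documented results.

```r
# Hypothetical helpers (not ricu's implementation).
# hop-like core: for each window (id, lwr, upr), sum `val` over rows of
# that ID with time in [lwr, upr]; NA if no row falls inside.
sketch_hop <- function(df, windows) {
  vapply(seq_len(nrow(windows)), function(i) {
    w <- windows[i, ]
    hit <- df$id == w$id & df$time >= w$lwr & df$time <= w$upr
    if (any(hit)) sum(df$val[hit]) else NA_real_
  }, numeric(1))
}

# slide-like: one window per observed (id, time) row
sketch_slide <- function(df, before, after = 0) {
  sketch_hop(df, data.frame(id = df$id, lwr = df$time - before,
                            upr = df$time + after))
}

# slide_index-like: windows around the given time points, for every ID
sketch_slide_index <- function(df, index, before, after = 0) {
  grid <- expand.grid(time = index, id = sort(unique(df$id)))
  sketch_hop(df, data.frame(id = grid$id, lwr = grid$time - before,
                            upr = grid$time + after))
}

# Same data as in the Examples section (id ~ x, time ~ y, val ~ z)
df <- data.frame(id = rep(1:5, 1:5), time = sequence(1:5), val = 1:15)

sketch_hop(df, data.frame(id = c(3, 4), lwr = c(2, 1), upr = c(3, 4)))
# 11 34, as in the hop() example
sketch_slide_index(df, c(4, 5), before = 2)
# NA NA 3 NA 11 6 27 19 39 42, as in the slide_index() example
```

Writing the two simpler variants as window constructors on top of the hop-like core mirrors the "compromise" framing in the paragraph above: only the way windows are specified differs.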

Examples

tbl <- ts_tbl(x = 1:5, y = hours(1:5), z = hours(2:6), val = rnorm(5),
              index_var = "y")
exp <- expand(tbl, "y", "z", step_size = 1L, new_index = "y",
              keep_vars = c("x", "val"))
col <- collapse(exp, start_var = "y", end_var = "z", val = unique(val))
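# With the default as_win_tbl = TRUE, collapse() returns a win_tbl whose
# end column (z) holds interval durations (here 1 hour each) rather than
# end times, which accounts for the difference reported below
all.equal(tbl, col, check.attributes = FALSE)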
#> [1] "Column 'z': Mean relative difference: 0.75"

tbl <- ts_tbl(x = rep(1:5, 1:5), y = hours(sequence(1:5)), z = 1:15)

win <- id_tbl(x = c(3, 4), a = hours(c(2, 1)), b = hours(c(3, 4)))
hop(tbl, list(z = sum(z)), win, lwr_col = "a", upr_col = "b")
#>    x       b       a  z
#> 1: 3 3 hours 2 hours 11
#> 2: 4 4 hours 1 hours 34
slide_index(tbl, list(z = sum(z)), hours(c(4, 5)), before = hours(2))
#>     x       y  z
#>  1: 1 4 hours NA
#>  2: 1 5 hours NA
#>  3: 2 4 hours  3
#>  4: 2 5 hours NA
#>  5: 3 4 hours 11
#>  6: 3 5 hours  6
#>  7: 4 4 hours 27
#>  8: 4 5 hours 19
#>  9: 5 4 hours 39
#> 10: 5 5 hours 42
slide(tbl, list(z = sum(z)), before = hours(2))
#>     x       y  z
#>  1: 1 1 hours  1
#>  2: 2 1 hours  2
#>  3: 2 2 hours  5
#>  4: 3 1 hours  4
#>  5: 3 2 hours  9
#> ---             
#> 11: 5 1 hours 11
#> 12: 5 2 hours 23
#> 13: 5 3 hours 36
#> 14: 5 4 hours 39
#> 15: 5 5 hours 42

tbl <- ts_tbl(x = rep(3:4, 3:4), y = hours(sequence(3:4)), z = 1:7)
has_no_gaps(tbl)
#> [1] TRUE
is_regular(tbl)
#> [1] TRUE

tbl[1, 2] <- hours(2)
has_no_gaps(tbl)
#> [1] TRUE
is_regular(tbl)
#> [1] FALSE

tbl[6, 2] <- hours(2)
has_no_gaps(tbl)
#> [1] FALSE
is_regular(tbl)
#> [1] FALSE