Sepsis-3 • icubdc

Medical Background

Sepsis is a life-threatening reaction to an infection that causes damage to different tissues and organs in the body. It remains a major public health issue associated with high mortality, morbidity, and related health costs [1–4]. The current clinical gold standard for identifying sepsis is the so-called Sepsis-3 consensus, which highlighted the central role of organ dysfunction in the definition, identification, prognostication, and pathophysiological understanding of sepsis [5].

After sepsis onset, a delay in effective antimicrobial therapy significantly increases mortality [ferrer2014, pruinelli2018, seymour2017]. Furthermore, sepsis is a complex and heterogeneous syndrome: in its early stages it is potentially reversible yet difficult to identify, while in its more advanced stages it becomes easier to recognize but harder to treat successfully [7]. Unfortunately, identifying bacterial species in blood samples can take up to 48 hours [8], causing significant and potentially detrimental delays in confirming a suspected infection. Meanwhile, an abundance of clinical and laboratory data is routinely collected, the richest set of which accumulates in the intensive care unit (ICU). While it has become harder for intensivists to manually process the increasing quantities of patient information [9], machine learning (ML) systems could leverage this data to provide alarm systems that help clinicians recognize sepsis in its early stages, which may lead to earlier treatment and better patient outcomes in the ICU.

Suggested Prediction Problem

According to the Sepsis-3 consensus, sepsis is defined by the co-occurrence of the following two conditions:

suspected infection (defined as antibiotic treatment and body fluid sampling within a specified time window),
an acute increase in the SOFA [10] score which measures the degree of organ dysfunction.

A graphical representation of how a sepsis-3 onset might occur in the ICU.

In particular, as the above figure indicates, suspected infection (SI, for short) is defined as either:

antibiotic administration (labeled ABX, at time \(t_{abx}\)) followed by body fluid sampling (labeled Sampling, at time \(t_{samp}\)) within a 24-hour window, or
body fluid sampling followed by antibiotic administration within a 72-hour window.

The earlier of the two times is taken as the time of suspected infection. A 72-hour window is then spanned around the SI time (48 hours prior, 24 hours after). Within this SI window, the SOFA score is monitored, and an increase of 2 or more points defines the onset of sepsis. This time is labeled \(t_{sep3}\); if several such increases exist, the earliest is taken and subsequent ones are ignored.
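The two rules above can be sketched in base R for a single patient. This is only an illustration, not the repository's implementation: the helper names `si_time()` and `sep3_onset()` are hypothetical, times are plain numbers in hours, and the SOFA increase is measured against the running minimum inside the SI window, which is an assumption the text does not state.

```r
# Hypothetical helper: time of suspected infection (hours) from one antibiotic
# time and one sampling time; NA if neither rule is satisfied.
si_time <- function(t_abx, t_samp) {
  abx_first  <- t_abx <= t_samp && (t_samp - t_abx) <= 24  # ABX, then sampling
  samp_first <- t_samp <= t_abx && (t_abx - t_samp) <= 72  # sampling, then ABX
  if (abx_first || samp_first) min(t_abx, t_samp) else NA_real_
}

# Hypothetical helper: earliest time in the SI window (48 h before, 24 h after)
# at which SOFA has risen by >= 2 points.
# Assumption: the baseline is the running minimum of SOFA within the window.
sep3_onset <- function(times, sofa, t_si) {
  if (is.na(t_si)) return(NA_real_)
  in_win <- times >= t_si - 48 & times <= t_si + 24
  tm <- times[in_win]
  sf <- sofa[in_win]
  if (length(sf) == 0L) return(NA_real_)
  hit <- which(sf - cummin(sf) >= 2)
  if (length(hit) == 0L) NA_real_ else tm[hit[1L]]
}

# Toy example: sampling at 10 h, antibiotics at 30 h, SOFA rises from 1 to 3 at 8 h
times <- 0:40
sofa  <- ifelse(times < 8, 1, 3)
t_si  <- si_time(t_abx = 30, t_samp = 10)  # 10
sep3_onset(times, sofa, t_si)              # 8
```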

Sepsis-3 is not a stopping time

The onset of sepsis-3 is not a stopping time [11]. That is, at the time when the sepsis-3 onset occurs, it is not necessarily possible to verify that it has occurred. For example, consider a SOFA increase of 2 points at 0 hours, followed by body fluid sampling at 24 hours and antibiotic treatment at 48 hours; in this case, the sepsis-3 event occurring at 0 hours is only confirmed 48 hours later.

The following parameters are especially relevant for the prediction problem: \[\begin{align*} L &= \text{how early before the onset an alarm is still considered valid} \\ R &= \text{the latest time at which an alarm is still considered valid} \end{align*}\]

We consider the choices of \(L = 24\) hours and \(R = t_{confirm} := \max\Big(t_{samp}, t_{abx}, t_{sep3}\Big)\) as appropriate. In words, an alarm is considered timely if it is raised no earlier than 24 hours before the sepsis-3 onset, and informative if it is raised no later than the time at which sepsis-3 can be definitively confirmed.
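As a minimal sketch of this validity window, the following base R snippet computes \(t_{confirm}\) and checks whether alarm times fall inside \([t_{sep3} - L,\ t_{confirm}]\). The function names `t_confirm()` and `alarm_is_valid()` are hypothetical and times are plain numbers in hours.

```r
# Hypothetical helper: time at which sepsis-3 can be definitively confirmed
t_confirm <- function(t_samp, t_abx, t_sep3) max(t_samp, t_abx, t_sep3)

# Hypothetical helper: is an alarm both timely (not earlier than L hours before
# onset) and informative (not later than the confirmation time)?
alarm_is_valid <- function(t_alarm, t_samp, t_abx, t_sep3, L = 24) {
  t_alarm >= t_sep3 - L & t_alarm <= t_confirm(t_samp, t_abx, t_sep3)
}

# Example from the text: SOFA increase at 0 h, sampling at 24 h, antibiotics at 48 h
t_confirm(t_samp = 24, t_abx = 48, t_sep3 = 0)  # 48
alarm_is_valid(c(-30, -12, 40, 50), t_samp = 24, t_abx = 48, t_sep3 = 0)
# FALSE  TRUE  TRUE FALSE
```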

Finally, to prevent alarm fatigue, an alarm system must not ring too frequently. We therefore use alarm silencing, defined by the parameter:

\[\begin{align*} \delta = \text{duration of alarm silencing after the alarm is raised} \end{align*}\]

For sepsis-3, we consider the fixed value of \(\delta = \infty\) hours, meaning that the alarm system is silenced indefinitely after it is raised for the first time.
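The effect of the silencing parameter can be illustrated with a short base R sketch. The function `silence_alarms()` is hypothetical and not part of the repository; it assumes that an alarm is allowed again once at least \(\delta\) hours have passed since the last alarm that was actually raised.

```r
# Hypothetical helper: keep only those alarms that are not silenced.
# After an alarm is raised, further alarms are suppressed for `delta` hours.
silence_alarms <- function(alarm_times, delta = Inf) {
  alarm_times <- sort(alarm_times)
  kept <- numeric(0)
  last <- NA_real_
  for (t in alarm_times) {
    if (length(kept) == 0L || t - last >= delta) {
      kept <- c(kept, t)
      last <- t
    }
  }
  kept
}

silence_alarms(c(1, 3, 8, 20), delta = 6)    # 1 8 20
silence_alarms(c(1, 3, 8, 20), delta = Inf)  # 1 (only the first alarm is raised)
```

With \(\delta = \infty\), every patient contributes at most one alarm, so evaluation reduces to checking whether that single alarm falls inside the validity window.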

Dataset limitations

There are two major data limitations when trying to compute the sepsis-3 labels on the publicly available datasets considered in this repository. In particular, we highlight that:

The eICU dataset reports body fluid sampling information for only a very small fraction of patients,
The HiRID dataset reports no body fluid sampling information.

For this reason, an alternative definition of suspected infection needs to be used for these datasets: it requires two or more antibiotic administrations within a 24-hour window.
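A minimal base R sketch of this alternative rule follows. The helper `si_time_abx_only()` is hypothetical; in particular, taking the first administration of the qualifying pair as the SI time is an assumption, since the text does not specify which time is used.

```r
# Hypothetical helper: suspected infection based on antibiotics alone.
# Returns the time (hours) of the first administration that is followed by
# another one within `window` hours, or NA if no such pair exists.
# Assumption: the SI time is the earlier administration of the pair.
si_time_abx_only <- function(t_abx, window = 24) {
  t_abx <- sort(t_abx)
  if (length(t_abx) < 2L) return(NA_real_)
  hit <- which(diff(t_abx) <= window)
  if (length(hit) == 0L) NA_real_ else t_abx[hit[1L]]
}

si_time_abx_only(c(0, 30, 40))  # 30 (the pair 30 h and 40 h is 10 h apart)
si_time_abx_only(c(0, 100))     # NA (administrations too far apart)
```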

Reproducible Code

After defining the prediction problem and the key parameters, we provide code that can be used to generate data which is ready for AI prediction models.

library(ricu)  # load_concepts(), fill_gaps(), replace_na(), slide(), hours()

srcs <- c("mimic_demo", "eicu_demo")

phys <- c(
  "alt", "alp", "ast", "basos", "bicar", "bili", "ca", "cai", "tco2", "cl",
  "crea", "dbp", "eos", "fio2", "hr", "hct", "hgb", "inr_pt", "lact", "lymph",
  "mg", "mch", "mchc", "mcv", "map", "neut", "o2sat", "pco2", "po2", "ph",
  "phos", "plt", "k", "pt", "ptt", "rdw", "rbc", "resp", "na", "sbp", "bun",
  "wbc", "temp", "alb", "ck", "etco2", "crp", "gcs", "pafi", "safi",
  "map_beta50", "map_beta100", "map_beta200", "be", "bili_dir", "bnd", "ckmb",
  "esr", "fgn", "glu", "bmi", "age"
)

meds <- c("cortico", "mech_vent", "dex_amount", "ins_ifx")

dat <- load_concepts(c(phys, meds), srcs, verbose = FALSE)

# carry-forward (LOCF) imputation for physiological variables
dat <- fill_gaps(dat)
dat <- replace_na(dat, type = "locf", vars = phys)

# 0-imputation for treatment variables
dat <- replace_na(dat, 0, vars = meds)

# obtain onset information
sep3 <- sep3_info(srcs)
dat <- merge(dat, sep3, all.x = TRUE)

# remove times after t_confirm
dat <- dat[, t_confirm := max_or_na(t_confirm), by = c(id_vars(dat))]
dat <- dat[is.na(t_confirm), t_confirm := hours(Inf)]
dat <- dat[get(index_var(dat)) <= t_confirm]

# carry-backward the sepsis-3 label for L = 24 hours
dat <- slide(dat, sep3_lab := max_or_na(sep3), before = hours(0L),
             after = hours(24L))

# carry-forward the sepsis-3 label until t_confirm
dat <- dat[is.na(sep3_lab), sep3_lab := 0]
dat <- dat[, sep3_lab := cummax(sep3_lab), by = c(id_vars(dat))]

# remove extra columns & times before ICU
dat <- dat[, c("sep3", "t_confirm") := NULL]
dat <- dat[get(index_var(dat)) >= hours(0L)]

The .parquet files generated by the above code can be used for training and testing.
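The labeling steps in the code above (truncation at \(t_{confirm}\), carry-backward for \(L\) hours, carry-forward until \(t_{confirm}\), and removal of pre-ICU times) can be illustrated on a toy hourly grid in base R. The function `label_patient()` is a hypothetical stand-in that mimics the intended result for one patient; it does not use ricu and is not the repository's code.

```r
# Hypothetical helper: sepsis-3 alarm labels for one patient on a time grid.
# time:      hours since ICU admission
# t_sep3:    onset time (NA for patients without sepsis-3)
# t_confirm: confirmation time (Inf for patients without sepsis-3)
label_patient <- function(time, t_sep3 = NA_real_, t_confirm = Inf, L = 24) {
  time <- time[time >= 0 & time <= t_confirm]  # drop pre-ICU and post-confirmation
  lab <- if (is.na(t_sep3)) {
    integer(length(time))                      # never septic: label is always 0
  } else {
    as.integer(time >= t_sep3 - L)             # 1 from L hours before onset onwards
  }
  data.frame(time = time, sep3_lab = lab)
}

septic <- label_patient(-5:40, t_sep3 = 30, t_confirm = 36)
range(septic$time)                       # 0 36
min(septic$time[septic$sep3_lab == 1])   # 6, i.e. 24 hours before onset
```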

Evaluation Code

After developing the AI model for prediction, a key step is to evaluate its potential clinical utility. For this purpose, we suggest an evaluation scheme, which can be performed using the patient_eval() function exposed in our repository. The data that needs to be fed into the function should be formatted as follows:

In particular, the input to the patient evaluation is a long-format table that contains a column named sep3_prob with the probability predictions. The label column sep3_lab determines whether it is desired to raise an alarm at this time point. The patient_eval() function then computes the sensitivity, specificity, and positive predictive value (PPV) over a range of prediction thresholds, and can be run using the following code:

patient_eval(evl_dat, delta = hours(Inf), score_col = "sep3_prob",
             tpp = "sep3_lab")
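To make the patient-level metrics concrete, here is a rough base R sketch for a single threshold with \(\delta = \infty\). It is not the implementation of patient_eval(): the function `eval_threshold()`, the column names `id` and `time`, and the toy data are hypothetical, and the counting rules (an alarm is the first time the score reaches the threshold, and it counts as correct only if sep3_lab is 1 at that time) are assumptions.

```r
# Hypothetical sketch of patient-level evaluation at one threshold.
eval_threshold <- function(dat, thr) {
  per_pat <- lapply(split(dat, dat$id), function(p) {
    p <- p[order(p$time), ]
    first <- which(p$sep3_prob >= thr)[1L]  # first alarm; NA if none is raised
    data.frame(
      septic = any(p$sep3_lab == 1),
      alarm  = !is.na(first),
      timely = !is.na(first) && p$sep3_lab[first] == 1
    )
  })
  res <- do.call(rbind, per_pat)
  c(
    threshold   = thr,
    sensitivity = sum(res$timely) / sum(res$septic),
    specificity = sum(!res$alarm & !res$septic) / sum(!res$septic),
    ppv         = sum(res$timely) / sum(res$alarm)
  )
}

# Toy data: three patients, four time points each
evl_toy <- data.frame(
  id        = rep(1:3, each = 4),
  time      = rep(0:3, times = 3),
  sep3_prob = c(0.1, 0.2, 0.7, 0.9,   # patient 1: septic, timely alarm
                0.8, 0.3, 0.2, 0.1,   # patient 2: not septic, false alarm
                0.1, 0.1, 0.2, 0.3),  # patient 3: septic, low scores
  sep3_lab  = c(0, 0, 1, 1,
                0, 0, 0, 0,
                0, 1, 1, 1)
)

t(sapply(c(0.25, 0.5), eval_threshold, dat = evl_toy))
```

At a threshold of 0.5 this toy example gives a sensitivity of 0.5 and a PPV of 0.5; lowering the threshold to 0.25 also catches patient 3, raising sensitivity to 1.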

Epidemiology

When considering a prediction problem, its epidemiology is often highly relevant. For this reason, we investigate the following:

prevalence of sepsis-3 in each dataset,
onset times of sepsis-3 in each dataset (\(t_{sep3}\)),
time between the sepsis-3 onset and the time at which sepsis-3 is confirmed (\(t_{confirm} - t_{sep3}\)).
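These three quantities can be computed from a per-patient onset table, as in the following base R sketch. The table `onsets`, its column names, and its values are made up purely for illustration and are not results from any of the datasets.

```r
# Hypothetical per-patient table: one row per ICU stay, NA where no sepsis-3 occurred
onsets <- data.frame(
  source    = c("a", "a", "a", "b", "b"),
  t_sep3    = c(5, NA, 12, NA, 3),    # onset time in hours
  t_confirm = c(20, NA, 12, NA, 30)   # confirmation time in hours
)

# Prevalence: share of stays with a sepsis-3 onset, per dataset
prevalence <- tapply(!is.na(onsets$t_sep3), onsets$source, mean)

# Median onset time and median confirmation delay among septic stays, per dataset
median_onset <- tapply(onsets$t_sep3, onsets$source, median, na.rm = TRUE)
median_delay <- tapply(onsets$t_confirm - onsets$t_sep3, onsets$source,
                       median, na.rm = TRUE)

prevalence
median_onset
median_delay
```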

Prevalence, Onset Times and Confirmation Duration Times

The overall prevalence in each of the datasets is:

References

1. Dellinger RP, Levy MM, Rhodes A, Annane D, Gerlach H, Opal SM, et al. Surviving sepsis campaign: International guidelines for management of severe sepsis and septic shock: 2012. Critical Care Medicine. 2013;41:580–637.

2. Hotchkiss RS, Moldawer LL, Opal SM, Reinhart K, Turnbull IR, Vincent J-L. Sepsis and septic shock. Nature Reviews Disease Primers. 2016;2:16045.

3. Kaukonen K-M, Bailey M, Suzuki S, Pilcher D, Bellomo R. Mortality related to severe sepsis and septic shock among critically ill patients in Australia and New Zealand, 2000-2012. JAMA: The Journal of the American Medical Association. 2014;311:1308–16.

4. Peake SL, Bellomo R, Cameron PA, Cross A, Delaney A, Finfer S, et al. The outcome of patients with sepsis and septic shock presenting to emergency departments in Australia and New Zealand. Critical Care and Resuscitation. 2007;9:8–18.

5. Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The third international consensus definitions for sepsis and septic shock (Sepsis-3). JAMA: The Journal of the American Medical Association. 2016;315:801–10.

6. Levy MM, Evans LE, Rhodes A. The surviving sepsis campaign bundle: 2018 update. Critical Care Medicine. 2018;46:997–1000.

7. Rhodes A, Evans LE, Alhazzani W, Levy MM, Antonelli M, Ferrer R, et al. Surviving sepsis campaign: International guidelines for management of sepsis and septic shock: 2016. Intensive Care Medicine. 2017;43:304–77.

8. Osthoff M, Gürtler N, Bassetti S, Balestra G, Marsch S, Pargger H, et al. Impact of MALDI-TOF-MS-based identification directly from positive blood cultures on patient management: A controlled clinical trial. Clinical Microbiology and Infection. 2017;23:78–85.

9. Pickering BW, Gajic O, Ahmed A, Herasevich V, Keegan MT. Data utilization for medical decision making at the time of patient admission to ICU. Critical Care Medicine. 2013;41:1502–10.

10. Vincent J-L, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (sepsis-related organ failure assessment) score to describe organ dysfunction/failure. Intensive Care Medicine. 1996;22:707–10.

11. Norris JR. Markov chains. Cambridge University Press; 1998.