Load concept dictionaries — load_dictionary • ICU data with R

Data concepts can be specified in JSON format as a concept dictionary which can be read and parsed into concept/item objects. Dictionary loading can either be performed on the default included dictionary or on a user-specified custom dictionary. Furthermore, a mechanism is provided for adding concepts and/or data sources to the existing dictionary (see the Details section).

load_dictionary(
  src = NULL,
  concepts = NULL,
  name = "concept-dict",
  cfg_dirs = NULL
)

concept_availability(dict = NULL, include_rec = FALSE, ...)

explain_dictionary(
  dict = NULL,
  cols = c("name", "category", "description"),
  ...
)

Arguments

src: NULL or the name of one or several data sources
concepts: A character vector used to subset the concept dictionary or NULL indicating no subsetting
name: Name of the dictionary to be read
cfg_dirs: Directory or directories in which to look for the dictionary file
dict: A dictionary (concept object) or NULL
include_rec: Logical flag indicating whether to include rec_cncpt concepts as well
...: Forwarded to load_dictionary() in case NULL is passed as dict argument
cols: Columns to include in the output of explain_dictionary()

Value

A concept object containing several data concepts as cncpt objects.

Details

A default dictionary is provided at

system.file(
  file.path("extdata", "config", "concept-dict.json"),
  package = "ricu"
)

and can be loaded into an R session by calling get_config("concept-dict"). The default dictionary can be extended by adding a file concept-dict.json to the path specified by the environment variable RICU_CONFIG_PATH. New concepts can be added to this file and existing concepts can be extended (by adding new data sources). Alternatively, load_dictionary() can be called on non-default dictionaries using the name and cfg_dirs arguments.
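The extension mechanism described above could be set up roughly as follows. This is a minimal sketch, not the package's prescribed workflow: the concept name glu_demo and the temporary directory are hypothetical, the item definition reuses the glucose entry for mimic_demo, and it is assumed that RICU_CONFIG_PATH is read at the time the dictionary is loaded.

```r
# Hypothetical user configuration directory (any writable directory would do)
cfg_dir <- file.path(tempdir(), "ricu-config")
dir.create(cfg_dir, showWarnings = FALSE, recursive = TRUE)

# A user-defined concept; "glu_demo" is an illustrative name only
json <- c(
  '{',
  '  "glu_demo": {',
  '    "unit": "mg/dL",',
  '    "description": "glucose (user-defined copy)",',
  '    "category": "chemistry",',
  '    "sources": {',
  '      "mimic_demo": [',
  '        {"ids": [50809, 50931], "table": "labevents", "sub_var": "itemid"}',
  '      ]',
  '    }',
  '  }',
  '}'
)
writeLines(json, file.path(cfg_dir, "concept-dict.json"))

# Point ricu at the directory holding the additional dictionary file
Sys.setenv(RICU_CONFIG_PATH = cfg_dir)

# Loading is only attempted if ricu and the demo data are installed
if (requireNamespace("ricu", quietly = TRUE) &&
    require(mimic.demo, quietly = TRUE)) {
  dict <- ricu::load_dictionary("mimic_demo", "glu_demo")
  print(dict)
}
```

Calling Sys.unsetenv("RICU_CONFIG_PATH") afterwards restores the default lookup for the rest of the session.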

A concept is specified as a JSON object. For example, the numeric concept for glucose is given by

{
  "glu": {
    "unit": "mg/dL",
    "min": 0,
    "max": 1000,
    "description": "glucose",
    "category": "chemistry",
    "sources": {
      "mimic_demo": [
        {
          "ids": [50809, 50931],
          "table": "labevents",
          "sub_var": "itemid"
        }
      ]
    }
  }
}

Using such a specification, constructors for cncpt and itm objects are called either using default arguments or as specified by the JSON object, with the above corresponding to a call like

concept(
  name = "glu",
  items = item(
    src = "mimic_demo", table = "labevents", sub_var = "itemid",
    ids = list(c(50809L, 50931L))
  ),
  description = "glucose", category = "chemistry",
  unit = "mg/dL", min = 0, max = 1000
)
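To see how the JSON fields line up with the constructor arguments, the specification can be parsed by hand. The sketch below only illustrates the correspondence and is not how ricu parses dictionaries internally; it assumes the jsonlite package is installed.

```r
library(jsonlite)  # assumed to be available

spec_json <- '{
  "glu": {
    "unit": "mg/dL",
    "min": 0,
    "max": 1000,
    "description": "glucose",
    "category": "chemistry",
    "sources": {
      "mimic_demo": [
        {"ids": [50809, 50931], "table": "labevents", "sub_var": "itemid"}
      ]
    }
  }
}'

# Keep the nested list structure instead of simplifying to vectors/data frames
spec <- fromJSON(spec_json, simplifyVector = FALSE)
glu  <- spec[["glu"]]

# Concept-level fields: everything except "sources"
concept_args <- glu[setdiff(names(glu), "sources")]
str(concept_args)

# Item-level fields: the source name becomes src, the rest is passed along
itm <- glu[["sources"]][["mimic_demo"]][[1L]]
item_args <- c(list(src = "mimic_demo"), itm)
item_args$ids <- unlist(item_args$ids)
str(item_args)
```

Each entry under a source name describes one item, so a source with several entries yields several items for the same concept.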

The arguments src and concepts can be used to only load a subset of a dictionary by specifying a character vector of data sources and/or concept names.

A summary of item availability for a set of concepts can be created using concept_availability(). This produces a logical matrix in which an entry is TRUE if at least one item has been defined for the corresponding combination of concept and data source. If data is loaded for a combination of concept and data source where the corresponding entry is FALSE, the result is either a zero-row id_tbl object or an object inheriting from id_tbl in which the column corresponding to the concept is NA throughout, depending on whether the concept was loaded alongside other concepts for which data is available.

Whether to include rec_cncpt concepts in the overview produced by concept_availability() can be controlled via the logical flag include_rec. A recursive concept is considered available simply if all its building blocks are available. This can, however, lead to slightly confusing output, as a recursive concept might not strictly depend on one of its sub-concepts but instead handle such missingness by design. In such a scenario, the availability summary might report FALSE even though data can still be produced.
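A short usage sketch of the availability summary follows. It assumes that ricu and the mimic.demo data package are installed and does nothing otherwise; the concept names are the ones used in the Examples section.

```r
if (requireNamespace("ricu", quietly = TRUE) &&
    require(mimic.demo, quietly = TRUE)) {

  # Availability for an explicitly loaded subset of the dictionary
  dict  <- ricu::load_dictionary("mimic_demo", c("glu", "lact"))
  avail <- ricu::concept_availability(dict)
  print(avail)

  # With dict left as NULL, further arguments are forwarded to
  # load_dictionary(); include_rec = TRUE adds rec_cncpt concepts
  avail_rec <- ricu::concept_availability(
    include_rec = TRUE, src = "mimic_demo"
  )
  print(avail_rec)
}
```

Comparing the two summaries shows which additional entries stem from recursive concepts, whose FALSE entries should be read with the caveat described above.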

Examples

if (require(mimic.demo)) {
  head(load_dictionary("mimic_demo"))
  load_dictionary("mimic_demo", c("glu", "lact"))
}
#> <concept[2]>
#>                    glu                   lact 
#> glucose <num_cncpt[1]> lactate <num_cncpt[1]>