Attaching a data source (see attach_src()) instantiates two types of S3 classes: a single src_env object, representing the data source as a collection of tables, and one src_tbl object per table, representing the given table. Upon package loading, src_env objects including the respective src_tbl objects are created for all data sources that are configured for auto-attaching, irrespective of whether data is actually available. If some (or all) data is missing, the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions. See setup_src_env() for manually downloading and setting up data sources.

Usage

new_src_tbl(files, col_cfg, tbl_cfg, prefix, src_env)

is_src_tbl(x)

as_src_tbl(x, ...)

# S3 method for src_env
as_src_tbl(x, tbl, ...)

new_src_env(x, env = new.env(parent = data_env()), link = NULL)

is_src_env(x)

# S3 method for src_env
as.list(x, ...)

as_src_env(x)

attached_srcs()

is_tbl_avail(tbl, env)

src_tbl_avail(env, tbls = ls(envir = env))

src_data_avail(src = auto_attach_srcs())

is_data_avail(src = auto_attach_srcs())

Arguments

files

File names of fst files that will be used to create a prt object (see also prt::new_prt())

col_cfg

Coerced to col_cfg by calling as_col_cfg()

tbl_cfg

Coerced to tbl_cfg by calling as_tbl_cfg()

prefix

Character vector of data source name(s) (used as class prefix)

src_env

The data source environment (as src_env object)

x

Object to test/coerce

tbl

String-valued table name

env

Environment used as src_env

link

NULL or a second environment (in addition to data_env()) in which the resulting src_env is bound to a name

tbls

Character vector of table names

src

Character vector of data source names or any other object (or list thereof) for which an as_src_env() method exists

Value

The constructors new_src_env()/new_src_tbl() as well as coercion functions as_src_env()/as_src_tbl() return src_env and src_tbl objects respectively, while inheritance testers is_src_env()/is_src_tbl() return logical flags. For data availability utilities, see Details section.

Details

A src_env object is an environment with attributes src_name (a string-valued data source name, such as mimic_demo) and id_cfg (describing the possible patient IDs for the given data source). In addition to the src_env class attribute, sub-classes are defined by the source class_prefix configuration setting (see load_src_cfg()). Such data source environments are intended to contain several corresponding src_tbl objects (or rather active bindings that evaluate to src_tbl objects; see setup_src_env()).
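The structure described above can be illustrated with a minimal base R sketch. It does not use the package constructors: the object `demo_env`, the class name `mimic_demo_env`, the `id_cfg` value and the `admissions` table are all hypothetical stand-ins, chosen only to show an environment that carries attributes and exposes a table through an active binding.

```r
# Hypothetical stand-in for a src_env: an environment carrying attributes
# and a class vector. Class names and the id_cfg value are illustrative only.
demo_env <- structure(
  new.env(parent = emptyenv()),
  src_name = "mimic_demo",
  id_cfg = c(patient = "subject_id", hadm = "hadm_id"),
  class = c("mimic_demo_env", "src_env")
)

# Tables are exposed as active bindings: the function runs on every access,
# so the table object is only produced when it is actually requested.
n_evals <- 0L
makeActiveBinding("admissions", function() {
  n_evals <<- n_evals + 1L
  data.frame(subject_id = 1:3, hadm_id = 101:103)
}, demo_env)

attr(demo_env, "src_name")     # attribute lookup
inherits(demo_env, "src_env")  # class test, analogous to is_src_env()
ls(demo_env)                   # listing names does not evaluate the binding
nrow(demo_env$admissions)      # access triggers the binding function
```

Because listing the environment does not evaluate its bindings, a data source can be set up before any table data has been touched.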

The S3 class src_tbl inherits from prt, which represents a partitioned fst file. In addition to the prt object, metadata in the form of col_cfg and tbl_cfg is associated with a src_tbl object (see load_src_cfg()). Furthermore, sub-classes are added as specified by the source configuration class_prefix entry, as with src_env objects. This allows certain functionality, for example data loading, to be adapted to data source-specific requirements.
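How prefix-derived sub-classes enable source-specific behavior can be sketched with plain S3 dispatch. Everything below is an assumption made for illustration: the generic `describe_tbl()`, the constructor `new_demo_tbl()` and the `<prefix>_tbl` naming pattern are hypothetical and are not taken from the package.

```r
# Hypothetical generic; not part of the package API.
describe_tbl <- function(x, ...) UseMethod("describe_tbl")

# Fallback shared by all data sources
describe_tbl.src_tbl <- function(x, ...) "generic src_tbl handling"

# Source-specific method that does extra work, then defers to the fallback
describe_tbl.eicu_tbl <- function(x, ...) {
  paste("eicu-specific step, then", NextMethod())
}

# Stand-in constructor: prefix-derived sub-classes precede src_tbl and prt
# (the "<prefix>_tbl" naming pattern is an assumption)
new_demo_tbl <- function(data, prefix) {
  structure(
    list(data = data),
    class = c(paste0(prefix, "_tbl"), "src_tbl", "prt")
  )
}

a <- new_demo_tbl(data.frame(x = 1), prefix = "mimic")
b <- new_demo_tbl(data.frame(x = 1), prefix = c("eicu_demo", "eicu"))

describe_tbl(a)  # no mimic_tbl method, so the src_tbl method is used
describe_tbl(b)  # eicu_tbl method runs first, then the src_tbl method
```

Passing several prefixes yields several sub-classes, so a method can target either one specific data source or a whole family of related sources.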

Instantiation and setup of src_env objects is possible irrespective of whether the underlying data is available. If some (or all) data is missing, then upon first access of a src_tbl binding created by setup_src_env(), the user is asked for permission to download in interactive sessions and an error is thrown in non-interactive sessions. Data availability can be checked with the following utilities:

  • is_tbl_avail(): Returns a logical flag indicating whether all required data is available for the table passed as tbl (which may be a string or any object that has a tbl_name() implementation) from the environment env (which may be any object that has an as_src_env() method).

  • src_tbl_avail(): Returns a named logical vector, indicating which tables have all required data available. As above, both tbls (arbitrary length) and env (scalar-valued) may be character vectors or objects with corresponding tbl_name() and as_src_env() methods.

  • src_data_avail(): Generates the most comprehensive data availability report, returning a data.frame with columns name (the data source name), available (logical vector indicating whether all data is available), tables (the number of available tables) and total (the total number of tables). As input, src may be an arbitrary-length character vector, an object for which an as_src_env() method is defined or an arbitrary-length list thereof.

  • is_data_avail(): Returns a named logical vector, indicating for which data sources all required data is available. As above, src may be an arbitrary-length character vector, an object for which an as_src_env() method is defined or an arbitrary-length list thereof.
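To make the shape of these return values concrete, the following base R sketch builds them from made-up per-table flags. It does not call the package functions. The source names, table names and flags in `tbl_avail` are invented, and the aggregation shown is only one plausible way to arrive at the columns described above.

```r
# Hypothetical per-table availability flags for two data sources, i.e. one
# named logical vector per source (the shape described for src_tbl_avail())
tbl_avail <- list(
  mimic_demo = c(admissions = TRUE, patients = TRUE, chartevents = TRUE),
  eicu_demo  = c(patient = TRUE, lab = FALSE)
)

# One row per data source, using the columns described for src_data_avail()
report <- data.frame(
  name      = names(tbl_avail),
  available = vapply(tbl_avail, all, logical(1L), USE.NAMES = FALSE),
  tables    = vapply(tbl_avail, sum, integer(1L), USE.NAMES = FALSE),
  total     = lengths(tbl_avail, use.names = FALSE)
)

# Named logical vector, analogous to the description of is_data_avail()
src_ok <- setNames(report$available, report$name)

report
src_ok
```

In this sketch a data source counts as available only if every one of its tables is available, which matches the description of the available column.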