src_env.Rd
Attaching a data source (see attach_src()) instantiates two types of S3
classes: a single src_env object, representing the data source as a
collection of tables, and one src_tbl object per table, representing the
given table. Upon package loading, src_env objects, including the respective
src_tbl objects, are created for all data sources that are configured for
auto-attaching, irrespective of whether data is actually available. If some
(or all) data is missing, the user is asked for permission to download in
interactive sessions and an error is thrown in non-interactive sessions. See
setup_src_env() for manually downloading and setting up data sources.
new_src_tbl(files, col_cfg, tbl_cfg, prefix, src_env)
is_src_tbl(x)
as_src_tbl(x, ...)
# S3 method for src_env
as_src_tbl(x, tbl, ...)
new_src_env(x, env = new.env(parent = data_env()), link = NULL)
is_src_env(x)
# S3 method for src_env
as.list(x, ...)
as_src_env(x)
attached_srcs()
is_tbl_avail(tbl, env)
src_tbl_avail(env, tbls = ls(envir = env))
src_data_avail(src = auto_attach_srcs())
is_data_avail(src = auto_attach_srcs())
files: File names of fst files that will be used to create a prt object
(see also prt::new_prt())
col_cfg: Coerced to col_cfg by calling as_col_cfg()
tbl_cfg: Coerced to tbl_cfg by calling as_tbl_cfg()
prefix: Character vector valued data source name(s) (used as class prefix)
src_env: The data source environment (as src_env object)
x: Object to test/coerce
tbl: String-valued table name
env: Environment used as src_env
link: NULL or a second environment (in addition to data_env()) in which the
resulting src_env is bound to a name
tbls: Character vector of table names
src: Character vector of data source names or any other object (or list
thereof) for which an as_src_env() method exists
The constructors new_src_env()/new_src_tbl() as well as coercion functions
as_src_env()/as_src_tbl() return src_env and src_tbl objects respectively,
while inheritance testers is_src_env()/is_src_tbl() return logical flags.
For data availability utilities, see the Details section.
A src_env object is an environment with attributes src_name (a string-valued
data source name, such as mimic_demo) and id_cfg (describing the possible
patient IDs for the given data source). In addition to the src_env class
attribute, sub-classes are defined by the source class_prefix configuration
setting (see load_src_cfg()). Such data source environments are intended to
contain several corresponding src_tbl objects (or rather active bindings that
evaluate to src_tbl objects; see setup_src_env()).
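The structure described above can be illustrated with a minimal base R sketch. It does not use the package's own constructors: the object demo_env, the sub-class name mimic_demo_env and the table name admissions are assumptions made purely for illustration, and the id_cfg attribute is left out.

```r
# Hypothetical stand-in for a data source environment, built with base R only
demo_env <- new.env()
attr(demo_env, "src_name") <- "mimic_demo"
# Sub-class name chosen for illustration; real sub-classes come from the
# class_prefix configuration setting
class(demo_env) <- c("mimic_demo_env", "src_env")

# An active binding: the function runs every time the name is accessed, so
# the table object is only produced on demand
makeActiveBinding(
  "admissions",
  function() data.frame(subject_id = 1:3),
  demo_env
)

inherits(demo_env, "src_env")
bindingIsActive("admissions", demo_env)
demo_env$admissions
```

Because the binding is active, listing the environment with ls(envir = demo_env) shows the table name without evaluating the function behind it.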
The S3 class src_tbl inherits from prt, which represents a partitioned fst
file. In addition to the prt object, metadata in the form of col_cfg and
tbl_cfg is associated with a src_tbl object (see load_src_cfg()).
Furthermore, sub-classes are added as specified by the source configuration
class_prefix entry, as with src_env objects. This allows certain
functionality, for example data loading, to be adapted to data
source-specific requirements.
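How prefix-derived sub-classes enable source-specific behavior can be sketched with plain S3 dispatch. The generic describe_loading(), the helper make_tbl() and the prefixes used here are hypothetical and not part of the documented API; the objects are empty lists rather than real prt objects.

```r
# Hypothetical generic, used only to demonstrate dispatch on sub-classes
describe_loading <- function(x, ...) UseMethod("describe_loading")

# Fallback shared by every table object
describe_loading.src_tbl <- function(x, ...) "generic loading"

# Source-specific method that extends the fallback
describe_loading.mimic_tbl <- function(x, ...) {
  base <- NextMethod()
  paste("mimic-specific step, then", base)
}

# Build a toy object whose class vector is derived from a prefix
make_tbl <- function(prefix) {
  structure(list(), class = c(paste0(prefix, "_tbl"), "src_tbl"))
}

describe_loading(make_tbl("mimic"))  # uses the mimic_tbl method
describe_loading(make_tbl("eicu"))   # falls back to the src_tbl method
```

Placing the prefix-derived class ahead of src_tbl in the class vector is what lets a specific method take precedence while the shared method stays available through NextMethod().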
Instantiation and set up of src_env objects is possible irrespective of
whether the underlying data is available. If some (or all) data is missing,
the user is asked for permission to download in interactive sessions and an
error is thrown in non-interactive sessions upon first access of a src_tbl
bound as set up by setup_src_env(). Data availability can be checked with the
following utilities:
is_tbl_avail(): Returns a logical flag indicating whether all required data
for the table passed as tbl (which may be a string or any object that has a
tbl_name() implementation) is available from the environment env (requires
an as_src_env() method).
src_tbl_avail(): Returns a named logical vector, indicating which tables
have all required data available. As above, both tbls (arbitrary length) and
env (scalar-valued) may be character vectors or objects with corresponding
tbl_name() and as_src_env() methods.
src_data_avail(): The most comprehensive data availability report can be
generated by calling src_data_avail(), returning a data.frame with columns
name (the data source name), available (logical vector indicating whether
all data is available), tables (the number of available tables) and total
(the total number of tables). As input, src may be an arbitrary-length
character vector, an object for which an as_src_env() method is defined or
an arbitrary-length list thereof.
is_data_avail(): Returns a named logical vector, indicating for which data
sources all required data is available. As above, src may be an
arbitrary-length character vector, an object for which an as_src_env()
method is defined or an arbitrary-length list thereof.
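The relationship between the per-table flags and the per-source summaries described above can be mimicked in base R. This is only a sketch of the described return shapes, not the package's implementation: the input list tbl_flags, its table names and its values are made up for illustration.

```r
# Hypothetical per-table availability flags, one named logical vector per
# data source (analogous in shape to what src_tbl_avail() is described to
# return for a single source)
tbl_flags <- list(
  mimic_demo = c(admissions = TRUE, patients = TRUE),
  eicu_demo  = c(patient = TRUE, lab = FALSE)
)

# Named logical vector: is all data available for each source?
# (analogous in shape to the described is_data_avail() result)
all_avail <- vapply(tbl_flags, all, logical(1L))

# Summary with the four described columns
# (analogous in shape to the described src_data_avail() result)
report <- data.frame(
  name      = names(tbl_flags),
  available = all_avail,
  tables    = vapply(tbl_flags, sum, integer(1L)),
  total     = lengths(tbl_flags),
  row.names = NULL
)

all_avail
report
```

In this toy input, the second source has one of two tables missing, so its available entry is FALSE while tables and total show how far the set-up has progressed.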