
The catalog function returns a data catalog for a data source. A data catalog is essentially a collection of data dictionaries, one for each dataset in the data source. The catalog lets you examine the datasets in the data source before loading any of them into memory. Once you decide which data items you want, use the fetch function to load them into memory.

Usage

catalog(source, engine, pattern = NULL, where = NULL, import_specs = NULL)

Arguments

source

The source for the data. This parameter is required. Normally the source is passed as a full or relative path.

engine

The data engine to use for this data source. This parameter is required. The supported data engines are listed in the engines enumeration. For example, engines$csv specifies the CSV engine, and engines$rdata specifies the RDATA engine.

pattern

A pattern to use when loading data items from the data source. The pattern can be a single name or a vector of names, and names may contain wildcards. The pattern filters which data items are loaded into the catalog. For example, pattern = "AD*" loads only the datasets whose names start with "AD".
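Wildcard filtering of this kind can be pictured as matching each item name against a glob. The sketch below is a base R illustration only, not the fetch package's implementation: it assumes glob-style wildcards ("*" for any run of characters, "?" for one character), uses utils::glob2rx(), and the helper name filter_items is hypothetical.

```r
# Item names copied from the example catalog on this page
items <- c("ADAE", "ADEX", "ADPR", "ADPSGA", "ADSL", "ADVS")

# Hypothetical helper: keep the items that match any name or wildcard in 'pattern'.
# Assumes glob-style wildcards ("*" = any run of characters, "?" = one character).
filter_items <- function(items, pattern) {
  # Translate each glob into an anchored regular expression
  rx <- vapply(pattern, utils::glob2rx, character(1))
  # An item is kept if it matches at least one of the expressions
  hits <- lapply(rx, function(r) grepl(r, items))
  items[Reduce(`|`, hits)]
}

filter_items(items, "AD*")              # all six items
filter_items(items, "ADS*")             # "ADSL"
filter_items(items, c("ADAE", "ADV*"))  # "ADAE" "ADVS"
```

Passing a vector of patterns keeps an item if it matches any one of them, which mirrors the "name or a vector of names" wording above.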

where

A where expression to use when fetching the data. The expression applies to all fetch operations on this catalog, and the row counts shown in the catalog reflect it (see Example 2). Define the expression with the Base R expression function. The expression is unquoted and can use any Base R operators or functions, for example where = expression(SUBJID == '049').
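Conceptually, a where expression is a row filter evaluated with each dataset's columns in scope. The sketch below shows that idea with base R's eval(); it is an assumption about the general mechanism, not the fetch package's code, and the toy data frame is made up for illustration.

```r
# A where expression, written the same way as in Example 2
where <- expression(SUBJID == '049')

# Hypothetical toy data standing in for a dataset such as ADEX
dat <- data.frame(
  SUBJID = c("049", "049", "050", NA),
  AVAL   = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Evaluate the unquoted expression with the data frame's columns in scope
keep <- eval(where[[1]], envir = dat)

# which() drops NA results, so the row with a missing SUBJID is excluded
filtered <- dat[which(keep), , drop = FALSE]
nrow(filtered)  # 2
```

Using which() rather than the raw logical vector avoids rows of NA appearing in the result when a compared column contains missing values.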

import_specs

The import specs to use for any fetch operation on this catalog. Import specs control the data types assigned to the incoming columns. You can create a separate import spec for each dataset, or a single import spec that applies to all datasets. See the import_spec and specs functions for more information about this capability.
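The need for import specs comes from automatic type guessing, which can silently change values such as identifiers with leading zeros. As an analogy only, the sketch below uses the colClasses argument of base R's read.csv() rather than import_spec(), whose arguments are not shown on this page; the CSV text is made up.

```r
# Toy CSV content; SUBJID values carry leading zeros, as in the ADEX example
csv_text <- "SUBJID,AVAL\n049,10\n050,20"

# Default type guessing reads SUBJID as integer, so "049" becomes 49
guessed <- read.csv(text = csv_text)

# Declaring the column type up front keeps SUBJID as character
typed <- read.csv(text = csv_text, colClasses = c(SUBJID = "character"))

guessed$SUBJID  # 49 50 (leading zeros lost)
typed$SUBJID    # "049" "050"
```

Columns not named in colClasses (AVAL here) are still guessed automatically, which is similar in spirit to supplying a spec only for the columns you care about.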

Value

The loaded data catalog, as class "dcat". The catalog will be a list of data dictionaries. Each data dictionary is a tibble.
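Each data dictionary summarizes one dataset column by column. As a rough sketch, assuming NAs counts missing values and MaxChar is the length of the longest value printed as text, a similar summary can be built in base R. The helper make_dictionary is hypothetical, omits the Label and Format columns, and returns a data.frame rather than the tibble the package produces.

```r
# Hypothetical helper that summarizes one data frame the way the printed
# data dictionaries on this page do (Label and Format are omitted here)
make_dictionary <- function(name, dat) {
  # Apply a summary function to every column and drop the column names
  per_col <- function(f, type) unname(vapply(dat, f, type))
  data.frame(
    Name    = name,
    Column  = names(dat),
    Class   = per_col(function(x) class(x)[1], character(1)),
    NAs     = per_col(function(x) sum(is.na(x)), integer(1)),
    # Longest non-missing value when printed as text (0 if all missing)
    MaxChar = per_col(function(x) max(c(0L, nchar(as.character(x[!is.na(x)])))), integer(1)),
    stringsAsFactors = FALSE
  )
}

# Made-up toy data for illustration
dat <- data.frame(
  SUBJID = c("049", "050", NA),
  AVAL   = c(10, NA, 1250),
  stringsAsFactors = FALSE
)
make_dictionary("TOY", dat)
```

Because the dictionary only needs summary statistics, it is much smaller than the data it describes, which is what makes browsing a catalog cheap.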

See also

The fetch function to retrieve data from the catalog, and the import_spec function to create import specifications.

Examples

# Load the package
library(fetch)

# Get data directory
pkg <- system.file("extdata", package = "fetch")

# Example 1: Catalog all rows

# Create catalog
ct <- catalog(pkg, engines$csv)

# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
  # data item 'ADAE': 56 cols 150 rows
  # data item 'ADEX': 17 cols 348 rows
  # data item 'ADPR': 37 cols 552 rows
  # data item 'ADPSGA': 42 cols 695 rows
  # data item 'ADSL': 56 cols 87 rows
  # data item 'ADVS': 37 cols 3617 rows

# View catalog item
ct$ADEX
# data item 'ADEX': 17 cols 348 rows
# - Engine: csv
# - Size: 70.7 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character  <NA>     NA   0       3
# 2  ADEX  USUBJID character  <NA>     NA   0      10
# 3  ADEX   SUBJID character  <NA>     NA   0       3
# 4  ADEX   SITEID character  <NA>     NA   0       2
# 5  ADEX     TRTP character  <NA>     NA   8       5
# 6  ADEX    TRTPN   numeric  <NA>     NA   8       1
# 7  ADEX     TRTA character  <NA>     NA   8       5
# 8  ADEX    TRTAN   numeric  <NA>     NA   8       1
# 9  ADEX   RANDFL character  <NA>     NA   0       1
# 10 ADEX    SAFFL character  <NA>     NA   0       1
# 11 ADEX   MITTFL character  <NA>     NA   0       1
# 12 ADEX  PPROTFL character  <NA>     NA   0       1
# 13 ADEX    PARAM character  <NA>     NA   0      45
# 14 ADEX  PARAMCD character  <NA>     NA   0       8
# 15 ADEX   PARAMN   numeric  <NA>     NA   0       1
# 16 ADEX     AVAL   numeric  <NA>     NA  16       4
# 17 ADEX AVALCAT1 character  <NA>     NA  87      10


# Example 2: Catalog with where expression
ct <- catalog(pkg, engines$csv, where = expression(SUBJID == '049'))

# View catalog item - Now only 4 rows
ct$ADEX
# data item 'ADEX': 17 cols 4 rows
# - Where: SUBJID == "049"
# - Engine: csv
# - Size: 4.5 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character  <NA>     NA   0       3
# 2  ADEX  USUBJID character  <NA>     NA   0      10
# 3  ADEX   SUBJID character  <NA>     NA   0       3
# 4  ADEX   SITEID character  <NA>     NA   0       2
# 5  ADEX     TRTP character  <NA>     NA   0       5
# 6  ADEX    TRTPN   numeric  <NA>     NA   0       1
# 7  ADEX     TRTA character  <NA>     NA   0       5
# 8  ADEX    TRTAN   numeric  <NA>     NA   0       1
# 9  ADEX   RANDFL character  <NA>     NA   0       1
# 10 ADEX    SAFFL character  <NA>     NA   0       1
# 11 ADEX   MITTFL character  <NA>     NA   0       1
# 12 ADEX  PPROTFL character  <NA>     NA   0       1
# 13 ADEX    PARAM character  <NA>     NA   0      45
# 14 ADEX  PARAMCD character  <NA>     NA   0       8
# 15 ADEX   PARAMN   numeric  <NA>     NA   0       1
# 16 ADEX     AVAL   numeric  <NA>     NA   0       4
# 17 ADEX AVALCAT1 character  <NA>     NA   1      10