Skip to contents

In 2020 I wrote a package called libr to deal with the problem of managing multiple related datasets. This package allows you to manage data from several popular data sources: SAS, Excel, CSV, and R. While the package received generally positive feedback, users requested some features that the package did not support:

  • Users wanted to examine the data before loading it into memory.
  • Users wanted to pick only some datasets to load into memory.
  • Users wanted to subset all datasets in the library based on a condition.

After consideration of these requests, I determined that these features would be most easily satisfied by creating a new package. That package is fetch.

Key Features

The fetch package allows you to retrieve data from any of several different data sources while minimizing memory usage. There are only two steps to retrieve data:

  • Create a data catalog
  • Fetch data from the catalog

The above two steps are accomplished by the following functions:

  • catalog(): Defines a data catalog for a particular data source
  • fetch(): Pulls a data item from the catalog and loads it into memory

The package has the following key features:

  • The catalog() function allows you to explore the data before reading from the source.
  • The fetch() function loads only the data you need for your analysis.
  • The “where” parameter on the catalog() function allows you to subset across all datasets in the catalog.
  • The “select” and “top” parameters on the fetch() function let you further limit the data returned.
  • The “import_spec” parameters on the catalog() and fetch() functions give you control over the column data types.

How to use fetch

Let’s start with a simple example. In this example we will create a data catalog, examine its contents, and fetch a dataset from the catalog. First, create and view the data catalog:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog
ct <- catalog(pkg, engines$csv)

# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
    # data item 'ADAE': 56 cols 150 rows
    # data item 'ADEX': 17 cols 348 rows
    # data item 'ADPR': 37 cols 552 rows
    # data item 'ADPSGA': 42 cols 695 rows
    # data item 'ADSL': 56 cols 87 rows
    # data item 'ADVS': 37 cols 3617 rows

As can be seen above, this catalog has 6 csv files in it. The number of columns and rows are displayed for each file. Information about each file is contained in the catalog. This information can be accessed using list notion, like so:

# View info for catalog item 
ct$ADEX
# data item 'ADEX': 17 cols 348 rows
# - Engine: csv
# - Size: 70.7 Kb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format NAs MaxChar
# 1  ADEX  STUDYID character  <NA>     NA   0       3
# 2  ADEX  USUBJID character  <NA>     NA   0      10
# 3  ADEX   SUBJID character  <NA>     NA   0       3
# 4  ADEX   SITEID character  <NA>     NA   0       2
# 5  ADEX     TRTP character  <NA>     NA   8       5
# 6  ADEX    TRTPN   numeric  <NA>     NA   8       1
# 7  ADEX     TRTA character  <NA>     NA   8       5
# 8  ADEX    TRTAN   numeric  <NA>     NA   8       1
# 9  ADEX   RANDFL character  <NA>     NA   0       1
# 10 ADEX    SAFFL character  <NA>     NA   0       1
# 11 ADEX   MITTFL character  <NA>     NA   0       1
# 12 ADEX  PPROTFL character  <NA>     NA   0       1
# 13 ADEX    PARAM character  <NA>     NA   0      45
# 14 ADEX  PARAMCD character  <NA>     NA   0       8
# 15 ADEX   PARAMN   numeric  <NA>     NA   0       1
# 16 ADEX     AVAL   numeric  <NA>     NA  16       4
# 17 ADEX AVALCAT1 character  <NA>     NA  87      10

When the catalog item is printed, it shows a data dictionary for the specified dataset. Column names and data types are displayed, along with some other useful attributes of your data.

Once the catalog is created, you are ready to fetch data from the catalog:


# Fetch data from a catalog
dt <- fetch(ct$ADEX)

# View data
dt
# A tibble: 348 × 17
#    STUDYID USUBJID    SUBJID SITEID TRTP  TRTPN TRTA  TRTAN RANDFL SAFFL MITTFL PPROTFL
#    <chr>   <chr>      <chr>  <chr>  <chr> <dbl> <chr> <dbl> <chr>  <chr> <chr>  <chr>  
#  1 ABC     ABC-01-049 049    01     ARM D     4 ARM D     4 Y      Y     Y      Y      
#  2 ABC     ABC-01-049 049    01     ARM D     4 ARM D     4 Y      Y     Y      Y      
#  3 ABC     ABC-01-049 049    01     ARM D     4 ARM D     4 Y      Y     Y      Y      
#  4 ABC     ABC-01-049 049    01     ARM D     4 ARM D     4 Y      Y     Y      Y      
#  5 ABC     ABC-01-050 050    01     ARM B     2 ARM B     2 Y      Y     Y      Y      
#  6 ABC     ABC-01-050 050    01     ARM B     2 ARM B     2 Y      Y     Y      Y      
#  7 ABC     ABC-01-050 050    01     ARM B     2 ARM B     2 Y      Y     Y      Y      
#  8 ABC     ABC-01-050 050    01     ARM B     2 ARM B     2 Y      Y     Y      Y      
#  9 ABC     ABC-01-051 051    01     ARM A     1 ARM A     1 Y      Y     Y      Y      
# 10 ABC     ABC-01-051 051    01     ARM A     1 ARM A     1 Y      Y     Y      Y      
#  338 more rows
#  5 more variables: PARAM <chr>, PARAMCD <chr>, PARAMN <dbl>, AVAL <dbl>,
#   AVALCAT1 <chr>
#  Use `print(n = ...)` to see more rows

At this point, you can proceed with your analysis, or fetch additional datasets from the catalog as needed. So easy!

Additional Features

The above example covers the very basics. Now let’s look at some more useful features of the fetch package.

Apply a Where Expression to a Fetch Operation

What if you are performing an analysis on a subset of data? Is there a way to apply a subset when the data is fetched? You can apply subset criteria using the “where” parameter on the fetch() function. Like this:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog
ct <- catalog(pkg, engines$csv)

# Subset data for a specific subject
dt <- fetch(ct$ADVS, where = expression(SUBJID == '049'))

# View data subset
dt
# A tibble: 44 × 37
#    STUDYID USUBJID    SUBJID SITEID SRCDOM SRCVAR   SRCSEQ TRTP  TRTPN TRTA  TRTAN RANDFL
#    <chr>   <chr>      <chr>  <chr>  <chr>  <chr>     <dbl> <chr> <dbl> <chr> <dbl> <chr> 
#  1 ABC     ABC-01-049 049    01     VS     VSSTRESN      9 ARM D     4 ARM D     4 Y     
#  2 ABC     ABC-01-049 049    01     VS     VSSTRESN     41 ARM D     4 ARM D     4 Y     
#  3 ABC     ABC-01-049 049    01     NA     NA           NA ARM D     4 ARM D     4 Y     
#  4 ABC     ABC-01-049 049    01     VS     VSSTRESN     34 ARM D     4 ARM D     4 Y     
#  5 ABC     ABC-01-049 049    01     VS     VSSTRESN     35 ARM D     4 ARM D     4 Y     
#  6 ABC     ABC-01-049 049    01     VS     VSSTRESN     36 ARM D     4 ARM D     4 Y     
#  7 ABC     ABC-01-049 049    01     VS     VSSTRESN     37 ARM D     4 ARM D     4 Y     
#  8 ABC     ABC-01-049 049    01     VS     VSSTRESN     38 ARM D     4 ARM D     4 Y     
#  9 ABC     ABC-01-049 049    01     VS     VSSTRESN     39 ARM D     4 ARM D     4 Y     
# 10 ABC     ABC-01-049 049    01     VS     VSSTRESN     40 ARM D     4 ARM D     4 Y     
#  34 more rows
#  25 more variables: SAFFL <chr>, MITTFL <chr>, PPROTFL <chr>, TRTSDT <chr>,
#   TRTEDT <chr>, ADT <chr>, ADY <dbl>, ADTF <lgl>, AVISIT <chr>, AVISITN <dbl>,
#   PARAM <chr>, PARAMCD <chr>, PARAMN <dbl>, PARAMTYP <chr>, AVAL <dbl>, BASE <dbl>,
#   CHG <dbl>, AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>,
#   AWU <chr>, ABLFL <chr>, ANL01FL <chr>
#  Use `print(n = ...)` to see more rows

The expression function can be used to define any R expression. This function accepts unquoted variable names, logical operators, and R functions like is.null() and is.na(). You can use the the “where” parameter with an expression to narrow down the data you are working with and reduce the memory footprint of your program.

Get Top N Rows

Another way to limit the data returned by your program is to use the “top” parameter. This parameter allows you to return only the “top N” number of rows in the target data item. The “top” parameter is useful for exploratory purposes. Here is an example:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog
ct <- catalog(pkg, engines$csv)

# Subset data for a specific subject
dt <- fetch(ct$ADVS, top = 5, where = expression(SUBJID == '049'))

# View results
dt
# A tibble: 5 × 37
#   STUDYID USUBJID  SUBJID SITEID SRCDOM SRCVAR SRCSEQ TRTP  TRTPN TRTA  TRTAN RANDFL SAFFL
#   <chr>   <chr>    <chr>  <chr>  <chr>  <chr>   <dbl> <chr> <dbl> <chr> <dbl> <chr>  <chr>
# 1 ABC     ABC-01-… 049    01     VS     VSSTR…      9 ARM D     4 ARM D     4 Y      Y    
# 2 ABC     ABC-01-… 049    01     VS     VSSTR…     41 ARM D     4 ARM D     4 Y      Y    
# 3 ABC     ABC-01-… 049    01     NA     NA         NA ARM D     4 ARM D     4 Y      Y    
# 4 ABC     ABC-01-… 049    01     VS     VSSTR…     34 ARM D     4 ARM D     4 Y      Y    
# 5 ABC     ABC-01-… 049    01     VS     VSSTR…     35 ARM D     4 ARM D     4 Y      Y    
#   24 more variables: MITTFL <chr>, PPROTFL <chr>, TRTSDT <chr>, TRTEDT <chr>,
#   ADT <chr>, ADY <dbl>, ADTF <lgl>, AVISIT <chr>, AVISITN <dbl>, PARAM <chr>,
#   PARAMCD <chr>, PARAMN <dbl>, PARAMTYP <chr>, AVAL <dbl>, BASE <dbl>, CHG <dbl>,
#   AWRANGE <chr>, AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>, AWU <chr>,
#   ABLFL <chr>, ANL01FL <chr>

The above fetch returned the top 5 rows of the dataset for subject ‘049’. Note that the “top” parameter can be used with or without the “where” parameter.

Apply a Pattern to a Catalog

The previous examples loaded all datasets from the data source into the data catalog. There may be instances, however, when you have a very large number of datasets, and want to limit the data items loaded into the catalog. For this situation, you can use the “pattern” parameter on the catalog() function, like so:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog, applying pattern to dataset names
ct <- catalog(pkg, engines$csv, pattern = "*S*")

# View catalog
ct
# data catalog: 3 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Pattern: *S*
# - Items:
# data item 'ADPSGA': 42 cols 695 rows
# data item 'ADSL': 56 cols 87 rows
# data item 'ADVS': 37 cols 3617 rows

The above example limits the items in the catalog to those with an “S” in the name. While this is not a realistic example, it nevertheless demonstrates how the “pattern” parameter can be used to reduce the datasets loaded into the catalog.

Apply a Where Expression to an Entire Catalog

Sometimes you want to subset all the datasets in your catalog. This subset can be accomplished with the “where” parameter on the catalog() function. Observe:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog without where expression
ct1 <- catalog(pkg, engines$csv)

# View catalog
ct1
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
# data item 'ADAE': 56 cols 150 rows
# data item 'ADEX': 17 cols 348 rows
# data item 'ADPR': 37 cols 552 rows
# data item 'ADPSGA': 42 cols 695 rows
# data item 'ADSL': 56 cols 87 rows
# data item 'ADVS': 37 cols 3617 rows

# Create catalog with where expression
ct2 <- catalog(pkg, engines$csv, where = expression(SUBJID == '049'))

# View catalog
ct2
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Where: SUBJID == "049"
# - Items:
# # data item 'ADAE': 56 cols 5 rows
# - Where: SUBJID == "049"
# # data item 'ADEX': 17 cols 4 rows
# - Where: SUBJID == "049"
# # data item 'ADPR': 37 cols 7 rows
# - Where: SUBJID == "049"
# # data item 'ADPSGA': 42 cols 10 rows
# - Where: SUBJID == "049"
# # data item 'ADSL': 56 cols 1 rows
# - Where: SUBJID == "049"
# # data item 'ADVS': 37 cols 44 rows
# - Where: SUBJID == "049"

Notice that a where expression has been added to each data item in the catalog, and the row counts for each item have been reduced. Now let’s fetch some data from this catalog:

# Subset data for a specific subject
dt1 <- fetch(ct2$ADVS)

# View results of ADVS fetch
dt1
# A tibble: 44 × 37
#    STUDYID USUBJID    SUBJID SITEID SRCDOM SRCVAR   SRCSEQ TRTP  TRTPN TRTA  TRTAN RANDFL SAFFL
#    <chr>   <chr>      <chr>  <chr>  <chr>  <chr>     <dbl> <chr> <dbl> <chr> <dbl> <chr>  <chr>
#  1 ABC     ABC-01-049 049    01     VS     VSSTRESN      9 ARM D     4 ARM D     4 Y      Y    
#  2 ABC     ABC-01-049 049    01     VS     VSSTRESN     41 ARM D     4 ARM D     4 Y      Y    
#  3 ABC     ABC-01-049 049    01     NA     NA           NA ARM D     4 ARM D     4 Y      Y    
#  4 ABC     ABC-01-049 049    01     VS     VSSTRESN     34 ARM D     4 ARM D     4 Y      Y    
#  5 ABC     ABC-01-049 049    01     VS     VSSTRESN     35 ARM D     4 ARM D     4 Y      Y    
#  6 ABC     ABC-01-049 049    01     VS     VSSTRESN     36 ARM D     4 ARM D     4 Y      Y    
#  7 ABC     ABC-01-049 049    01     VS     VSSTRESN     37 ARM D     4 ARM D     4 Y      Y    
#  8 ABC     ABC-01-049 049    01     VS     VSSTRESN     38 ARM D     4 ARM D     4 Y      Y    
#  9 ABC     ABC-01-049 049    01     VS     VSSTRESN     39 ARM D     4 ARM D     4 Y      Y    
# 10 ABC     ABC-01-049 049    01     VS     VSSTRESN     40 ARM D     4 ARM D     4 Y      Y    
#  34 more rows
#  24 more variables: MITTFL <chr>, PPROTFL <chr>, TRTSDT <chr>, TRTEDT <chr>, ADT <chr>,
#   ADY <dbl>, ADTF <lgl>, AVISIT <chr>, AVISITN <dbl>, PARAM <chr>, PARAMCD <chr>,
#   PARAMN <dbl>, PARAMTYP <chr>, AVAL <dbl>, BASE <dbl>, CHG <dbl>, AWRANGE <chr>,
#   AWTARGET <dbl>, AWTDIFF <dbl>, AWLO <dbl>, AWHI <dbl>, AWU <chr>, ABLFL <chr>,
#   ANL01FL <chr>
#  Use `print(n = ...)` to see more rows


# View results of ADSL fetch
dt2 <- fetch(ct2$ADSL)
# A tibble: 1 × 56
#   STUDYID USUBJID    SUBJID SITEID   AGE AGEU  AGEGR1   SEX   RACE  RACEN ETHNIC ETHNICN COUNTRY
#   <chr>   <chr>      <chr>  <chr>  <dbl> <chr> <chr>    <chr> <chr> <dbl> <chr>    <dbl> <lgl>  
# 1 ABC     ABC-01-049 049    01        39 YEARS 30-39 y… M     WHITE     5 NOT H…       2 NA     
#   43 more variables: ARM <chr>, ACTARM <lgl>, TRT01P <chr>, TRT01PN <dbl>, TRT01A <chr>,
#   TRT01AN <dbl>, TRTSDT <chr>, TRTEDT <chr>, TRTDURN <dbl>, TRTDURU <chr>, TR01SDT <chr>,
#   TR01EDT <chr>, INCNFL <chr>, RANDFL <chr>, RANDEXC1 <lgl>, RANDEXC2 <chr>, RANDEXC3 <chr>,
#   RANDEXC4 <chr>, SAFFL <chr>, SAFEXC1 <chr>, SAFEXC2 <chr>, MITTFL <chr>, MITTEXC1 <chr>,
#   MITTEXC2 <chr>, PPROTFL <chr>, PPROTEX1 <chr>, PPROTEX2 <chr>, PPROTEX3 <chr>,
#   PPROTEX4 <chr>, COMPLFL <chr>, STDYDISP <chr>, STDYREAS <chr>, INCNDT <chr>, RANDDT <chr>,
#   DTHDT <lgl>, DTHFL <chr>, MISSDOSE <dbl>, TP1TRTR <chr>, TP2TRTR <chr>, TP3TRTR <chr>,...

As you can see, both the ADVS and ADSL datasets have been subset for subject ‘049’. This feature of the fetch package makes it easy to target just the rows you are looking for.

Add Import Specifications to a Catalog

Some data files contain accurate data type information within them, and others do not. For example, CSV files do not contain data type information, but RDATA files do. If the data type information is not available, the data engine will try to guess the data types. However, it does not always guess accurately. Sometimes you need to help the data engine determine what the data type of a column is supposed to be. Import specs allow you to to specify the correct data type for a column.

Examine the dictionary for the ADVS dataset:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create catalog without import spec
ct <- catalog(pkg, engines$csv)

# View dictionary for ADVS
ct$ADVS
# data item 'ADVS': 37 cols 3617 rows
# - Engine: csv
# - Size: 1.1 Mb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format  NAs MaxChar
# 1  ADVS  STUDYID character  <NA>     NA    0       3
# 2  ADVS  USUBJID character  <NA>     NA    0      10
# 3  ADVS   SUBJID character  <NA>     NA    0       3
# 4  ADVS   SITEID character  <NA>     NA    0       2
# 5  ADVS   SRCDOM character  <NA>     NA   85       2
# 6  ADVS   SRCVAR character  <NA>     NA   85       8
# 7  ADVS   SRCSEQ   numeric  <NA>     NA   85       2
# 8  ADVS     TRTP character  <NA>     NA   85       5
# 9  ADVS    TRTPN   numeric  <NA>     NA   85       1
# 10 ADVS     TRTA character  <NA>     NA   85       5
# 11 ADVS    TRTAN   numeric  <NA>     NA   85       1
# 12 ADVS   RANDFL character  <NA>     NA    0       1
# 13 ADVS    SAFFL character  <NA>     NA    0       1
# 14 ADVS   MITTFL character  <NA>     NA    0       1
# 15 ADVS  PPROTFL character  <NA>     NA    0       1
# 16 ADVS   TRTSDT character  <NA>     NA   54       9 # Character by default
# 17 ADVS   TRTEDT character  <NA>     NA  119       9 # Character by default
# 18 ADVS      ADT character  <NA>     NA    0       9
# 19 ADVS      ADY   numeric  <NA>     NA   54       4
# 20 ADVS     ADTF   logical  <NA>     NA 3617       0
# 21 ADVS   AVISIT character  <NA>     NA   50      14
# 22 ADVS  AVISITN   numeric  <NA>     NA   50       2
# 23 ADVS    PARAM character  <NA>     NA    0      35
# 24 ADVS  PARAMCD character  <NA>     NA    0       6
# 25 ADVS   PARAMN   numeric  <NA>     NA    0       1
# 26 ADVS PARAMTYP character  <NA>     NA 3532       7
# 27 ADVS     AVAL   numeric  <NA>     NA    0       5
# 28 ADVS     BASE   numeric  <NA>     NA   70       5
# 29 ADVS      CHG   numeric  <NA>     NA 1312       4
# 30 ADVS  AWRANGE character  <NA>     NA 1331      25
# 31 ADVS AWTARGET   numeric  <NA>     NA 1331       3
# 32 ADVS  AWTDIFF   numeric  <NA>     NA 1331       2
# 33 ADVS     AWLO   numeric  <NA>     NA 1331       3
# 34 ADVS     AWHI   numeric  <NA>     NA 1331       3
# 35 ADVS      AWU character  <NA>     NA 1331       4
# 36 ADVS    ABLFL character  <NA>     NA 2869       1
# 37 ADVS  ANL01FL character  <NA>     NA  448       1

Note that the data types of the “TRTSDT” and “TRTEDT” are listed as character. In fact, these columns are supposed to be a date. You can force the date data type assignment with an import spec, as follows:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create import spec
spc <- import_spec(TRTSDT = "date=%d%b%Y",
                   TRTEDT = "date=%d%b%Y")

# Create catalog with import spec
ct <- catalog(pkg, engines$csv, import_specs = spc)

# View dictionary for ADVS with Import Spec
ct$ADVS
# data item 'ADVS': 37 cols 3617 rows
# - Engine: csv
# - Size: 1.1 Mb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format  NAs MaxChar
# 1  ADVS  STUDYID character  <NA>     NA    0       3
# 2  ADVS  USUBJID character  <NA>     NA    0      10
# 3  ADVS   SUBJID character  <NA>     NA    0       3
# 4  ADVS   SITEID character  <NA>     NA    0       2
# 5  ADVS   SRCDOM character  <NA>     NA   85       2
# 6  ADVS   SRCVAR character  <NA>     NA   85       8
# 7  ADVS   SRCSEQ   numeric  <NA>     NA   85       2
# 8  ADVS     TRTP character  <NA>     NA   85       5
# 9  ADVS    TRTPN   numeric  <NA>     NA   85       1
# 10 ADVS     TRTA character  <NA>     NA   85       5
# 11 ADVS    TRTAN   numeric  <NA>     NA   85       1
# 12 ADVS   RANDFL character  <NA>     NA    0       1
# 13 ADVS    SAFFL character  <NA>     NA    0       1
# 14 ADVS   MITTFL character  <NA>     NA    0       1
# 15 ADVS  PPROTFL character  <NA>     NA    0       1
# 16 ADVS   TRTSDT      Date  <NA>     NA   54      10 # Converted to Date 
# 17 ADVS   TRTEDT      Date  <NA>     NA  119      10 # Converted to Date
# 18 ADVS      ADT character  <NA>     NA    0       9
# 19 ADVS      ADY   numeric  <NA>     NA   54       4
# 20 ADVS     ADTF   logical  <NA>     NA 3617       0
# 21 ADVS   AVISIT character  <NA>     NA   50      14
# 22 ADVS  AVISITN   numeric  <NA>     NA   50       2
# 23 ADVS    PARAM character  <NA>     NA    0      35
# 24 ADVS  PARAMCD character  <NA>     NA    0       6
# 25 ADVS   PARAMN   numeric  <NA>     NA    0       1
# 26 ADVS PARAMTYP character  <NA>     NA 3532       7
# 27 ADVS     AVAL   numeric  <NA>     NA    0       5
# 28 ADVS     BASE   numeric  <NA>     NA   70       5
# 29 ADVS      CHG   numeric  <NA>     NA 1312       4
# 30 ADVS  AWRANGE character  <NA>     NA 1331      25
# 31 ADVS AWTARGET   numeric  <NA>     NA 1331       3
# 32 ADVS  AWTDIFF   numeric  <NA>     NA 1331       2
# 33 ADVS     AWLO   numeric  <NA>     NA 1331       3
# 34 ADVS     AWHI   numeric  <NA>     NA 1331       3
# 35 ADVS      AWU character  <NA>     NA 1331       4
# 36 ADVS    ABLFL character  <NA>     NA 2869       1
# 37 ADVS  ANL01FL character  <NA>     NA  448       1

Observe that the columns “TRTSDT” and “TRTEDT” are now converted to Date columns. This import spec will apply to all “TRTSDT” and “TRTEDT” columns in all datasets. The format “%d%b%Y” was written specifically to read in the data value this files uses. See the import_spec() documentation for more information on these data formats, and for other possible data type specifications. Also see the specs() documentation if you want to assign a unique import specifications to each data item in the catalog.

Add Import Specifications to a Fetch

You may also add an import specification to the fetch operation. The import specification is defined with the import_spec() function in the same way as the catalog operation:

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
 
# Create import spec
spc <- import_spec(TRTSDT = "date=%d%b%Y",
                   TRTEDT = "date=%d%b%Y")

# Create catalog with import spec
ct <- catalog(pkg, engines$csv, import_specs = spc)

# View dictionary for ADVS with Import Spec
ct$ADVS
# data item 'ADVS': 37 cols 3617 rows
# - Engine: csv
# - Size: 1.1 Mb
# - Last Modified: 2020-09-18 14:30:22
#    Name   Column     Class Label Format  NAs MaxChar
# 1  ADVS  STUDYID character  <NA>     NA    0       3
# 2  ADVS  USUBJID character  <NA>     NA    0      10
# 3  ADVS   SUBJID character  <NA>     NA    0       3
# 4  ADVS   SITEID character  <NA>     NA    0       2
# 5  ADVS   SRCDOM character  <NA>     NA   85       2
# 6  ADVS   SRCVAR character  <NA>     NA   85       8
# 7  ADVS   SRCSEQ   numeric  <NA>     NA   85       2
# 8  ADVS     TRTP character  <NA>     NA   85       5
# 9  ADVS    TRTPN   numeric  <NA>     NA   85       1
# 10 ADVS     TRTA character  <NA>     NA   85       5
# 11 ADVS    TRTAN   numeric  <NA>     NA   85       1
# 12 ADVS   RANDFL character  <NA>     NA    0       1
# 13 ADVS    SAFFL character  <NA>     NA    0       1
# 14 ADVS   MITTFL character  <NA>     NA    0       1
# 15 ADVS  PPROTFL character  <NA>     NA    0       1
# 16 ADVS   TRTSDT      Date  <NA>     NA   54      10
# 17 ADVS   TRTEDT      Date  <NA>     NA  119      10
# 18 ADVS      ADT character  <NA>     NA    0       9
# 19 ADVS      ADY   numeric  <NA>     NA   54       4
# 20 ADVS     ADTF   logical  <NA>     NA 3617       0
# 21 ADVS   AVISIT character  <NA>     NA   50      14
# 22 ADVS  AVISITN   numeric  <NA>     NA   50       2
# 23 ADVS    PARAM character  <NA>     NA    0      35
# 24 ADVS  PARAMCD character  <NA>     NA    0       6
# 25 ADVS   PARAMN   numeric  <NA>     NA    0       1
# 26 ADVS PARAMTYP character  <NA>     NA 3532       7
# 27 ADVS     AVAL   numeric  <NA>     NA    0       5
# 28 ADVS     BASE   numeric  <NA>     NA   70       5
# 29 ADVS      CHG   numeric  <NA>     NA 1312       4
# 30 ADVS  AWRANGE character  <NA>     NA 1331      25
# 31 ADVS AWTARGET   numeric  <NA>     NA 1331       3
# 32 ADVS  AWTDIFF   numeric  <NA>     NA 1331       2
# 33 ADVS     AWLO   numeric  <NA>     NA 1331       3
# 34 ADVS     AWHI   numeric  <NA>     NA 1331       3
# 35 ADVS      AWU character  <NA>     NA 1331       4
# 36 ADVS    ABLFL character  <NA>     NA 2869       1
# 37 ADVS  ANL01FL character  <NA>     NA  448       1

The results of this operation are similar to the previous example, when we applied the import spec to the catalog. The difference is that now the import spec applies only to this specific fetch operation, not the entire catalog.

Next Steps

For next steps, it is suggested to play around with the fetch functions, and note how easy these functions are to use. You may also wish to review the documentation for the catalog(), fetch(), import_spec() functions. The documentation contains some additional examples and descriptions of all parameters.