The fetch package retrieves data from many different data sources in a memory-efficient manner. You first identify the data by defining a data catalog, and then fetch the data from that catalog. Catalogs can be defined for many popular data formats, including csv, rds, sas7bdat, and excel.
The functions contained in the fetch package are as follows:
- catalog: Creates a data catalog
- fetch: Retrieves a dataset from a data catalog
- import_spec: Defines an import spec for a specific dataset
The fetch function retrieves a dataset from a data catalog. The function accepts a catalog item as the first parameter. The catalog item is the only required parameter. The "select" parameter allows you to pull only some of the columns. The "where" and "top" parameters may be used to define a subset of the data to retrieve. The "import_specs" parameter accepts an import_spec object, which can be used to control how data is read into the data frame.
Arguments
- catalog
The catalog item to fetch data for. Catalog items are created using the catalog function.
- select
A vector of column names or column numbers to extract from the data item. Note that the column names can be easily obtained as a vector from the catalog item, and then manipulated to suit your needs.
- where
An optional expression to be used to filter the fetched data. Use the base R expression function to define the expression. The expression allows logical operators and Base R functions. Column names can be unquoted.
- top
A number of records to return from the head of the data item. Valid value is an integer.
- import_specs
The import specs to use for the fetch operation. Import specs can be used to control the data types of the fetched dataset. An import specification is created with the import_spec function. See the documentation of this function for additional details and an example.
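The "where" argument relies on the base R expression function, which captures code without running it so that column names can be written unquoted. The following is a minimal base R sketch of that mechanism. The data frame and its values are made up for illustration, and the evaluation step shows only how such an expression can be applied to a data frame in general; it is not a description of how fetch handles the expression internally.

```r
# Small stand-in data frame (hypothetical values, not package data)
df <- data.frame(SUBJID = c("049", "050", "051", "051"),
                 TRTAN  = c(4, 2, 1, 1),
                 stringsAsFactors = FALSE)

# expression() stores the code unevaluated, so SUBJID and TRTAN
# do not need to exist as variables at this point
ex <- expression(SUBJID == "051" & TRTAN < 2)

# Evaluating the stored call with the data frame as the environment
# resolves the unquoted names to columns: one logical value per row
keep <- eval(ex[[1]], df)

# Use the logical vector to keep only the matching rows
res <- df[keep, ]
```

Because the expression is ordinary R code, logical operators such as & and |, and base functions such as %in% or nchar, can be combined in the same way.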
See also
The catalog function to create a data catalog. Also see the import_spec function to create import specifications.
Author
Maintainer: David Bosak dbosak01@gmail.com
Other contributors:
Kevin Kramer kkrame02@amgen.com [contributor]
Archytas Clinical Solutions [copyright holder]
Examples
# Get data directory
pkg <- system.file("extdata", package = "fetch")
# Create catalog
ct <- catalog(pkg, engines$csv)
# View catalog
ct
# data catalog: 6 items
# - Source: C:/packages/fetch/inst/extdata
# - Engine: csv
# - Items:
# data item 'ADAE': 56 cols 150 rows
# data item 'ADEX': 17 cols 348 rows
# data item 'ADPR': 37 cols 552 rows
# data item 'ADPSGA': 42 cols 695 rows
# data item 'ADSL': 56 cols 87 rows
# data item 'ADVS': 37 cols 3617 rows
# Example 1: Fetch Entire Dataset
# Get data from the catalog
dat1 <- fetch(ct$ADEX)
# View Data
dat1
# A tibble: 348 × 17
# STUDYID USUBJID SUBJID SITEID TRTP TRTPN TRTA TRTAN RANDFL SAFFL
# <chr> <chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr> <chr>
# 1 ABC ABC-01-0… 049 01 ARM D 4 ARM D 4 Y Y
# 2 ABC ABC-01-0… 049 01 ARM D 4 ARM D 4 Y Y
# 3 ABC ABC-01-0… 049 01 ARM D 4 ARM D 4 Y Y
# 4 ABC ABC-01-0… 049 01 ARM D 4 ARM D 4 Y Y
# 5 ABC ABC-01-0… 050 01 ARM B 2 ARM B 2 Y Y
# 6 ABC ABC-01-0… 050 01 ARM B 2 ARM B 2 Y Y
# 7 ABC ABC-01-0… 050 01 ARM B 2 ARM B 2 Y Y
# 8 ABC ABC-01-0… 050 01 ARM B 2 ARM B 2 Y Y
# 9 ABC ABC-01-0… 051 01 ARM A 1 ARM A 1 Y Y
# 10 ABC ABC-01-0… 051 01 ARM A 1 ARM A 1 Y Y
# 338 more rows
# 7 more variables: MITTFL <chr>, PPROTFL <chr>, PARAM <chr>,
# PARAMCD <chr>, PARAMN <dbl>, AVAL <dbl>, AVALCAT1 <chr>
# Use `print(n = ...)` to see more rows
# Example 2: Fetch a Subset
# Get data with selected columns and where expression
dat2 <- fetch(ct$ADEX, select = c("SUBJID", "TRTA", "RANDFL", "SAFFL"),
              where = expression(SUBJID == '051'))
# View Data
dat2
# A tibble: 4 x 4
# SUBJID TRTA RANDFL SAFFL
# <chr> <chr> <chr> <chr>
# 1 051 ARM A Y Y
# 2 051 ARM A Y Y
# 3 051 ARM A Y Y
# 4 051 ARM A Y Y
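The examples above do not exercise the "top" parameter or a compound "where" expression. The sketch below adds both. It uses only the functions and arguments described on this page and the column names shown in the ADEX output. It assumes the sample data shipped with the package is available, and no output is shown because the result depends on that data and on how the package combines "where" and "top".

```r
library(fetch)

# Get data directory and create catalog, as in the examples above
pkg <- system.file("extdata", package = "fetch")
ct <- catalog(pkg, engines$csv)

# Example 3: Fetch Only the First Records
# Return five records from the head of the data item
dat3 <- fetch(ct$ADEX, top = 5)

# Example 4: Combine select, where, and top
# The where expression uses a logical operator and %in%;
# the columns it references are also included in select
dat4 <- fetch(ct$ADEX,
              select = c("SUBJID", "TRTA", "TRTAN", "SAFFL"),
              where = expression(TRTAN %in% c(1, 2) & SAFFL == "Y"),
              top = 10)
```

Limiting rows with "top" is a convenient way to inspect the structure of a large data item before fetching all of it.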