
A function to create a collection of import specifications for a data source. Pass the collection to the catalog function so that data types are assigned correctly, and separately, for each imported data file. The spec collection is a set of import_spec objects supplied as name/value pairs. The name is the name of the input dataset, without the file extension. The value is the import_spec object to use for that dataset. In this way, you can define a different spec for each dataset in your catalog.

The import engines will guess at the data types for any columns that are not explicitly defined in the import specifications. The import spec syntax is the same for all data engines.

Note that the na and trim_ws parameters on the specs function are applied globally to all files in the catalog. These global settings can be overridden on the import_spec for any particular data file.
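As a minimal sketch of the global-versus-local behavior described above: the call below sets na and trim_ws once on specs, then overrides both for a single dataset through its import_spec. The dataset and column names are borrowed from the Examples section, and the particular override values are illustrative assumptions, not recommended settings.

```r
library(fetch)

# Global settings on specs(): apply to every file that has no override
spc <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y",
                                TRTEDT = "date=%d%b%Y"),
             # ADVS overrides both global settings for this file only
             # (illustrative values, chosen for this sketch)
             ADVS = import_spec(TRTSDT = "character",
                                TRTEDT = "character",
                                na = c("", "NA", "."),
                                trim_ws = FALSE),
             na = c("", "NA"),
             trim_ws = TRUE)
```

Here ADAE would be imported with the global na and trim_ws values, while ADVS would use the values given on its own import_spec.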

Also note that the specs collection is defined as an object so it can be stored and reused. See the write.specs and read.specs functions for additional information on saving and restoring specs.

Usage

specs(..., na = c("", "NA"), trim_ws = TRUE)

Arguments

...

Named input specs. The name should correspond to the file name, without the file extension. The spec is defined as an import_spec object. See the import_spec function for additional information on parameters for that object.

na

A vector of values to be treated as NA. For example, the vector c("", " ") will cause empty strings and single blanks to be converted to NA values. For most file types, empty strings and the string "NA", that is c("", "NA"), are considered NA. For SAS® datasets and transport files, a single blank and a single dot, that is c(" ", "."), are considered NA. The value of the na parameter on the specs function can be overridden by the na parameter on the import_spec function.

trim_ws

Whether or not to trim white space from the input data values. Valid values are TRUE and FALSE. Default is TRUE. The value of the trim_ws parameter on the specs function can be overridden by the trim_ws parameter on the import_spec function.

Value

The import spec collection. The class of the object is "specs".
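A short sketch of checking the returned object, assuming the fetch package is installed. The single-column spec is only a placeholder for illustration.

```r
library(fetch)

# Build a small collection with one named import_spec
spc <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y"))

# Check the class documented above
is_specs <- inherits(spc, "specs")
is_specs
```

A check like this can be useful before passing a stored or restored object to the import_specs parameter of catalog.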

See also

catalog to create a data catalog, fetch for retrieving data, and import_spec for additional information on defining an import spec.

Other specs: import_spec(), print.specs(), read.specs(), write.specs()

Examples

# Get sample data directory
pkg <- system.file("extdata", package = "fetch")

# Create import spec
spc <- specs(ADAE = import_spec(TRTSDT = "date=%d%b%Y",
                                TRTEDT = "date=%d%b%Y"),
             ADVS = import_spec(TRTSDT = "character",
                                TRTEDT = "character"))

# Create catalog with specs collection
ct <- catalog(pkg, engines$csv, import_specs = spc)

# Get dictionary for ADAE with Import Spec
d1 <- ct$ADAE

# Observe data types for TRTSDT and TRTEDT are Dates
d1[d1$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADAE': 56 cols 150 rows
#- Engine: csv
#- Size: 155 Kb
#- Last Modified: 2020-09-18 14:30:22
#   Name Column Class Label Format NAs MaxChar
#13 ADAE TRTSDT  Date  <NA>     NA   1      10
#14 ADAE TRTEDT  Date  <NA>     NA   4      10

# Get dictionary for ADVS with Import Spec
d2 <- ct$ADVS

# Observe data types for TRTSDT and TRTEDT are character
d2[d2$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADVS': 37 cols 3617 rows
#- Engine: csv
#- Size: 1.1 Mb
#- Last Modified: 2020-09-18 14:30:22
#   Name Column     Class Label Format NAs MaxChar
#16 ADVS TRTSDT character  <NA>     NA  54       9
#17 ADVS TRTEDT character  <NA>     NA 119       9