A function to create the import specifications for a particular data file.
This information can be passed to the catalog or fetch functions to correctly
assign the data types for columns on imported data. The import specifications are defined as
name/value pairs, where the name is the column name and the value is the
data type indicator. Available data type indicators are
'guess', 'logical', 'character', 'integer', 'numeric',
'date', 'datetime', and 'time'.
Multiple import specifications can also be combined into a collection
and assigned to an entire catalog. See the specs function
for an example of using a specs collection.
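As a rough illustration of the collection approach, the sketch below wraps one import spec in a specs collection and assigns it to a catalog. It assumes that specs takes import_spec objects named after the data items they apply to (ADVS here, the sample data item used in the Examples). The requireNamespace guard is only there so the snippet does nothing when the package is not installed.

```r
# Sketch only: assumes specs() accepts import_spec objects named by data item
if (requireNamespace("fetch", quietly = TRUE)) {
  library(fetch)

  # Sample data directory shipped with the package
  pkg <- system.file("extdata", package = "fetch")

  # One import spec for the treatment date columns
  date_spec <- import_spec(TRTSDT = "date=%d%b%Y",
                           TRTEDT = "date=%d%b%Y")

  # Collection keyed by data item name; further items could be added
  # as additional named arguments
  spc_all <- specs(ADVS = date_spec)

  # Assign the whole collection to the catalog
  ct <- catalog(pkg, engines$csv, import_specs = spc_all)
}
```

Keeping the individual spec in its own variable makes it easy to reuse the same column types for several data items in the collection.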
Arguments
- ...
Named pairs of column names and column data types, separated by commas. Available types are: 'guess', 'logical', 'character', 'integer', 'numeric', 'date', 'datetime', and 'time'. The date/time data types accept an optional input format. To supply the input format, append it after the data type following an equals sign, e.g.: 'date=%d%b%Y' or 'datetime=%d-%m-%Y %H:%M:%S'. Default is NULL, meaning no column types are specified, and the function should make its best guess for each column.
- na
A vector of values to be treated as NA. For example, the vector c('', ' ') will cause empty strings and single blanks to be converted to NA values. Default is NULL, meaning the value of the na parameter will be taken from the specs function. Any value supplied on the import_spec function will override the value from the specs function.
- trim_ws
Whether or not to trim white space from the input data values. The default is NULL, meaning the value of the trim_ws parameter will be taken from the specs function. Any value supplied on the import_spec function will override the value from the specs function.
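To make the na and trim_ws settings concrete, here is a minimal base R sketch of the kind of cleanup they describe. It is not the package's implementation: the order of the two steps is an assumption, and resolve_setting is a hypothetical helper that only illustrates the rule that a NULL value falls back to the collection-level setting.

```r
# Raw character values as they might appear in a delimited file
raw <- c(" A ", "", " ", "B")

# Values to treat as NA, matching the c('', ' ') example above
na_values <- c("", " ")

# Step 1 (assumed order): replace exact matches with NA
cleaned <- raw
cleaned[cleaned %in% na_values] <- NA

# Step 2: trim leading and trailing white space from what remains
cleaned <- trimws(cleaned)

# Hypothetical helper for the NULL-default rule: a value given on the
# individual spec wins, otherwise the collection-level value is used
resolve_setting <- function(spec_value, collection_value) {
  if (is.null(spec_value)) collection_value else spec_value
}

resolve_setting(NULL, TRUE)    # falls back to the collection value
resolve_setting(FALSE, TRUE)   # the spec-level value overrides
```

In this sketch the empty string and the single blank both become NA, while " A " is kept and trimmed.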
Date/Time Format Codes
Below are some common date formatting codes. For a complete list,
see the documentation for the strptime function:
%d = day as a number
%a = abbreviated weekday
%A = unabbreviated weekday
%m = month number
%b = abbreviated month name
%B = unabbreviated month name
%y = 2-digit year
%Y = 4-digit year
%H = hour (24-hour clock)
%M = minute
%S = second
%p = AM/PM indicator (used with the 12-hour %I, not with %H)
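The codes above can be tried out directly in base R before putting them in an import spec. The following sketch parses the two input formats used as examples in the Arguments section. The sample values are made up for illustration, and month names are matched in the current locale, so an English locale is assumed.

```r
# Parse a date written as day, abbreviated month name, 4-digit year
# (month names are matched in the current locale; English is assumed here)
start_date <- as.Date("15Jan2020", format = "%d%b%Y")

# Parse a datetime written as day-month-year hour:minute:second
stamp <- as.POSIXct("18-09-2020 14:30:22",
                    format = "%d-%m-%Y %H:%M:%S", tz = "UTC")

# The same codes control output formatting
format(start_date, "%Y-%m-%d")
format(stamp, "%H:%M")
```

If a format string does not match the input, as.Date and as.POSIXct return NA, which is a quick way to check a format before using it on a whole column.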
See also
fetch to retrieve data, and specs for creating a collection of import specs.
Other specs: print.specs(), read.specs(), specs(), write.specs()
Examples
# Load the package
library(fetch)
# Get sample data directory
pkg <- system.file("extdata", package = "fetch")
# Create import spec
spc <- import_spec(TRTSDT = "date=%d%b%Y",
                   TRTEDT = "date=%d%b%Y")
# Create catalog without filter
ct <- catalog(pkg, engines$csv, import_specs = spc)
# Get dictionary for ADVS with Import Spec
d <- ct$ADVS
# Observe data types for TRTSDT and TRTEDT are now Dates
d[d$Column %in% c("TRTSDT", "TRTEDT"), ]
# data item 'ADVS': 37 cols 3617 rows
#- Engine: csv
#- Size: 1.1 Mb
#- Last Modified: 2020-09-18 14:30:22
#   Name Column Class Label Format NAs MaxChar
#16 ADVS TRTSDT  Date  <NA>     NA  54      10
#17 ADVS TRTEDT  Date  <NA>     NA 119      10