Configuration Files#

In addition to its Python API, SaQC also offers a text-based configuration system. This option has proven valuable in largely automated quality control workflows. The ability to change existing setups without touching the source code of, e.g., Extract-Transform-Load (ETL) pipelines simplifies the operation and adaptation of such workflows as well as the collaboration with less technical project partners.

Format#

Configuration files are expected to be semicolon-separated text files with exactly one header line. Each row of the configuration file lists one variable and a respective test function that is applied to the given variable.

Header names#

The first line of every configuration file is dropped, so feel free to use header names to your liking.

Test function notation#

The notation of test functions follows the function call notation of Python and many other programming languages and looks like this:

flagRange(min=0, max=100)

Here the function flagRange is called and the values 0 and 100 are passed to the parameters min and max, respectively. As we value readability of the configuration more than conciseness of the extension language, only keyword arguments are supported. That means that the notation flagRange(0, 100) is not a valid replacement for the above example.
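The keyword-only rule can be illustrated with a small sketch built on Python's standard `ast` module. This is not SaQC's own parser; the helper name `parse_test_call` is hypothetical, and the sketch assumes that all argument values are plain literals (numbers, strings), so symbolic values are out of scope here.

```python
import ast


def parse_test_call(expr: str) -> tuple[str, dict]:
    """Split a call like 'flagRange(min=0, max=100)' into (name, kwargs)."""
    node = ast.parse(expr.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a simple function call: {expr!r}")
    if node.args:
        # Positional arguments such as flagRange(0, 100) are rejected.
        raise ValueError(f"only keyword arguments are supported: {expr!r}")
    kwargs = {}
    for kw in node.keywords:
        if kw.arg is None:
            raise ValueError(f"argument unpacking is not supported: {expr!r}")
        # Assumption: values are literals, so literal_eval can evaluate them.
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return node.func.id, kwargs


name, kwargs = parse_test_call("flagRange(min=0, max=100)")
print(name, kwargs)
```

Calling `parse_test_call("flagRange(0, 100)")` raises a `ValueError` in this sketch, which mirrors the rule stated above.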

Examples#

Every row lists one test per variable. If you want to call multiple tests on a specific variable (and you probably want to), list them in separate rows:

varname ; test
#-------;----------------------------------
x       ; flagMissing()
x       ; flagRange(min=0, max=100)
x       ; flagConstants(window="3h")
y       ; flagRange(min=-10, max=40)
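To make the file layout concrete, the following sketch reads the example above into (variable, test) pairs using only the Python standard library. It is an illustration, not SaQC's reader: the function name `read_config` is hypothetical, and treating lines that start with `#` as comments is an assumption based on the separator line in the example.

```python
CONFIG = """\
varname ; test
#-------;----------------------------------
x       ; flagMissing()
x       ; flagRange(min=0, max=100)
x       ; flagConstants(window="3h")
y       ; flagRange(min=-10, max=40)
"""


def read_config(text: str) -> list[tuple[str, str]]:
    """Return (varname, test) pairs from a semicolon-separated configuration."""
    rows = []
    # The first line is the header and is dropped, whatever it contains.
    for line in text.splitlines()[1:]:
        stripped = line.strip()
        # Assumption: empty lines and lines starting with '#' carry no test.
        if not stripped or stripped.startswith("#"):
            continue
        # Split on the first semicolon only, so the test column stays intact.
        varname, test = stripped.split(";", 1)
        rows.append((varname.strip(), test.strip()))
    return rows


for varname, test in read_config(CONFIG):
    print(varname, "->", test)
```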

Available Test Functions#

All test functions available in the Python API are also available in the configuration system. The translation between API calls and the configuration syntax is straightforward and best described by an example. Let’s consider the definition of flagRange:

flagRange(field, min=-inf, max=inf, flag=255.)

The signature consists of the prevalent parameter field, the specific parameters min and max as well as the global parameter flag. The translation of the given API call to flagRange

qc.flagRange("x", 0, 100, flag=BAD)

into the configuration syntax looks as follows:

varname ; test
#-------;------------------------------------
x       ; flagRange(min=0, max=100, flag=BAD)

We made the following changes: the value for field is given in the first column of the configuration file, and the actual function, including all parameters as name-value pairs, is given in the second column.

All other test functions can be used in the same manner.
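The translation rule can also be written down as a short sketch: the field goes into the first column, and every remaining parameter becomes a name-value pair in the second. The helpers `format_value` and `to_config_row` are hypothetical and not part of SaQC; the sketch assumes numeric and string values only, so symbolic values such as BAD are not covered.

```python
def format_value(value) -> str:
    """Render a parameter value as it would appear in the test column."""
    if isinstance(value, str):
        # Strings are written with double quotes, as in window="3h".
        return f'"{value}"'
    return repr(value)


def to_config_row(field: str, func: str, **kwargs) -> str:
    """Build one configuration row from a field, a function name and keywords."""
    params = ", ".join(f"{name}={format_value(value)}" for name, value in kwargs.items())
    return f"{field} ; {func}({params})"


print("varname ; test")
print(to_config_row("x", "flagRange", min=0, max=100))
print(to_config_row("x", "flagConstants", window="3h"))
```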

Regular Expressions in varname column#

Some of the tests (e.g. checks for missing values, range tests or interpolation functions) are very likely to be used on all or at least several variables of a given dataset. As it becomes quite cumbersome to list all these variables separately, only to call the same functions with the same parameters, SaQC supports regular expressions on variables. To mark a given variable name as a regular expression, it needs to be quoted with ' or ".

varname    ; test
#----------;------------------------------
'.*'       ; shift(freq="15Min")
'(x|y)'    ; flagMissing()
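How a quoted entry could expand to several variables is sketched below with Python's `re` module. The helper `expand_varname` is hypothetical, and requiring the pattern to match the whole variable name (`fullmatch`) is an assumption; SaQC's actual matching rules may differ.

```python
import re


def expand_varname(varname: str, columns: list[str]) -> list[str]:
    """Resolve a varname entry to the list of variables it refers to."""
    quoted = (
        len(varname) >= 2
        and varname[0] == varname[-1]
        and varname[0] in ("'", '"')
    )
    if not quoted:
        # Unquoted entries are taken literally as a single variable name.
        return [varname]
    pattern = re.compile(varname[1:-1])
    # Assumption: the pattern has to match the complete variable name.
    return [col for col in columns if pattern.fullmatch(col)]


columns = ["x", "y", "z"]
print(expand_varname("'.*'", columns))
print(expand_varname("'(x|y)'", columns))
print(expand_varname("x", columns))
```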