Flags, Flagging Schemes and Histories
Flags
Flags, or more formally quality annotations, are SaQC’s mechanism for representing data quality. Flags are observation-based, meaning that each individual observation of a time series has an associated quality value.
SaQC distinguishes between two layers of flag representation: an internal representation and an external representation.
Internal Representation
The internal representation of flags is stored in the attribute
saqc._flags. All flags associated with a given field can be accessed
via:
saqc._flags[field]
The internal representation is chosen solely for internal semantic and technical reasons. It is not intended for direct user interaction.
Internally, flags are stored as floating-point values that are numerically ordered. A higher numeric flag value has precedence over lower flag values.
Two special flag values are of particular interest:
-np.inf
    Represents the absence of a flag. An observation flagged with -np.inf has not been quality controlled and must be considered unchecked.
255.0
    Represents the default flag denoting a detected anomaly. Observations flagged with 255.0 were marked as anomalous by at least one QC function. Such observations are excluded from all other QC functions executed on the time series (see Filtering).
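The numeric ordering of internal flags can be illustrated with plain Python floats. The sketch below is not SaQC code: the constant names UNFLAGGED and BAD, the helper describe() and the intermediate level 25.0 are hypothetical, and only the values -inf and 255.0 come from the text above.

```python
import math

# Values taken from the text above; the names are hypothetical.
UNFLAGGED = -math.inf   # no flag assigned, the observation is unchecked
BAD = 255.0             # default flag for a detected anomaly

def describe(flag):
    """Classify one internal flag value by comparing it to the two special values."""
    if flag == UNFLAGGED:
        return "unchecked"
    if flag >= BAD:
        # Assumption: values at or above the default anomaly flag count as anomalous.
        return "anomalous"
    return "other"

# 25.0 is an arbitrary intermediate level chosen for illustration.
flags = [UNFLAGGED, 25.0, BAD]
labels = [describe(f) for f in flags]
```

Because the flags are ordinary floats, any custom level slots in between the two special values by simple comparison.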
External Representation
The external representation of flags is stored in the attribute
saqc.flags. To access all flags associated with a time series,
subset the data structure accordingly:
saqc.flags[field]
The external representation is derived from the internal representation through a translation. A translation is implemented by a specific Flagging Scheme and defines the mapping to and from the internal representation.
Depending on the chosen flagging scheme, flags may appear in different forms. These schemes are described in more detail in the corresponding section.
Flagging Schemes and Translations
A flagging scheme describes a coherent set of external flags, their interrelation, and the bidirectional translation between internal flags and external flags.
The translation between internal and external flags is performed
implicitly whenever the attribute saqc.flags is accessed or when
external flags are passed to a SaQC function (e.g., via the global keyword
arguments dfilter and flag).
A flagging scheme is part of the SaQC context and, as such, an attribute of
the SaQC class. A scheme can either be provided during initialization of
a SaQC object using the keyword argument scheme
(SaQC(..., scheme="simple")) or by setting the attribute
SaQC.scheme directly.
Fig. 1 illustrates the translation between two exemplary flagging schemes and the internal representation.
Fig. 1 Translation between the flagging schemes Scheme 1 and Scheme 2 and the internal flag representation of SaQC.
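A bidirectional translation of this kind can be sketched in a few lines of plain Python. The example uses the three literal flags of the SimpleScheme listed further down. It is an illustration under assumptions, not SaQC's implementation: the function names are hypothetical, and the internal value 0.0 standing for "OK" is an assumption that the text does not specify.

```python
import math

UNFLAGGED = -math.inf   # internal value for "no flag" (from the text)
BAD = 255.0             # internal value for a detected anomaly (from the text)
GOOD = 0.0              # assumption: internal value for "checked, no anomaly"

# External label -> internal float.
TO_INTERNAL = {"UNFLAGGED": UNFLAGGED, "OK": GOOD, "BAD": BAD}

def to_internal(label):
    """Translate one external label into its internal float."""
    return TO_INTERNAL[label]

def to_external(flag):
    """Translate one internal float into an external label."""
    if math.isnan(flag) or flag == UNFLAGGED:
        return "UNFLAGGED"
    if flag >= BAD:
        return "BAD"
    # Assumption: every remaining value is reported as "OK".
    return "OK"

internal = [UNFLAGGED, GOOD, BAD]
external = [to_external(f) for f in internal]
round_trip = [to_internal(label) for label in external]
```

In this sketch the round trip is lossless only for the three values that have a label of their own. Any other internal value would collapse to "OK" or "BAD".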
Currently, three different flagging schemes are provided:
FloatScheme
    The default flagging scheme closely resembles the internal flags and operates directly on the internal floating-point flag representation.

    * -numpy.nan denotes that an observation is unchecked.
    * -numpy.inf indicates that an observation has been checked by at least one SaQC function but no anomaly was detected.
    * 255.0 annotates anomalous observations.

    All other flag values may be used freely. Flag values greater than or equal to 255.0 are subject to SaQC’s filtering mechanism.

    To explicitly select the FloatScheme, either use the class directly or its string alias "float", e.g.:

    SaQC(..., scheme=FloatScheme())
    SaQC(..., scheme="float")

    To change the scheme after initialization, assign it directly:

    qc.scheme = "float"
    qc.scheme = FloatScheme()

AnnotatedFloatScheme
    Implements the same flagging logic as the FloatScheme but augments the flag representation with detailed information about the SaQC function that produced the flag, including its name and the concrete arguments used.

SimpleScheme
    The simplest available flagging scheme. It provides only three literal flags:

    * "UNFLAGGED" indicates that an observation has not been checked.
    * "BAD" indicates that at least one executed SaQC function marked the observation as anomalous.
    * "OK" indicates that an observation was checked by at least one SaQC function and no anomaly was detected.
Histories
During the execution of a quality control pipeline, multiple flags may be
assigned to each observation of a time series. In general, each QC function
produces its own set of flags. In the following example, one “layer” of flags
for the field "f1" is created by each of the executed SaQC functions
flagRange(), flagConstants() and flagUniLOF():
(qc
.flagRange(field="f1", min=0, max=100)
.flagConstants(field="f1", thresh=0, window="2h")
.flagUniLOF(field="f1"))
These successive flag assignments are stored as separate layers within a data
structure called the History.
By default, the final visible flag for an observation is obtained by selecting
the last non-null flag assigned to that observation. Alternatively, it is
possible to aggregate by selecting either the lowest or the highest flag value.
The default behavior can be changed by setting the module-level
constant saqc.core.history.AGGREGATION to one of the following string
values: "last", "min", or "max".
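The three aggregation modes can be sketched for a single observation as follows. This is a simplified illustration, not SaQC's implementation: the function aggregate() is hypothetical, NaN is assumed to stand for "this function assigned no flag", and returning NaN when no function assigned a flag is likewise an assumption.

```python
import math

def aggregate(layers, how="last"):
    """Collapse the flags one observation received, in execution order, into one flag."""
    # Drop the layers in which no flag was assigned (assumed to be NaN).
    assigned = [f for f in layers if not math.isnan(f)]
    if not assigned:
        return math.nan          # assumption: nothing assigned stays NaN
    if how == "last":
        return assigned[-1]      # last non-null flag
    if how == "min":
        return min(assigned)     # lowest flag value
    if how == "max":
        return max(assigned)     # highest flag value
    raise ValueError(f"unknown aggregation mode: {how!r}")

# Flags one observation received from three successive functions.
layers = [255.0, math.nan, 25.0]
last_flag = aggregate(layers, "last")
min_flag = aggregate(layers, "min")
max_flag = aggregate(layers, "max")
```

With "last", a later and lower flag overrides an earlier and higher one. With "max", the highest flag ever assigned stays visible.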
Every field in a SaQC dataset stores its own History, accessible via:
qc._history[field]
A History consists of two components:
* A pandas.DataFrame with the same index as the associated time series and one column per executed QC function (three in the example above). Each additional SaQC function execution adds another column to this DataFrame.
* A list of Python dictionaries storing metadata about the executed functions (e.g., function name and parameters). The list is position-based, meaning that the first entry corresponds to the first History column, which in turn corresponds to the first executed SaQC function.
This mechanism makes it possible to enrich the external flags produced by a flagging scheme with observation-level metadata and provenance information.
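The position-based link between flag columns and metadata can be sketched without pandas. Plain lists stand in for the DataFrame columns here. The flag values are made up for illustration, and the dictionary keys "func" and "kwargs" as well as the helper provenance() are hypothetical, not SaQC's actual layout.

```python
import math

NAN = math.nan

# Component 1: one column of flags per executed function (four observations each).
columns = [
    [NAN, 255.0, NAN, NAN],   # written by flagRange
    [NAN, NAN, 255.0, NAN],   # written by flagConstants
    [NAN, 255.0, NAN, NAN],   # written by flagUniLOF
]

# Component 2: metadata list; entry i describes column i.
meta = [
    {"func": "flagRange", "kwargs": {"min": 0, "max": 100}},
    {"func": "flagConstants", "kwargs": {"thresh": 0, "window": "2h"}},
    {"func": "flagUniLOF", "kwargs": {}},
]

def provenance(row):
    """Return (flag, metadata) of the last function that flagged observation `row`."""
    for pos in range(len(columns) - 1, -1, -1):
        flag = columns[pos][row]
        if not math.isnan(flag):
            return flag, meta[pos]
    return NAN, None   # no function flagged this observation

flag, info = provenance(1)
```

Walking the columns from last to first mirrors the default "last" aggregation, so the metadata returned belongs to the function whose flag is the visible one.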
Filtering
SaQC takes existing flags into account through a mechanism called filtering. By default, all observations of a given time series that are already flagged are masked before a SaQC function is executed.
Masking is implemented by temporarily replacing the corresponding
observational values with numpy.nan. More precisely, a value \(v\)
with associated flag \(f(v)\) is masked if \(f(v) \geq\)
dfilter.
All SaQC functions are designed to ignore these null values during
computation. This means that such values are excluded from most arithmetic
calculations, but may still be implicitly considered in certain operations,
such as counting the number of observations or performing nan checks.
After the function has completed, the original values are restored.
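The masking step can be sketched in plain Python. The helper names run_with_mask() and nan_mean() are hypothetical, and the NaN-skipping mean merely stands in for a QC computation that ignores null values. In this sketch the function receives a masked copy, so the original values never need to be restored explicitly.

```python
import math

def nan_mean(values):
    """Mean that skips NaN entries, standing in for a NaN-aware QC computation."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else math.nan

def run_with_mask(values, flags, func, dfilter):
    """Replace every value whose flag is >= dfilter with NaN, then run func on the copy."""
    masked = [math.nan if f >= dfilter else v for v, f in zip(values, flags)]
    return func(masked)

values = [1.0, 100.0, 3.0]
flags = [-math.inf, 255.0, -math.inf]   # the second observation is already flagged

# The flagged value 100.0 is hidden from the computation.
result = run_with_mask(values, flags, nan_mean, dfilter=255.0)
```

The outlier 100.0 does not influence the mean, while the list values itself is left untouched.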
The masking behaviour can be influenced in two ways:
dfilter function argument
    The globally available SaQC function parameter dfilter defines the filtering threshold. All observations with flag values greater than or equal to the specified dfilter level are masked prior to execution of the function.
DFILTER_DEFAULT flagging scheme constant
    Each flagging scheme defines a constant DFILTER_DEFAULT that specifies the default filtering threshold. This value is used whenever no explicit dfilter argument is provided.

    Setting DFILTER_DEFAULT to the global constant FILTER_NONE (associated with the value numpy.inf) disables the filtering mechanism globally.
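How the two settings interact can be sketched as a small threshold lookup. The class SketchScheme and the helpers resolve_dfilter() and masked_positions() are hypothetical. The default of 255.0 follows the FloatScheme description, and the value of FILTER_NONE follows the text above.

```python
import math

FILTER_NONE = math.inf   # value named in the text above

class SketchScheme:
    """Hypothetical stand-in for a flagging scheme with a default threshold."""
    DFILTER_DEFAULT = 255.0

def resolve_dfilter(scheme, dfilter=None):
    """An explicit dfilter wins; otherwise fall back to the scheme default."""
    return scheme.DFILTER_DEFAULT if dfilter is None else dfilter

def masked_positions(flags, threshold):
    """Positions of the observations that would be masked at this threshold."""
    return [i for i, f in enumerate(flags) if f >= threshold]

flags = [-math.inf, 25.0, 255.0]

default_mask = masked_positions(flags, resolve_dfilter(SketchScheme))
strict_mask = masked_positions(flags, resolve_dfilter(SketchScheme, dfilter=25.0))

# With FILTER_NONE as the default, no finite flag reaches the threshold.
SketchScheme.DFILTER_DEFAULT = FILTER_NONE
no_mask = masked_positions(flags, resolve_dfilter(SketchScheme))
```

Since no finite flag is greater than or equal to infinity, using FILTER_NONE as the threshold leaves every observation unmasked.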

