
Global Keywords#

An introduction to the global keywords, that is, the keywords that can be passed to any saqc.SaQC method.

  1. Set Up

  2. label keyword

  3. dfilter and flag keywords

Set Up#

Flagging Scheme Constraint#

This tutorial currently works only when the SaQC object is instantiated with the default flagging scheme, the FloatScheme.

Example Data#

Let's generate some example data and plot it:

>>> import pandas as pd
>>> import numpy as np
>>> import saqc
>>> noise = np.random.normal(0, 1, 200) # some normally distributed noise
>>> data = pd.Series(noise, index=pd.date_range('2020','2021',periods=200), name='data') # index the noise with some dates
>>> data.iloc[20] = 16 # add some artificial anomalies:
>>> data.iloc[100] = -17
>>> data.iloc[160:180] = -3
>>> qc = saqc.SaQC(data)
>>> qc.plot('data') 
../_images/GlobalKeywords-2.png
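Because the noise above is drawn without a fixed seed, the figures differ slightly from run to run. A minimal sketch of a reproducible variant, assuming NumPy's `default_rng` generator is acceptable in place of `np.random.normal` (the seed value 42 is an arbitrary choice, not part of the original tutorial):

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same noise on every run.
rng = np.random.default_rng(seed=42)  # seed value is an arbitrary assumption
noise = rng.normal(0, 1, 200)

# Same construction as in the tutorial: 200 timestamps spread over the year 2020.
data = pd.Series(noise, index=pd.date_range('2020', '2021', periods=200), name='data')

# Same artificial anomalies as in the tutorial.
data.iloc[20] = 16
data.iloc[100] = -17
data.iloc[160:180] = -3
```

The resulting `data` series can be passed to `saqc.SaQC(data)` exactly as in the example above.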

Label Keyword#

The label keyword can be passed with any function call and serves as a label to be plotted by a subsequent call to saqc.SaQC.plot().

It is especially useful for enriching figures with custom context information, and for making results from different function calls distinguishable with respect to their purpose and parameterisation. Check out the following example:

First, we apply some flagging functions to mark anomalies without using the label keyword:

>>> qc = qc.flagRange('data', max=15)
>>> qc = qc.flagRange('data', min=-16)
>>> qc = qc.flagConstants('data', window='2D', thresh=0)
>>> qc = qc.flagManual('data', mdata=pd.Series('2020-05', index=pd.DatetimeIndex(['2020-03'])))
>>> qc.plot('data') 
../_images/GlobalKeywords-3.png

In the plot above, one might want to tell apart the two results of the calls to saqc.SaQC.flagRange() by the parameters they were called with. One might also want to hint at the context of the flags set “manually” by the call to saqc.SaQC.flagManual(). Let's repeat the procedure and enrich the calls with this information by using the label keyword:

Label Example Usage#

>>> qc = saqc.SaQC(data)
>>> qc = qc.flagRange('data', max=15, label='values < 15')
>>> qc = qc.flagRange('data', min=-16, label='values > -16')
>>> qc = qc.flagConstants('data', window='2D', thresh=0, label='values constant longer than 2 days')
>>> qc = qc.flagManual('data', mdata=pd.Series('2020-05', index=pd.DatetimeIndex(['2020-03'])), label='values collected while sensor maintenance')
>>> qc.plot('data') 
../_images/GlobalKeywords-4.png

dfilter and flag keywords#

The flag keyword controls a test's level of flagging \(f(v)\) for any value \(v\). In short, the keyword controls the output flag level of any flagging function.

The dfilter keyword controls the threshold at or above which a flagged value is masked when it is passed on to a flagging function. In short, it controls the input threshold: only values flagged below it are visible to a function that operates on the values.

In more detail: any value \(v\) with a flag \(f(v)\) is masked if \(f(v) \geq\) dfilter. A masked value appears as NaN (not a number, or missing) to the flagging function and is treated numerically as such. (This means it is excluded from most arithmetic calculations, but may still implicitly take part in operations such as count(NaN) or isnan.)

Let's first visualize this interplay with the saqc.SaQC.plot() method, reusing the data and code from the Example Data section. First, we set some flags on the data. As pointed out in Flagging Scheme Constraint, we are referring to saqc.SaQC objects instantiated with the defaults, which use the FloatScheme (a real-valued scale of flag levels, ranging from -inf to 255.0):

>>> qc = saqc.SaQC(data)
>>> qc = qc.flagRange('data', max=15, label='flaglevel=200', flag=200)
>>> qc = qc.flagRange('data', min=-16, label='flaglevel=100', flag=100)
>>> qc = qc.flagManual('data', mdata=pd.Series('2020-05', index=pd.DatetimeIndex(['2020-03'])), label='flaglevel=0', flag=0)
>>> qc.plot('data') 
../_images/GlobalKeywords-5.png

With the dfilter keyword, we can now control which of the flags are passed on to the plot function. For example, if we set dfilter=50, the flags set by the saqc.SaQC.flagRange() calls (levels 200 and 100) are not passed on, and the resulting plot is cleared of them:

>>> qc.plot('data', dfilter=50) 
../_images/GlobalKeywords-6.png
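The masking rule stated above (a value is masked if its flag is greater than or equal to dfilter) can be mimicked in plain pandas. The following is only an illustrative sketch of that rule, not saqc's internal implementation; `mask_by_flags` is a hypothetical helper, and the small `values` and `flags` series are made up for the illustration:

```python
import numpy as np
import pandas as pd

def mask_by_flags(values, flags, dfilter):
    """Hypothetical helper: replace every value whose flag is >= dfilter with NaN."""
    return values.where(flags < dfilter)

# Four made-up values with flag levels -inf (unflagged), 200, 100 and 0.
values = pd.Series([1.0, 16.0, -17.0, 0.5])
flags = pd.Series([-np.inf, 200.0, 100.0, 0.0])

# dfilter=50 hides the values flagged with 200 and 100.
masked_50 = mask_by_flags(values, flags, dfilter=50)

# dfilter=150 hides only the value flagged with 200.
masked_150 = mask_by_flags(values, flags, dfilter=150)
```

Raising the threshold therefore makes more of the already flagged values visible to whatever consumes the data next.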

Flags of Different Significance#

We can also use the interplay between the dfilter and flag keywords to prioritize flags. By default, dfilter is set to the highest flag value of the instantiated flagging scheme, referred to as BAD. Since the flag set by a test also defaults to BAD, the second call to saqc.SaQC.flagRange() in the example below is not passed the values already flagged by the first call, so it cannot check them and assigns no additional flag of its own.

>>> qc = saqc.SaQC(data)
>>> qc = qc.flagRange('data', max=15, label='value > 15')
>>> qc = qc.flagRange('data', max=0, label='value > 0')
>>> qc.plot('data') 
../_images/GlobalKeywords-7.png

We can have the value flagged by both flagging functions by raising the dfilter threshold of the second call above the default flag level of BAD. This can be achieved by passing the flagging constant FILTER_NONE:

>>> from saqc.constants import FILTER_NONE
>>> qc = saqc.SaQC(data)
>>> qc = qc.flagRange('data', max=15, label='value > 15')
>>> qc = qc.flagRange('data', max=0, label='value > 0', dfilter=FILTER_NONE)
>>> qc.plot('data') 
../_images/GlobalKeywords-8.png
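Why the second range test skips an already flagged value can be reasoned through without saqc: a masked value is NaN, and any comparison with NaN is false. The sketch below is a rough illustration under stated assumptions, not the library's code. `visible_hits` is a hypothetical helper, the three values are made up, and `FILTER_NONE` is assumed here to behave like positive infinity:

```python
import numpy as np
import pandas as pd

BAD = 255.0            # highest FloatScheme level, as described in the text
UNFLAGGED = -np.inf    # lowest FloatScheme level
FILTER_NONE = np.inf   # assumption: stand-in for saqc.constants.FILTER_NONE

def visible_hits(values, flags, limit, dfilter=BAD):
    """Hypothetical helper: mask values with flag >= dfilter, then test 'value > limit'."""
    visible = values.where(flags < dfilter)
    return visible > limit  # NaN > limit evaluates to False

values = pd.Series([0.3, 16.0, -0.2])
# Assume a first test (max=15) has already flagged the value 16.0 with BAD.
flags = pd.Series([UNFLAGGED, BAD, UNFLAGGED])

# Default filter: the flagged value is masked and cannot be hit again.
default_hits = visible_hits(values, flags, limit=0)

# Filter deactivated: the flagged value is visible and exceeds the limit too.
unfiltered_hits = visible_hits(values, flags, limit=0, dfilter=FILTER_NONE)
```

With the default filter only the first value is reported, whereas with the filter deactivated the value 16.0 is reported as well.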

Unflagging Values#

With the flag keyword it is also possible to revoke a flag from a value, that is, to unflag it. This makes it possible to tie flags to conditions determined by other functions. For example, if we want to flag all values below 0.5, except those that belong to a run of constant values, we can do so by combining the flag and dfilter keywords. Let's first flag all the data below 0.5:

>>> qc = saqc.SaQC(data)
>>> qc = qc.flagRange('data', min=0.5)
>>> qc.plot('data') 
../_images/GlobalKeywords-9.png

Now we can override the flags for the run of constant values with the lowest (unflagged) flag level, which for the FloatScheme is -np.inf. Instead of the explicit value, we can use the UNFLAGGED constant. For the override to work, we also have to raise (or deactivate) the input filter, so that the already flagged values are passed to the saqc.SaQC.flagConstants() method for testing.

>>> from saqc.constants import UNFLAGGED, FILTER_NONE
>>> qc = qc.flagConstants('data', window='2D', thresh=0, dfilter=FILTER_NONE, flag=UNFLAGGED)
>>> qc.plot('data') 
../_images/GlobalKeywords-10.png
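The two-step logic (flag everything below 0.5, then reset the flags of constant runs to the unflagged level) can be traced on a tiny series in plain pandas. This is only a sketch of the idea: the data are made up, and the neighbour comparison used to detect constant runs is a simplification chosen for illustration, not the algorithm behind saqc.SaQC.flagConstants():

```python
import numpy as np
import pandas as pd

UNFLAGGED = -np.inf  # lowest FloatScheme level, as stated in the text
BAD = 255.0          # highest FloatScheme level

values = pd.Series([1.2, 0.1, -3.0, -3.0, -3.0, 0.7])
flags = pd.Series(UNFLAGGED, index=values.index)

# Step 1: flag every value below 0.5 (what flagRange(min=0.5) does conceptually).
flags.loc[values < 0.5] = BAD

# Step 2: find constant runs on the unmasked values. Simplified detection:
# a value belongs to a run if it equals its predecessor or its successor.
is_constant = values.eq(values.shift()) | values.eq(values.shift(-1))

# Override the flags of the constant run with the unflagged level.
flags.loc[is_constant] = UNFLAGGED
```

After both steps, only the isolated low value 0.1 keeps its BAD flag, while the run of -3.0 values is unflagged again.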