SaQC - System for automated Quality Control

SaQC is an open-source framework for automated, transparent, and reproducible quality control of time series data. It transforms raw time series data into trustworthy data products by making quality control an explicit step in FAIR-compliant workflows, enabling reliable use in applications such as monitoring, modelling, and decision-making.

Quality control logic in SaQC can be defined using its Python API or through structured, low-code configuration files. The low-code approach lets domain experts define checks, compound flagging strategies, and processing steps with minimal programming effort, and apply the same rules consistently to both historical archives and live data streams.
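To illustrate the idea of declaring a rule once and applying it unchanged to archived and live data, here is a minimal, library-free sketch. It does not use the SaQC API or SaQC's configuration syntax: the `variable ; range(min, max)` line format, the `Rule` class, and the `parse_rule` and `flag_stream` helpers are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple

# Hypothetical flag labels for this sketch; SaQC's own flagging schemes differ.
UNFLAGGED, BAD = "UNFLAGGED", "BAD"


@dataclass(frozen=True)
class Rule:
    """One declarative check: a variable name, a label, and a predicate."""
    variable: str
    label: str
    is_bad: Callable[[float], bool]


def parse_rule(line: str) -> Rule:
    """Parse a toy 'variable ; range(min, max)' line (invented syntax)."""
    variable, spec = (part.strip() for part in line.split(";"))
    bounds = spec.removeprefix("range(").removesuffix(")")
    lo, hi = (float(v) for v in bounds.split(","))
    return Rule(variable, spec, lambda x: not (lo <= x <= hi))


def flag_stream(
    rules: Iterable[Rule],
    observations: Iterable[Tuple[str, float]],
) -> Iterator[Tuple[str, float, str, Optional[str]]]:
    """Lazily flag (variable, value) pairs; accepts lists and generators alike."""
    rules = list(rules)
    for variable, value in observations:
        # Label of the first rule that rejects this value, or None.
        hit = next(
            (r.label for r in rules if r.variable == variable and r.is_bad(value)),
            None,
        )
        yield variable, value, (BAD if hit else UNFLAGGED), hit


rules = [parse_rule("temp ; range(-40, 60)")]

# The same rule set applied to a historical archive ...
archive = [("temp", 21.5), ("temp", 99.0)]
archive_flags = list(flag_stream(rules, archive))


# ... and to a (simulated) live feed.
def live_feed():
    yield ("temp", -55.0)
    yield ("temp", 18.2)


live_flags = list(flag_stream(rules, live_feed()))
```

Because the rule is data rather than code, the archive and the live feed are checked by exactly the same definition, and each result records which rule raised the flag.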

A distinctive feature of SaQC is its flexible quality annotation, which provides a complete, observation-level flag history to ensure end-to-end provenance, traceability, and auditability. Its anomaly detection capabilities, covering tasks such as outlier and drift detection, range from classical validation methods to advanced techniques. Most components of SaQC, including quality annotation and QC functionality, are easily extensible through well-defined interfaces, enabling hybrid rule-based and machine learning workflows.
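The notion of an observation-level flag history can be pictured with a small, self-contained model. This is a conceptual sketch only, not SaQC's internal data structure: the `FlagHistory` class, its methods, the flag labels, and the rule that the most recent non-empty entry is the effective flag are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Placeholder meaning "this test did not touch the observation".
UNFLAGGED = None


@dataclass
class FlagHistory:
    """Toy per-observation flag history: one column per applied test."""
    n_obs: int
    columns: List[Tuple[str, List[Optional[str]]]] = field(default_factory=list)

    def append(self, test_name: str, flags: List[Optional[str]]) -> None:
        """Record the outcome of one test without overwriting earlier ones."""
        if len(flags) != self.n_obs:
            raise ValueError("one flag entry per observation is required")
        self.columns.append((test_name, list(flags)))

    def current(self) -> List[Optional[str]]:
        """Effective flag per observation: the latest entry that is set."""
        result: List[Optional[str]] = [UNFLAGGED] * self.n_obs
        for _, flags in self.columns:
            for i, flag in enumerate(flags):
                if flag is not UNFLAGGED:
                    result[i] = flag
        return result

    def provenance(self, i: int) -> List[Tuple[str, str]]:
        """All tests that set a flag on observation i, in application order."""
        return [
            (name, flags[i])
            for name, flags in self.columns
            if flags[i] is not UNFLAGGED
        ]


history = FlagHistory(n_obs=3)
history.append("range_check", [None, "BAD", None])
history.append("spike_check", [None, "DOUBTFUL", "BAD"])

effective = history.current()
trail = history.provenance(1)
```

Keeping every test's column, rather than a single overwritten flag, is what makes it possible to answer both "what is the flag now?" (`current`) and "which checks led to it?" (`provenance`) for any individual observation.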

Getting Started
  • installation and setup

  • first steps

  • Python API introduction

  • command-line usage

Functionality Overview (API)
  • overview of flagging methods

  • overview of processing algorithms

  • overview of tools

Test Functions
Documentation
  • configuration-based quality control

  • global keywords

  • flags and flagging

  • customization

Cookbooks
  • outlier detection

  • frequency alignment

  • drift detection

  • data modeling

  • custom and generic function usage

Galaxy Tool
  • configure and run SaQC

  • integrate into larger workflows

https://usegalaxy.eu/?tool_id=toolshed.g2.bx.psu.edu%2Frepos%2Fufz%2Fsaqc%2Fsaqc%2F2.7.0%2Bgalaxy0&version=latest
Community
  • publications

  • users and partners

Users & Partners

SaQC turns quality control into an explicit, traceable, and version-controlled step in time series data workflows, enabling the production of AI-ready data and supporting reliable downstream use in research data portals, environmental models, and digital twins.

Beyond stand-alone use, SaQC is designed as a modular building block that can be integrated into various applications. For example, it is an integral part of Neptoon, is integrated into time.IO (a time series data infrastructure developed at the UFZ), and is available on Galaxy Europe for workflow-based, low-barrier execution within larger analysis pipelines.

SaQC is developed and maintained by the Research Data Management Team at the Helmholtz Centre for Environmental Research - UFZ. It reflects the requirements and experience gained from implementing and operating fully automated quality control pipelines for environmental sensor data.

The diversity of involved communities, along with the specific demands of scientific data acquisition and provisioning, has shaped SaQC into its current form: inherently consistent yet externally extensible, fully traceable, accessible to non-programmers, and applicable across a wide range of use cases, from exploratory, interactive programming environments to large-scale, fully automated workflows.