Upstream Source

This page is part of Truthound Orchestration 3.x.

Source repository: seadonggyun4/truthound-orchestration
Upstream docs path: docs/airflow/observability-alerting.md

Airflow Observability and Alerting

Airflow is often the place where Truthound results become operational signals. The Airflow package ships Airflow-native callback and monitoring surfaces so teams can route validation failures through the same alerting channels they already use for task and DAG health.

Who This Is For

  • on-call operators and platform teams
  • DAG authors wiring warning and failure signals into callbacks
  • teams standardizing metrics and alert payloads across pipelines

When To Use It

Use this page when:

  • a validation failure should send an operational notification
  • warning-only checks still need visibility
  • you want Airflow-native callbacks around shared Truthound results

Prerequisites

  • working Airflow callbacks or alerting conventions
  • a supported Truthound Airflow operator
  • understanding of the team's warning vs failure policy

Minimal Quickstart

Attach a callback to a quality task:

from airflow import DAG
from truthound_airflow import DataQualityCheckOperator, DataQualitySLACallback

with DAG("quality_alerts", schedule="@daily", catchup=False) as dag:
    validate_users = DataQualityCheckOperator(
        task_id="validate_users",
        data_path="/opt/airflow/data/users.parquet",
        rules=[{"column": "id", "check": "not_null"}],
        on_failure_callback=DataQualitySLACallback(),
    )

For composite policies, chain callbacks instead of duplicating logic in DAG code:

from truthound_airflow import CallbackChain, DataQualitySLACallback, QualityAlertCallback

callback_chain = CallbackChain(
    callbacks=[DataQualitySLACallback(), QualityAlertCallback()]
)
# attach the chain wherever a single callback is expected, e.g.
# on_failure_callback=callback_chain
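If you want to understand what chaining buys you, the pattern is just "invoke several callbacks in order with the same context." The sketch below is a hypothetical hand-rolled version, not the truthound_airflow implementation; it relies only on the fact that an Airflow `on_failure_callback` is any callable receiving a single context dict.

```python
from typing import Any, Callable, Iterable


class SimpleCallbackChain:
    """Invoke several Airflow-style callbacks in order with one context.

    Hypothetical sketch; truthound_airflow's CallbackChain may differ
    in error handling and configuration.
    """

    def __init__(self, callbacks: Iterable[Callable[[dict], Any]]):
        self.callbacks = list(callbacks)

    def __call__(self, context: dict) -> None:
        # Airflow passes one context dict to on_failure_callback;
        # every chained callback sees the same dict.
        for cb in self.callbacks:
            cb(context)


# Demo: record which callbacks fired and with what task.
calls = []
chain = SimpleCallbackChain([
    lambda ctx: calls.append(("sla", ctx["task"])),
    lambda ctx: calls.append(("alert", ctx["task"])),
])
chain({"task": "validate_users"})
```

A chain like this keeps ordering explicit and lets each sink stay single-purpose instead of one callback doing everything.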

Production Pattern

A durable Airflow alerting design separates concerns:

| Concern | Recommended surface |
| --- | --- |
| task-local failure routing | on_failure_callback |
| quality-specific alert policy | DataQualitySLACallback or QualityAlertCallback |
| multiple sinks | CallbackChain |
| raw counts and metadata | shared result payload in XCom/logs |
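Reading the shared result payload from a callback can look like the sketch below. The XCom key `"truthound_result"` and the `failed_rows` field are illustrative assumptions, not a documented truthound_airflow contract; the context keys `task_instance` and `task` are standard Airflow.

```python
def enrich_alert(context: dict) -> str:
    """Build an alert line enriched with the shared result payload.

    Assumes the quality operator pushed its result to XCom under
    "truthound_result" (hypothetical key name).
    """
    ti = context["task_instance"]  # standard Airflow context key
    payload = ti.xcom_pull(task_ids=context["task"].task_id,
                           key="truthound_result") or {}
    return (f"quality check failed: {context['task'].task_id} "
            f"(failed_rows={payload.get('failed_rows', '?')})")


# Demo with stub objects standing in for Airflow's TaskInstance/task.
class _StubTI:
    def xcom_pull(self, task_ids, key):
        return {"failed_rows": 3}


class _StubTask:
    task_id = "validate_users"


msg = enrich_alert({"task_instance": _StubTI(), "task": _StubTask()})
```

Because the callback reads the payload rather than just task state, the alert carries dataset context without any extra DAG plumbing.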

Recommended production checklist:

  • define which checks fail hard and which warn
  • route failure callbacks to the same destination your team already monitors
  • keep alert text short and link back to the DAG run for detail
  • emit metrics from the callback layer instead of scraping logs later
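The last checklist item, emitting metrics from the callback layer, can be sketched as a small factory that takes any metric-emitting callable. The `emit` interface and metric name scheme here are illustrative; swap in your statsd, OpenTelemetry, or StatsD-compatible client.

```python
from typing import Callable


def make_metrics_callback(emit: Callable[[str, int], None]):
    """Return an Airflow-style failure callback that emits one counter.

    `emit` is any callable(metric_name, value); the metric name scheme
    "truthound.failure.<task_id>" is an assumption for this sketch.
    """
    def _callback(context: dict) -> None:
        # Count failures per task so dashboards need no log scraping.
        emit(f"truthound.failure.{context['task'].task_id}", 1)
    return _callback


# Demo with an in-memory sink standing in for a real metrics client.
recorded = []
cb = make_metrics_callback(lambda name, value: recorded.append((name, value)))


class _Task:
    task_id = "validate_users"


cb({"task": _Task()})
```

Emitting at the callback layer keeps the counter next to the routing decision, so the metric and the alert can never drift apart.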

Failure Modes and Troubleshooting

| Symptom | Likely cause | What to do |
| --- | --- | --- |
| warnings are invisible | only hard failures trigger callback logic | add warning-aware callbacks or a metrics sink |
| duplicate notifications | task-level and DAG-level callbacks both emit the same alert | pick one owner for final alert emission |
| alert lacks dataset context | the callback reads only Airflow task state | enrich it from the shared result payload |
| operator succeeds but on-call still gets paged | warning/failure policy is mixed into callback code | make the severity mapping explicit and documented |
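The last row, making the severity mapping explicit, can be as simple as a named table that routes each severity to a sink. The severity names and sink labels below are illustrative, not part of truthound_airflow.

```python
# Explicit routing table: who gets woken up for what.
# Severity names and sink labels are illustrative assumptions.
SEVERITY_ROUTES = {
    "failure": "pagerduty",  # pages on-call
    "warning": "metrics",    # visible on dashboards, nobody is paged
}


def route(severity: str) -> str:
    """Map a check severity to its alert sink, failing loudly on gaps."""
    try:
        return SEVERITY_ROUTES[severity]
    except KeyError:
        # An unmapped severity is a policy bug; surface it rather than
        # silently paging (or silently dropping) the alert.
        raise ValueError(f"unmapped severity: {severity!r}")
```

A table like this is easy to review in a PR and doubles as the documented policy the troubleshooting row asks for.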