Upstream Source
This page is part of Truthound Orchestration 3.x.
Source repository: seadonggyun4/truthound-orchestration
Upstream docs path: docs/airflow/connections-secrets.md
Airflow Connections and Secrets¶
Truthound's Airflow integration is designed to follow Airflow's connection model instead of inventing a second secret registry. The DataQualityHook resolves connection-backed execution when a source requires credentials, while local-file onboarding stays zero-config.
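The routing contract described above can be sketched roughly as follows. Note that `resolve_source` and its return shapes are hypothetical illustrations of the documented behavior, not part of the Truthound API:

```python
# Hypothetical sketch of the source-resolution contract described above.
# `resolve_source` is illustrative only; it is not a Truthound function.

def resolve_source(data_path=None, sql=None, connection_id=None):
    """Decide how a task should load data, mirroring the documented contract."""
    if data_path is not None:
        # Local-file onboarding stays zero-config: no connection required.
        return ("local_file", data_path)
    if sql is not None:
        if connection_id is None:
            # SQL execution needs credentials, which come from Airflow connections.
            raise ValueError("sql= requires a connection_id")
        return ("connection", connection_id)
    raise ValueError("provide either data_path= or sql=")
```

This is the same boundary the troubleshooting table below relies on: local files never touch a connection, and SQL without a connection fails fast rather than guessing at credentials.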
Who This Is For¶
- Airflow operators standardizing connection IDs across DAGs
- teams moving from local-file validation to warehouse-backed validation
- platform engineers deciding where Truthound should read credentials
When To Use It¶
Use this page when:
- a DAG moves from `data_path=` to `sql=`
- validation reads from Postgres, Snowflake, or another connection-backed source
- you want Airflow Variables and Secrets Backends to remain the source of truth
Prerequisites¶
- `truthound-orchestration[airflow]` installed on the Airflow workers
- an Airflow connection available for the target system
- a DAG that passes either `data_path=` or `sql=` into a Truthound operator or sensor
Minimal Quickstart¶
Use the default connection contract when SQL execution needs credentials:
```python
from airflow import DAG

from truthound_airflow import DataQualityCheckOperator

with DAG("warehouse_quality", schedule="@daily", catchup=False) as dag:
    validate_users = DataQualityCheckOperator(
        task_id="validate_users",
        sql="select * from analytics.dim_users",
        connection_id="warehouse_primary",
        rules=[
            {"column": "id", "check": "not_null"},
            {"column": "email", "check": "email_format"},
        ],
    )
```
For hook-first loading, use the same connection boundary explicitly:
```python
from truthound_airflow import DataQualityHook

hook = DataQualityHook(connection_id="warehouse_primary")
data = hook.load_data(sql="select * from analytics.dim_users")
```
Production Pattern¶
Treat Airflow connections as the control plane for credentials and keep Truthound focused on validation semantics.
| Concern | Recommended Airflow Boundary | Why |
|---|---|---|
| Warehouse credentials | Airflow Connection | reuses existing governance and rotation |
| Runtime flags | DAG code or env vars | keeps changes versioned with the DAG |
| Token or password storage | Secrets Backend or masked extras | avoids inline secrets in DAG definitions |
| Validation rules | DAG code or imported rule modules | keeps quality intent visible in code review |
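One way to keep validation rules visible in code review, as the last table row suggests, is a plain rules module that DAGs import. The module name and registry helper below are illustrative assumptions, not a Truthound convention:

```python
# quality_rules/users.py (hypothetical module name)
# Rules as plain data in a reviewed module keep quality intent in code review.

DIM_USERS_RULES = [
    {"column": "id", "check": "not_null"},
    {"column": "email", "check": "email_format"},
]

def rules_for(table):
    """Return the rule list for a known table; illustrative registry helper."""
    registry = {"analytics.dim_users": DIM_USERS_RULES}
    return registry[table]
```

A DAG would then pass `rules=rules_for("analytics.dim_users")` to the operator instead of inlining the list.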
Recommended rollout sequence:
- validate with local files first
- introduce `connection_id` only when the source requires it
- move secrets into the Airflow backend before enabling SQL checks in production
- keep the same connection IDs across operators and sensors to avoid split routing
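The last step, one connection ID per system and environment, reused by every operator and sensor, can be enforced with a small naming helper. The `<system>_<env>` scheme below is an assumption for illustration, not a Truthound convention:

```python
# Hypothetical helper enforcing one connection ID per (system, environment) pair.
# The "<system>_<env>" naming scheme is an assumption, not a Truthound convention.

VALID_ENVS = {"dev", "staging", "prod"}

def connection_id_for(system: str, env: str) -> str:
    """Build a deterministic connection ID so operators and sensors never diverge."""
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env}")
    return f"{system}_{env}"
```

Centralizing the ID in one function also prevents the "two connection IDs point at different environments" failure listed below.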
Failure Modes and Troubleshooting¶
| Symptom | Likely Cause | What To Do |
|---|---|---|
| operator works with `data_path` but fails with `sql` | no Airflow connection provided | add `connection_id` and verify the connection exists |
| validation unexpectedly attempts network access | source resolution detected a connection-backed URI or SQL path | confirm the source shape in the task arguments |
| secrets differ between workers | environment-only credentials are not synchronized | move them to Airflow Connections or a Secrets Backend |
| result differs between tasks using the same warehouse | two connection IDs point at different environments | standardize connection naming by environment |