truthound checkpoint run¶
Run a validation checkpoint. This command executes a named checkpoint from the configuration file.
Synopsis¶
Arguments¶
| Argument | Required | Description |
|---|---|---|
name |
Yes | Name of the checkpoint to run |
Options¶
| Option | Short | Default | Description |
|---|---|---|---|
--config |
-c |
None | Checkpoint configuration file (YAML/JSON) |
--data |
-d |
None | Override data source path |
--validators |
-v |
None | Override validators (comma-separated) |
--output |
-o |
None | Output file path (JSON) |
--format |
-f |
console |
Output format (console, json) |
--strict |
false |
Exit with code 1 if issues found | |
--store |
None | Results storage directory | |
--slack |
None | Slack webhook URL | |
--webhook |
None | Generic webhook URL | |
--github-summary |
false |
Write GitHub Actions job summary |
Description¶
The checkpoint run command executes a named checkpoint:
- Loads the checkpoint configuration
- Validates the data assets
- Runs all configured validators
- Sends notifications (if configured)
- Stores results (if configured)
Examples¶
Basic Execution¶
Output:
Checkpoint: daily_data_validation
=================================
Running validators on data/production.csv...
Validators:
✓ null
✓ duplicate
✗ range (age): 5 values outside [0, 150]
✓ regex (email)
Summary:
Total Validators: 4
Passed: 3
Failed: 1
Status: FAILED
Strict Mode (CI/CD)¶
Exit with code 1 if any issues are found:
Override Data Source¶
Run checkpoint with different data:
Override Validators¶
Run only specific validators:
JSON Output¶
Output file (results.json):
{
"checkpoint": "daily_data_validation",
"timestamp": "2024-01-15T10:30:00Z",
"status": "failed",
"data_source": "data/production.csv",
"rows": 10000,
"results": [
{
"validator": "null",
"passed": true
},
{
"validator": "duplicate",
"passed": true
},
{
"validator": "range",
"passed": false,
"issues": [
{
"severity": "high",
"message": "5 values outside range [0, 150]"
}
]
},
{
"validator": "regex",
"passed": true
}
],
"summary": {
"total_validators": 4,
"passed": 3,
"failed": 1
}
}
Store Results¶
Save results to a directory:
Results are stored as:
.truthound/results/
└── daily_data_validation/
└── 2024-01-15T10-30-00/
├── report.json
└── summary.txt
Slack Notification¶
Send results to Slack:
Or configure in YAML:
checkpoints:
- name: daily_data_validation
data_source: data/production.csv
validators:
- 'null'
- duplicate
actions:
- type: slack
webhook_url: ${SLACK_WEBHOOK_URL}
notify_on: failure
channel: '#data-quality'
Webhook Notification¶
Send results to a webhook:
truthound checkpoint run daily_data_validation -c truthound.yaml --webhook https://api.example.com/webhook
GitHub Actions Summary¶
Write summary to GitHub Actions:
This creates a summary visible in the GitHub Actions job:
## Data Quality Report: daily_data_validation
| Validator | Status |
|-----------|--------|
| null | Passed |
| duplicate | Passed |
| range | Failed |
| regex | Passed |
### Issues Found
- **range**: 5 values outside range [0, 150]
Configuration File¶
Minimal Configuration¶
Full Configuration¶
checkpoints:
- name: daily_data_validation
data_source: data/production.csv
validators:
- 'null'
- duplicate
- range
validator_config:
range:
columns:
age:
min_value: 0
max_value: 150
price:
min_value: 0
# Note: For regex validation, use th.check() with RegexValidator directly:
# RegexValidator(pattern=r"^[\w.+-]+@[\w-]+\.[\w.-]+$", columns=["email"])
min_severity: medium
auto_schema: true
tags:
environment: production
team: data-platform
actions:
- type: store_result
store_path: ./truthound_results
partition_by: date
- type: update_docs
site_path: ./truthound_docs
include_history: true
- type: slack
webhook_url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
notify_on: failure
channel: '#data-quality'
triggers:
- type: schedule
interval_hours: 24
run_on_weekdays: [0, 1, 2, 3, 4]
- name: hourly_metrics_check
data_source: data/metrics.parquet
validators:
- 'null'
- range
validator_config:
range:
columns:
value:
min_value: 0
max_value: 100
count:
min_value: 0
actions:
- type: webhook
url: https://api.example.com/data-quality/events
auth_type: bearer
auth_credentials:
token: ${API_TOKEN}
triggers:
- type: cron
expression: 0 * * * *
Exit Codes¶
| Code | Condition |
|---|---|
| 0 | Success (no issues, or issues found without --strict) |
| 1 | Error (issues found with --strict flag, configuration error, or other error) |
Environment Variables¶
| Variable | Description |
|---|---|
TRUTHOUND_CONFIG |
Default config file path |
SLACK_WEBHOOK_URL |
Slack webhook URL |
WEBHOOK_URL |
Generic webhook URL |
API_TOKEN |
API authentication token |
Use Cases¶
1. CI/CD Pipeline¶
# GitHub Actions
- name: Run Data Quality Check
run: |
truthound checkpoint run daily_data_validation \
--config truthound.yaml \
--strict \
--github-summary
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
2. Scheduled Validation¶
# Cron job
0 6 * * * truthound checkpoint run daily_data_validation -c /app/truthound.yaml --store /var/log/truthound
3. Pre-Deployment Check¶
# Before deployment
truthound checkpoint run daily_data_validation -c truthound.yaml --strict || exit 1
4. Multiple Checkpoints¶
# Run multiple checkpoints
for checkpoint in daily_data_validation hourly_metrics_check; do
truthound checkpoint run $checkpoint -c truthound.yaml --strict
done
Related Commands¶
checkpoint list- List available checkpointscheckpoint validate- Validate configurationcheckpoint init- Initialize configurationcheck- Single file validation