truthound check¶
Validate data quality in a file. This command runs validators against your data and reports any issues found.
Synopsis¶
truthound check <file> [OPTIONS]
Arguments¶
| Argument | Required | Description |
|---|---|---|
| file | Yes | Path to the data file (CSV, JSON, Parquet, NDJSON, JSONL) |
Options¶
Core Options¶
| Option | Short | Default | Description |
|---|---|---|---|
| --validators | -v | None | Comma-separated list of validators to run (runs all validators when not specified) |
| --min-severity | -s | None | Minimum severity level to report (low, medium, high, critical) |
| --schema | | None | Schema file for validation |
| --auto-schema | | false | Auto-learn and cache schema |
| --format | -f | console | Output format (console, json, html) |
| --output | -o | None | Output file path (required for html format) |
| --strict | | false | Exit with code 1 if issues found |
Result Format Options (VE-1)¶
| Option | Short | Default | Description |
|---|---|---|---|
| --result-format | --rf | summary | Detail level: boolean_only, basic, summary, complete |
| --include-unexpected-rows | | false | Include failing rows in output (requires --rf complete) |
| --max-unexpected-rows | | 1000 | Maximum number of unexpected rows to include |
Exception Handling Options (VE-5)¶
| Option | Short | Default | Description |
|---|---|---|---|
| --catch-exceptions / --no-catch-exceptions | | true | Isolate validator exceptions instead of aborting |
| --max-retries | | 0 | Number of retries for transient failures |
| --show-exceptions | | false | Display exception details in output |
Description¶
The check command validates data quality by running a suite of validators:
- Completeness: Null values, missing data
- Uniqueness: Duplicates, primary key violations
- Consistency: Type mismatches, format violations
- Validity: Range checks, pattern matching
- Schema: Column presence, data type compliance
Examples¶
Basic Validation¶
Run all validators with default settings:
truthound check data.csv
Output:
Data Quality Report
===================
File: data.csv
Rows: 1000
Columns: 5
Issues Found: 3
HIGH null_check email: 50 null values (5.0%)
MEDIUM range_check age: 10 values outside range [0, 120]
LOW duplicate_check id: 2 duplicate values
Specific Validators¶
Run only selected validators:
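A minimal sketch, assuming the names accepted by --validators are the ones listed under Available Validators (null, duplicate, range, and so on). `check_with` is a hypothetical helper, not part of the CLI, and `data.csv` is a placeholder path. The helper joins its arguments into the comma-separated list the option expects:

```shell
# Hypothetical helper. Usage: check_with FILE VALIDATOR [VALIDATOR...]
check_with() {
  local file="$1"
  shift
  local validators
  # Join the remaining arguments with commas: null duplicate -> null,duplicate
  validators=$(IFS=,; echo "$*")
  truthound check "$file" --validators "$validators"
}
```

Calling `check_with data.csv null duplicate` runs `truthound check data.csv --validators null,duplicate`.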
Severity Filter¶
Report only high and critical issues:
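A minimal sketch, assuming `--min-severity high` is the value that yields high and critical issues only. `check_min_severity` is a hypothetical helper that rejects values outside the four documented levels before invoking the CLI; the return code 2 for a bad level is a choice made here to mirror the usage-error code, not documented behavior:

```shell
# Hypothetical helper. Usage: check_min_severity FILE LEVEL
check_min_severity() {
  local file="$1" level="$2"
  case "$level" in
    low|medium|high|critical)
      truthound check "$file" --min-severity "$level"
      ;;
    *)
      echo "unknown severity: $level (expected low, medium, high, or critical)" >&2
      return 2
      ;;
  esac
}
```

Calling `check_min_severity data.csv high` runs `truthound check data.csv --min-severity high`.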
Schema Validation¶
Validate against a predefined schema:
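A minimal sketch, assuming a schema file already exists. The schema file format is not specified on this page, so any file name passed in (for example `schema.yaml`) is a placeholder. `check_against_schema` is a hypothetical helper that fails early when the schema file is missing:

```shell
# Hypothetical helper. Usage: check_against_schema FILE SCHEMA_FILE
check_against_schema() {
  local file="$1" schema="$2"
  # Fail before invoking the CLI if the schema path does not exist.
  if [ ! -f "$schema" ]; then
    echo "schema file not found: $schema" >&2
    return 2
  fi
  truthound check "$file" --schema "$schema"
}
```

Calling `check_against_schema data.csv schema.yaml` runs `truthound check data.csv --schema schema.yaml` when `schema.yaml` exists.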
Auto-Schema Mode¶
Automatically learn and cache schema on first run:
- First run: Learns schema and caches to .truthound/schema_cache/
- Subsequent runs: Validates against cached schema
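The two-run behavior can be sketched as follows, assuming truthound is on the PATH. `check_twice` is a hypothetical helper, and the comments restate the documented behavior rather than anything the script itself verifies:

```shell
# Hypothetical helper. Usage: check_twice FILE
check_twice() {
  local file="$1"
  # First run: learns the schema and caches it under .truthound/schema_cache/
  truthound check "$file" --auto-schema || return $?
  # Second run: validates against the cached schema
  truthound check "$file" --auto-schema
}
```

Calling `check_twice data.csv` issues `truthound check data.csv --auto-schema` twice in a row.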
Strict Mode (CI/CD)¶
Exit with code 1 if any issues are found:
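A minimal sketch of a pipeline gate built on --strict, assuming truthound is on the PATH. `quality_gate` is a hypothetical helper and `data.csv` a placeholder path:

```shell
# Hypothetical helper. Usage: quality_gate FILE
quality_gate() {
  local file="$1"
  # With --strict the command exits with code 1 when issues are found.
  # Any non-zero exit (issues or a usage error) fails the gate.
  if truthound check "$file" --strict; then
    echo "quality gate passed: $file"
  else
    echo "quality gate failed: $file" >&2
    return 1
  fi
}
```

Calling `quality_gate data.csv` as the last command of a CI step makes the step fail whenever the check does.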
Result Format Control (VE-1)¶
Control the detail level of validation output:
# Quick pass/fail check (fastest)
truthound check data.csv --rf boolean_only
# Basic with sample values
truthound check data.csv --rf basic
# Full detail with unexpected rows and debug queries
truthound check data.csv --rf complete --include-unexpected-rows
# Limit unexpected rows
truthound check data.csv --rf complete --include-unexpected-rows --max-unexpected-rows 500
Exception Handling (VE-5)¶
Control error behavior during validation:
# Retry transient failures up to 3 times
truthound check data.csv --max-retries 3
# Fail fast: abort on the first validator exception (unrelated to --strict)
truthound check data.csv --no-catch-exceptions
# Show exception details in output
truthound check data.csv --show-exceptions
# Combined: resilient mode with visibility
truthound check data.csv --catch-exceptions --max-retries 2 --show-exceptions
Output Formats¶
# Console (default)
truthound check data.csv
# JSON output
truthound check data.csv --format json -o report.json
# HTML report (requires pip install truthound[reports])
truthound check data.csv --format html -o report.html
Available Validators¶
Completeness¶
| Validator | Description |
|---|---|
| null | Check for null values |
| completeness | Check completeness ratio |
Uniqueness¶
| Validator | Description |
|---|---|
| duplicate | Check for duplicate rows |
| unique | Check column uniqueness |
Validity¶
| Validator | Description |
|---|---|
| range | Check numeric ranges |
| pattern | Check string patterns |
| email | Validate email format |
| phone | Validate phone format |
| url | Validate URL format |
| date | Validate date format |
Consistency¶
| Validator | Description |
|---|---|
| type | Check data type consistency |
| format | Check format consistency |
Schema¶
| Validator | Description |
|---|---|
| schema | Validate against schema definition |
Output Formats¶
Console Output¶
Data Quality Report
===================
File: data.csv
Rows: 1000
Columns: 5
Issues Found: 3
HIGH null_check email: 50 null values (5.0%)
MEDIUM range_check age: 10 values outside range [0, 120]
LOW duplicate_check id: 2 duplicate values
Summary:
Total Issues: 3
Critical: 0
High: 1
Medium: 1
Low: 1
JSON Output¶
{
  "file": "data.csv",
  "rows": 1000,
  "columns": 5,
  "passed": false,
  "issues": [
    {
      "validator": "null_check",
      "column": "email",
      "severity": "high",
      "message": "50 null values (5.0%)",
      "details": {
        "null_count": 50,
        "null_ratio": 0.05
      }
    }
  ],
  "summary": {
    "total": 3,
    "critical": 0,
    "high": 1,
    "medium": 1,
    "low": 1
  }
}
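A report written with `--format json -o report.json` can be inspected from a script. The sketch below assumes the field layout shown above and reads one severity count from the summary object with sed. It is a crude text match rather than a JSON parser (a tool such as jq is the more robust choice), and `summary_count` is a hypothetical helper:

```shell
# Hypothetical helper. Usage: summary_count REPORT_JSON KEY
# KEY is one of: total, critical, high, medium, low
summary_count() {
  local report="$1" key="$2"
  # Match "<key>": <number>. String values such as "severity": "high"
  # do not match because the key must be followed by a colon and digits.
  sed -n "s/.*\"$key\": *\([0-9][0-9]*\).*/\1/p" "$report" | head -n 1
}
```

For the report above, `summary_count report.json high` prints `1`.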
HTML Output¶
Generates an interactive HTML report with:
- Summary dashboard
- Issue breakdown by severity
- Column-level statistics
- Visualizations
CI/CD Integration¶
GitHub Actions¶
- name: Validate Data Quality
  run: truthound check data/*.csv --strict
- name: Generate Report
  if: failure()
  run: truthound check data/*.csv --format html -o report.html
- name: Upload Report
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: data-quality-report
    path: report.html
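The file argument is documented as a single path, and this page does not state whether a glob that expands to several files is accepted. A defensive sketch that validates each file in its own invocation and fails if any of them fails; `check_each` is a hypothetical helper that could live in a script called from a run step:

```shell
# Hypothetical helper. Usage: check_each FILE [FILE...]
check_each() {
  local status=0 file
  for file in "$@"; do
    # Keep going after a failure so every file gets checked.
    truthound check "$file" --strict || status=1
  done
  return "$status"
}
```

Calling `check_each data/*.csv` returns 1 if any invocation fails and 0 otherwise.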
GitLab CI¶
validate-data:
  script:
    - truthound check data/*.csv --strict --format json -o report.json
  artifacts:
    when: on_failure
    paths:
      - report.json
Exit Codes¶
| Code | Condition |
|---|---|
| 0 | Success (no issues, or issues found without --strict) |
| 1 | Issues found with --strict flag |
| 2 | Usage error (invalid arguments) |
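The table can be turned into a small dispatcher in a wrapper script. A sketch, assuming only the three codes listed above; any other value is reported as unexpected, and `explain_exit` is a hypothetical helper:

```shell
# Hypothetical helper. Usage: truthound check data.csv --strict; explain_exit "$?"
explain_exit() {
  case "$1" in
    0) echo "success" ;;
    1) echo "issues found (--strict)" ;;
    2) echo "usage error" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}
```

The exit status must be captured immediately after the check, because `$?` is overwritten by the next command.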