truthound check

Validate data quality in a file. This command runs validators against your data and reports any issues found.

Synopsis

truthound check <file> [OPTIONS]

Arguments

| Argument | Required | Description |
| --- | --- | --- |
| file | Yes | Path to the data file (CSV, JSON, Parquet, NDJSON, JSONL) |

Options

Core Options

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --validators | -v | None | Comma-separated list of validators to run (runs all validators when not specified) |
| --min-severity | -s | None | Minimum severity level to report (low, medium, high, critical) |
| --schema | | None | Schema file for validation |
| --auto-schema | | false | Auto-learn and cache schema |
| --format | -f | console | Output format (console, json, html) |
| --output | -o | None | Output file path (required for html format) |
| --strict | | false | Exit with code 1 if issues found |

Result Format Options (VE-1)

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --result-format | --rf | summary | Detail level: boolean_only, basic, summary, complete |
| --include-unexpected-rows | | false | Include failing rows in output (requires --rf complete) |
| --max-unexpected-rows | | 1000 | Maximum number of unexpected rows to include |

Exception Handling Options (VE-5)

| Option | Short | Default | Description |
| --- | --- | --- | --- |
| --catch-exceptions / --no-catch-exceptions | | true | Isolate validator exceptions instead of aborting |
| --max-retries | | 0 | Number of retries for transient failures |
| --show-exceptions | | false | Display exception details in output |

Description

The check command validates data quality by running a suite of validators:

  • Completeness: Null values, missing data
  • Uniqueness: Duplicates, primary key violations
  • Consistency: Type mismatches, format violations
  • Validity: Range checks, pattern matching
  • Schema: Column presence, data type compliance

Examples

Basic Validation

Run all validators with default settings:

truthound check data.csv

Output:

Data Quality Report
===================
File: data.csv
Rows: 1000
Columns: 5

Issues Found: 3

  HIGH    null_check       email: 50 null values (5.0%)
  MEDIUM  range_check      age: 10 values outside range [0, 120]
  LOW     duplicate_check  id: 2 duplicate values

Specific Validators

Run only selected validators:

truthound check data.csv -v null,duplicate,range

Severity Filter

Report only high and critical issues:

truthound check data.csv --min-severity high
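
The filter treats severity as an ordered scale. The sketch below illustrates that behavior in Python, assuming the order low < medium < high < critical implied by the option's description. `filter_by_min_severity` is a hypothetical helper written for illustration, not a truthound function, and the issue dicts only mimic the fields shown in the JSON output.

```python
# Assumed ordering, taken from the order in which --min-severity lists its levels.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def filter_by_min_severity(issues, min_severity):
    """Keep only issues whose severity is at or above min_severity."""
    threshold = SEVERITY_ORDER.index(min_severity)  # ValueError on an unknown level
    return [
        issue for issue in issues
        if SEVERITY_ORDER.index(issue["severity"]) >= threshold
    ]


issues = [
    {"validator": "null_check", "severity": "high"},
    {"validator": "range_check", "severity": "medium"},
    {"validator": "duplicate_check", "severity": "low"},
]

# With a threshold of "high", only the null_check issue remains.
print([issue["validator"] for issue in filter_by_min_severity(issues, "high")])
```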

Schema Validation

Validate against a predefined schema:

truthound check data.csv --schema schema.yaml

Auto-Schema Mode

Automatically learn and cache schema on first run:

truthound check data.csv --auto-schema
  • First run: Learns schema and caches to .truthound/schema_cache/
  • Subsequent runs: Validates against cached schema
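
The learn-then-validate flow can be pictured with a small Python sketch. It is only an illustration of the caching idea: it tracks column names alone, the cache file naming is hypothetical, and nothing here reflects how truthound actually stores or compares schemas.

```python
import csv
import json
from pathlib import Path


def read_columns(csv_path):
    """Return the header row of a CSV file as a list of column names."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle))


def check_with_cached_schema(csv_path, cache_dir):
    """Learn the column list on the first call; compare against it afterwards."""
    # Hypothetical naming scheme for the cached schema file.
    cache_file = Path(cache_dir) / (Path(csv_path).name + ".schema.json")
    columns = read_columns(csv_path)

    if not cache_file.exists():
        # First run: nothing cached yet, so store what was observed.
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(json.dumps({"columns": columns}))
        return "learned", []

    # Later runs: report columns the cached schema expects but the file lacks.
    expected = json.loads(cache_file.read_text())["columns"]
    missing = [name for name in expected if name not in columns]
    return "validated", missing
```

Calling `check_with_cached_schema("data.csv", ".truthound/schema_cache")` twice would return `("learned", [])` the first time and a `("validated", ...)` result on later calls.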

Strict Mode (CI/CD)

Exit with code 1 if any issues are found:

truthound check data.csv --strict

Result Format Control (VE-1)

Control the detail level of validation output:

# Quick pass/fail check (fastest)
truthound check data.csv --rf boolean_only

# Basic with sample values
truthound check data.csv --rf basic

# Full detail with unexpected rows and debug queries
truthound check data.csv --rf complete --include-unexpected-rows

# Limit unexpected rows
truthound check data.csv --rf complete --include-unexpected-rows --max-unexpected-rows 500
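
When these commands are generated from a script, the documented constraint that --include-unexpected-rows requires --rf complete can be checked before the CLI is invoked. The following Python sketch does that; `build_check_args` is a hypothetical helper, and only flags listed in the options tables are used.

```python
RESULT_FORMATS = {"boolean_only", "basic", "summary", "complete"}


def build_check_args(path, result_format="summary",
                     include_unexpected_rows=False, max_unexpected_rows=None):
    """Assemble an argv list for `truthound check`, rejecting invalid combinations."""
    if result_format not in RESULT_FORMATS:
        raise ValueError(f"unknown result format: {result_format}")
    if include_unexpected_rows and result_format != "complete":
        raise ValueError("--include-unexpected-rows requires --rf complete")

    args = ["truthound", "check", path, "--rf", result_format]
    if include_unexpected_rows:
        args.append("--include-unexpected-rows")
    if max_unexpected_rows is not None:
        args += ["--max-unexpected-rows", str(max_unexpected_rows)]
    return args


# Builds the same command line as the last example above.
print(" ".join(build_check_args("data.csv", "complete", True, 500)))
```

The resulting list can be passed directly to `subprocess.run`.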

Exception Handling (VE-5)

Control error behavior during validation:

# Retry transient failures up to 3 times
truthound check data.csv --max-retries 3

# Abort on the first validator exception instead of isolating it
truthound check data.csv --no-catch-exceptions

# Show exception details in output
truthound check data.csv --show-exceptions

# Combined: resilient mode with visibility
truthound check data.csv --catch-exceptions --max-retries 2 --show-exceptions
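
To make the interaction between these flags concrete, here is a conceptual Python sketch of isolating and retrying validator failures. It is not truthound's implementation: `run_validators` and the result shape are hypothetical, and the sketch retries every exception because it has no way to tell transient failures from permanent ones.

```python
def run_validators(validators, data, catch_exceptions=True, max_retries=0):
    """Run each validator, retrying failures, then isolating or re-raising them."""
    results = {}
    for name, validate in validators.items():
        for attempt in range(max_retries + 1):
            try:
                results[name] = {"status": "ok", "issues": validate(data)}
                break
            except Exception as exc:
                if attempt < max_retries:
                    continue  # retries remain, so try this validator again
                if not catch_exceptions:
                    raise  # comparable to --no-catch-exceptions: abort the run
                # Comparable to --catch-exceptions: record the error and move on.
                results[name] = {"status": "error", "error": repr(exc)}
    return results


calls = {"count": 0}


def flaky(data):
    """Fail on the first call only, standing in for a transient failure."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("temporary failure")
    return []


def broken(data):
    """Always fail, standing in for a permanent failure."""
    raise ValueError("bad column")


results = run_validators({"flaky": flaky, "broken": broken}, data=[], max_retries=1)
print({name: result["status"] for name, result in results.items()})
```

In this run the flaky validator succeeds on its retry, while the broken one is recorded as an error without stopping the remaining validators.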

Output Formats

# Console (default)
truthound check data.csv

# JSON output
truthound check data.csv --format json -o report.json

# HTML report (requires pip install "truthound[reports]")
truthound check data.csv --format html -o report.html

HTML Report Dependency

HTML reports require Jinja2. Install with:

pip install "truthound[reports]"

Available Validators

Completeness

| Validator | Description |
| --- | --- |
| null | Check for null values |
| completeness | Check completeness ratio |

Uniqueness

| Validator | Description |
| --- | --- |
| duplicate | Check for duplicate rows |
| unique | Check column uniqueness |

Validity

| Validator | Description |
| --- | --- |
| range | Check numeric ranges |
| pattern | Check string patterns |
| email | Validate email format |
| phone | Validate phone format |
| url | Validate URL format |
| date | Validate date format |

Consistency

| Validator | Description |
| --- | --- |
| type | Check data type consistency |
| format | Check format consistency |

Schema

| Validator | Description |
| --- | --- |
| schema | Validate against schema definition |

Output Formats

Console Output

Data Quality Report
===================
File: data.csv
Rows: 1000
Columns: 5

Issues Found: 3

  HIGH    null_check       email: 50 null values (5.0%)
  MEDIUM  range_check      age: 10 values outside range [0, 120]
  LOW     duplicate_check  id: 2 duplicate values

Summary:
  Total Issues: 3
  Critical: 0
  High: 1
  Medium: 1
  Low: 1

JSON Output

{
  "file": "data.csv",
  "rows": 1000,
  "columns": 5,
  "passed": false,
  "issues": [
    {
      "validator": "null_check",
      "column": "email",
      "severity": "high",
      "message": "50 null values (5.0%)",
      "details": {
        "null_count": 50,
        "null_ratio": 0.05
      }
    }
  ],
  "summary": {
    "total": 3,
    "critical": 0,
    "high": 1,
    "medium": 1,
    "low": 1
  }
}
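
A JSON report like the sample above can be consumed by downstream tooling. The Python sketch below reads only fields that appear in that sample (`issues`, `summary` and their keys). `summarize_report` and its `fail_on` parameter are hypothetical names, and the embedded report is an abbreviated copy of the sample.

```python
import json


def summarize_report(report, fail_on=("high", "critical")):
    """Return (should_fail, lines) for a report dict shaped like the sample JSON."""
    summary = report.get("summary", {})
    # Count issues at the severities that should block a pipeline.
    blocking = sum(summary.get(level, 0) for level in fail_on)
    lines = [
        f'{issue["severity"].upper():8} {issue["validator"]:16} '
        f'{issue["column"]}: {issue["message"]}'
        for issue in report.get("issues", [])
    ]
    return blocking > 0, lines


# Abbreviated copy of the sample; a real script would load report.json instead.
report = json.loads("""
{
  "file": "data.csv",
  "passed": false,
  "issues": [
    {"validator": "null_check", "column": "email", "severity": "high",
     "message": "50 null values (5.0%)"}
  ],
  "summary": {"total": 3, "critical": 0, "high": 1, "medium": 1, "low": 1}
}
""")

should_fail, lines = summarize_report(report)
print(should_fail)
print("\n".join(lines))
```

Gating on the `summary` counts rather than on `passed` lets a pipeline tolerate low-severity findings while still failing on serious ones.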

HTML Output

Generates an interactive HTML report with:

  • Summary dashboard
  • Issue breakdown by severity
  • Column-level statistics
  • Visualizations

CI/CD Integration

GitHub Actions

- name: Validate Data Quality
  run: truthound check data/*.csv --strict

- name: Generate Report
  if: failure()
  run: truthound check data/*.csv --format html -o report.html

- name: Upload Report
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: data-quality-report
    path: report.html

GitLab CI

validate-data:
  script:
    - truthound check data/*.csv --strict --format json -o report.json
  artifacts:
    when: on_failure
    paths:
      - report.json
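
The synopsis lists a single <file> argument, so it is not confirmed here whether a glob that expands to several paths is accepted. If it is not, a pipeline step could run one check per file and fail when any of them fails. The Python sketch below shows that pattern; `check_commands` and `run_all` are hypothetical helpers, and the `runner` parameter exists only so the logic can be exercised without truthound installed.

```python
import glob
import subprocess


def check_commands(pattern):
    """Build one `truthound check --strict` argv per file matching pattern."""
    return [
        ["truthound", "check", path, "--strict"]
        for path in sorted(glob.glob(pattern))
    ]


def run_all(commands, runner=subprocess.run):
    """Run every command and return the highest exit code seen (0 if none ran)."""
    worst = 0
    for argv in commands:
        worst = max(worst, runner(argv).returncode)
    return worst
```

A CI step could then end with `sys.exit(run_all(check_commands("data/*.csv")))`, so the job fails if any single file has issues.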

Exit Codes

| Code | Condition |
| --- | --- |
| 0 | Success (no issues, or issues found without --strict) |
| 1 | Issues found with --strict flag |
| 2 | Usage error (invalid arguments) |

Related Commands

  • learn - Learn schema from data
  • scan - Scan for PII
  • profile - Generate data profile

See Also