Core Commands

The core commands provide essential data quality operations for validation, profiling, and data protection.

Overview

Command | Description            | Primary Use Case
learn   | Learn schema from data | Schema inference
check   | Validate data quality  | Data validation
scan    | Scan for PII           | Privacy compliance
mask    | Mask sensitive data    | Data anonymization
profile | Generate data profile  | Data exploration
compare | Detect data drift      | Model monitoring

Typical Workflow

graph LR
    A[Raw Data] --> B[learn]
    B --> C[schema.yaml]
    A --> D[check]
    C --> D
    D --> E{Issues?}
    E -->|Yes| F[Fix Data]
    E -->|No| G[scan]
    G --> H{PII Found?}
    H -->|Yes| I[mask]
    H -->|No| J[Ready]
    I --> J

1. Schema Learning

First, learn a schema from your reference data:

truthound learn reference_data.csv -o schema.yaml

2. Data Validation

Validate new data against the schema:

truthound check new_data.csv --schema schema.yaml --strict
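
Steps 1 and 2 can be chained so that the schema is learned once and then reused. The following is a minimal sketch, assuming `truthound` is on `PATH`. The function name `validate_with_schema` and the learn-if-missing behavior are illustrative assumptions, not part of the tool; only the two invocations shown above are used.

```shell
#!/bin/sh
# Hypothetical helper (not part of truthound): learn a schema once,
# then validate new data against it.
validate_with_schema() {
    reference=$1   # trusted reference data, e.g. reference_data.csv
    candidate=$2   # new data to validate, e.g. new_data.csv
    schema=$3      # where the learned schema is stored, e.g. schema.yaml

    # Learn the schema only if the schema file does not exist yet.
    if [ ! -f "$schema" ]; then
        truthound learn "$reference" -o "$schema" || return $?
    fi

    # --strict makes the check exit non-zero when issues are found.
    truthound check "$candidate" --schema "$schema" --strict
}
```

A call such as `validate_with_schema reference_data.csv new_data.csv schema.yaml` returns the exit status of the check, so it can be used directly in an `if` or a CI step. Delete the schema file to force it to be learned again.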

3. PII Detection

Scan for personally identifiable information:

truthound scan customer_data.csv

4. Data Masking

Mask sensitive columns before sharing:

truthound mask customer_data.csv -o safe_data.csv --strategy hash
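
Because the masked file is meant for sharing, it helps to make sure the original is never used as the output path. The following is a small sketch of such a guard. `mask_copy` is a hypothetical name, and the check compares the two paths only as written, so it does not catch two different spellings of the same file.

```shell
#!/bin/sh
# Hypothetical helper (not part of truthound): mask into a separate file
# and refuse to run when input and output paths are identical strings.
mask_copy() {
    input=$1    # file containing sensitive columns
    output=$2   # destination for the masked copy

    if [ "$input" = "$output" ]; then
        echo "refusing to write masked output over $input" >&2
        return 2
    fi

    # Same invocation as shown above, with the hash strategy.
    truthound mask "$input" -o "$output" --strategy hash
}
```

For example, `mask_copy customer_data.csv safe_data.csv` runs the command from this step, while `mask_copy customer_data.csv customer_data.csv` stops with status 2 before `truthound` is called.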

5. Data Profiling

Generate a statistical profile for analysis:

truthound profile data.csv --format json -o profile.json

6. Drift Detection

Compare datasets to detect distribution changes:

truthound compare baseline.csv production.csv --method psi

Common Options

The core commands share these patterns:

Output Format (-f, --format)

# Console output (default)
truthound check data.csv

# JSON output
truthound check data.csv --format json

# HTML report
truthound check data.csv --format html -o report.html

Output File (-o, --output)

truthound check data.csv -o results.json --format json
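
When reports are written for several inputs, the output name can be derived from the input file and the chosen format. The following is a sketch under these assumptions: `check_report` is a hypothetical helper, inputs end in `.csv`, and only the `json` and `html` formats shown above are handled.

```shell
#!/bin/sh
# Hypothetical helper (not part of truthound): write a check report named
# after the input file, e.g. data/orders.csv -> reports/orders.json.
check_report() {
    input=$1    # CSV file to check
    format=$2   # json or html
    outdir=$3   # directory that receives the report

    case $format in
        json|html) ;;
        *) echo "format not covered by this sketch: $format" >&2; return 2 ;;
    esac

    mkdir -p "$outdir" || return $?
    name=$(basename "$input" .csv)
    truthound check "$input" --format "$format" -o "$outdir/$name.$format"
}
```

For example, `check_report data/orders.csv html reports` would write `reports/orders.html`.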

Strict Mode (--strict)

Exit with code 1 if issues are found (useful for CI/CD):

truthound check data.csv --strict
truthound compare baseline.csv current.csv --strict
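
In a script, the exit status is what separates a clean run from a run that found issues. The wrapper below is a sketch: `run_gate` is a hypothetical name, and treating any status other than 0 or 1 as a failure of the command itself is an assumption, since only exit code 1 is described above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of truthound): run a command, then print
# to stderr how its exit status reads under --strict.
run_gate() {
    "$@"
    rc=$?
    case $rc in
        0) echo "gate: passed" >&2 ;;
        1) echo "gate: issues found" >&2 ;;
        # Assumption: other statuses mean the command itself failed.
        *) echo "gate: command failed with status $rc" >&2 ;;
    esac
    return "$rc"
}
```

For example, `run_gate truthound check data.csv --strict` keeps the command's own output on stdout and passes its exit status through unchanged.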

CI/CD Integration

Use core commands in your CI/CD pipeline:

# GitHub Actions example
- name: Validate Data Quality
  run: truthound check data/*.csv --strict

- name: Check for PII
  run: truthound scan data/*.csv --format json -o pii_report.json
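
The glob in the steps above expands to every matching file in a single invocation. If the files should instead be checked one at a time, so that each failing file is reported by name, a loop can be used. The following is a sketch; `check_all` is a hypothetical helper, and it relies only on `--strict` returning a non-zero status when issues are found.

```shell
#!/bin/sh
# Hypothetical CI helper (not part of truthound): check every CSV file in
# a directory and return 1 if any single check fails.
check_all() {
    dir=$1
    failed=0
    for file in "$dir"/*.csv; do
        # Skip the unexpanded pattern when the directory has no CSV files.
        [ -e "$file" ] || continue
        if ! truthound check "$file" --strict; then
            echo "FAILED: $file" >&2
            failed=1
        fi
    done
    return "$failed"
}
```

A workflow step could source a file containing this function and call `check_all data`. The step then fails only after every file has been checked.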

Next Steps

  • learn - Learn schema from data
  • check - Validate data quality
  • scan - Scan for PII
  • mask - Mask sensitive data
  • profile - Generate data profile
  • compare - Detect data drift