
Core Commands

The core commands provide essential data quality operations for validation, profiling, and data protection.

Overview

| Command | Description | Primary Use Case |
|---|---|---|
| learn | Learn schema from data | Schema inference |
| check | Validate data quality | Data validation |
| scan | Scan for PII | Privacy compliance |
| mask | Mask sensitive data | Data anonymization |
| profile | Generate data profile | Data exploration |
| read | Read and preview data | Data inspection |
| compare | Detect data drift | Model monitoring |

Typical Workflow

graph LR
    A[Raw Data] --> B[learn]
    B --> C[schema.yaml]
    A --> D[check]
    C --> D
    D --> E{Issues?}
    E -->|Yes| F[Fix Data]
    E -->|No| G[scan]
    G --> H{PII Found?}
    H -->|Yes| I[mask]
    H -->|No| J[Ready]
    I --> J
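The diagram can also be read as a single script. The following is a minimal sketch, not an official Truthound script. It uses only the commands and flags shown on this page. run_workflow is a hypothetical helper name, and the file names are placeholders. This page does not say how scan reports that PII was found, so the sketch masks unconditionally instead of branching on the scan result.

```shell
# run_workflow: hypothetical helper chaining the core commands in diagram order.
# Usage: run_workflow <reference.csv> <incoming.csv>
run_workflow() {
  local reference="$1" incoming="$2"

  # 1. Learn a schema from trusted reference data.
  truthound learn "$reference" -o schema.yaml || return 1

  # 2. Validate incoming data; --strict exits with code 1 when issues are found.
  if ! truthound check "$incoming" --schema schema.yaml --strict; then
    echo "Validation failed: fix the data before continuing" >&2
    return 1
  fi

  # 3. Scan for PII and keep a machine-readable report for review.
  truthound scan "$incoming" --format json -o pii_report.json || return 1

  # 4. Mask before sharing. Done unconditionally here, because this page does
  #    not say how scan signals that PII was found.
  truthound mask "$incoming" -o safe_data.csv --strategy hash
}
```

For example, run_workflow reference_data.csv customer_data.csv stops after the check step if validation fails.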

1. Schema Learning

First, learn a schema from your reference data:

truthound learn reference_data.csv -o schema.yaml

2. Data Validation

Validate new data against the schema:

truthound check new_data.csv --schema schema.yaml --strict

3. PII Detection

Scan for personally identifiable information:

truthound scan customer_data.csv

4. Data Masking

Mask sensitive columns before sharing:

truthound mask customer_data.csv -o safe_data.csv --strategy hash
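This page does not specify which hash function --strategy hash uses or whether values are salted. The sketch below illustrates only the general idea of hash-based masking and uses SHA-256, which is an assumption. hash_value is a hypothetical helper. The same input always yields the same digest, so hashed columns can still be joined or counted without exposing the raw values.

```shell
# hash_value: illustrative deterministic pseudonymization of one value.
# SHA-256 is an assumption for this sketch; Truthound's --strategy hash may
# use a different algorithm and may add a salt.
hash_value() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}
```

Running hash_value "alice@example.com" prints a 64-character hexadecimal digest. The same call always produces the same digest.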

5. Data Profiling

Generate a statistical profile for analysis:

truthound profile data.csv --format json -o profile.json

6. Drift Detection

Compare datasets to detect distribution changes:

truthound compare baseline.csv production.csv --method psi
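The psi method refers to the Population Stability Index. For orientation, here is the standard textbook definition with a two-bin worked example. This page does not describe how Truthound bins values or which thresholds it applies. In the formula, e_i is the share of baseline values in bin i, and a_i is the share of production values in the same bin:

```latex
\mathrm{PSI} = \sum_{i=1}^{k} \left(a_i - e_i\right) \ln\frac{a_i}{e_i}

% Worked example with two bins: e = (0.5, 0.5), a = (0.6, 0.4)
\mathrm{PSI} = (0.6 - 0.5)\ln\frac{0.6}{0.5} + (0.4 - 0.5)\ln\frac{0.4}{0.5}
             = 0.1 \times 0.1823 + (-0.1) \times (-0.2231)
             \approx 0.0405
```

Every term is non-negative, so PSI is 0 only when the two distributions match bin for bin. An empty bin makes the logarithm undefined, which is why implementations commonly replace zero shares with a small constant.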

Common Options

All core commands share these common patterns:

Output Format (-f, --format)

# Console output (default)
truthound check data.csv

# JSON output
truthound check data.csv --format json

# HTML report
truthound check data.csv --format html -o report.html

Output File (-o, --output)

truthound check data.csv -o results.json --format json
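To produce several report formats for the same input, you can combine --format and --output in a loop. This is a minimal sketch: write_reports and the report.<format> file naming are hypothetical. Each iteration runs the check again.

```shell
# write_reports: hypothetical helper that writes JSON and HTML reports for one file.
write_reports() {
  local input="$1" fmt
  for fmt in json html; do
    truthound check "$input" --format "$fmt" -o "report.$fmt" || return 1
  done
}
```

Running write_reports data.csv produces report.json and report.html.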

Strict Mode (--strict)

Exit with code 1 if any issues are found, so that a CI/CD step fails automatically:

truthound check data.csv --strict
truthound compare baseline.csv current.csv --strict
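A CI script often needs to distinguish "issues found" from other failures. The sketch below wraps any command and classifies its exit status. strict_gate is a hypothetical helper. Only exit code 1 is documented on this page; treating every other non-zero code as an unexpected error is an assumption of this sketch.

```shell
# strict_gate: run a command and classify its exit status.
# Documented on this page: with --strict, exit code 1 means issues were found.
# Assumption: any other non-zero code is treated as an unexpected error.
strict_gate() {
  "$@"
  local status=$?
  case "$status" in
    0) echo "PASS: no issues found" ;;
    1) echo "FAIL: issues found" >&2 ;;
    *) echo "ERROR: command exited with status $status" >&2 ;;
  esac
  return "$status"
}
```

For example: strict_gate truthound compare baseline.csv current.csv --strict. The original exit status is passed through, so the CI step still fails on any non-zero code.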

Data Source Options

All core commands accept data source options for reading directly from databases instead of files. When using these options, the file argument becomes optional.

| Option | Short | Description |
|---|---|---|
| --connection | --conn | Database connection string (e.g., postgresql://user:pass@host/db) |
| --table | | Database table name |
| --query | | SQL query (alternative to --table) |
| --source-config | --sc | Path to a data source config file (JSON/YAML) |
| --source-name | | Custom label for the data source |

# Validate a database table directly
truthound check --connection "postgresql://user:pass@host/db" --table users --strict

# Profile from a source config file
truthound profile --source-config prod_db.yaml

# Read and preview database data
truthound read --connection "sqlite:///data.db" --table orders --head 20

For full details on connection string formats, config files, and security best practices, see the CLI Data Source Guide.
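To validate several tables in the same database, a short loop is enough. This is a minimal sketch. It assumes the connection string is supplied through an environment variable named DATABASE_URL, which is a convention chosen for this example and not a Truthound setting; this keeps credentials out of the script. check_tables is a hypothetical helper.

```shell
# check_tables: hypothetical helper that runs a strict check on each named table.
# Assumes DATABASE_URL holds the connection string (e.g., set from a CI secret).
check_tables() {
  if [ -z "${DATABASE_URL:-}" ]; then
    echo "DATABASE_URL is not set" >&2
    return 2
  fi
  local table
  for table in "$@"; do
    # Stop at the first table whose check fails.
    truthound check --connection "$DATABASE_URL" --table "$table" --strict || return 1
  done
}
```

For example, with DATABASE_URL exported, check_tables users orders checks both tables in order.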

CI/CD Integration

Use core commands in your CI/CD pipeline:

# GitHub Actions example (steps inside a job)
- name: Validate Data Quality
  run: truthound check data/*.csv --strict

- name: Check for PII
  run: truthound scan data/*.csv --format json -o pii_report.json
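To check each file separately and report which one failed, you can loop over the files. This is a minimal sketch: check_all is a hypothetical helper. If saved to a script in the repository, it could be called from a run: step.

```shell
# check_all: hypothetical helper that runs a strict check on every CSV in a
# directory (default: data), one file at a time, and fails if any file fails.
check_all() {
  local dir="${1:-data}" failed=0 f
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue   # no matching files: the glob stays literal
    if ! truthound check "$f" --strict; then
      echo "Issues found in $f" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Unlike a single invocation, every file is checked even after one fails. The step still fails at the end.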

Next Steps

  • read - Read and preview data
  • learn - Learn schema from data
  • check - Validate data quality
  • scan - Scan for PII
  • mask - Mask sensitive data
  • profile - Generate data profile
  • compare - Detect data drift