Core Commands
The core commands provide essential data quality operations for validation, profiling, and data protection.
Overview
| Command | Description | Primary Use Case |
|---|---|---|
| learn | Learn schema from data | Schema inference |
| check | Validate data quality | Data validation |
| scan | Scan for PII | Privacy compliance |
| mask | Mask sensitive data | Data anonymization |
| profile | Generate data profile | Data exploration |
| read | Read and preview data | Data inspection |
| compare | Detect data drift | Model monitoring |
Typical Workflow
graph LR
A[Raw Data] --> B[learn]
B --> C[schema.yaml]
A --> D[check]
C --> D
D --> E{Issues?}
E -->|Yes| F[Fix Data]
E -->|No| G[scan]
G --> H{PII Found?}
H -->|Yes| I[mask]
H -->|No| J[Ready]
I --> J
1. Schema Learning
First, learn a schema from your reference data:
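A minimal sketch of this step, assuming `learn` takes the data file as its argument and honors the shared `-o` option described under Common Options. The function name and file names are placeholders, not part of truthound:

```shell
# Sketch: infer a schema from reference data and write it to a file.
# Assumption: `learn` accepts a file argument and the shared -o/--output option.
learn_schema() {
    # $1 = reference data file, $2 = schema file to write
    truthound learn "$1" -o "$2"
}

# Example call (placeholder file names):
# learn_schema reference.csv schema.yaml
```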
2. Data Validation
Validate new data against the schema:
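A sketch of validating a file against the learned schema. The `--schema` option is an assumption about how the schema file is passed and is not confirmed on this page, so check the command's help output before relying on it:

```shell
# Sketch: validate new data against a previously learned schema.
# Assumption: the schema file is passed with a --schema option (unconfirmed).
validate_with_schema() {
    # $1 = data file to validate, $2 = schema file from the learn step
    truthound check "$1" --schema "$2"
}

# Example call (placeholder file names):
# validate_with_schema new_data.csv schema.yaml
```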
3. PII Detection
Scan for personally identifiable information:
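A sketch that scans one file and saves the findings as JSON, reusing the `--format` and `-o` options that appear in the CI/CD example on this page. The function name and file names are placeholders:

```shell
# Sketch: scan a file for PII and keep the findings as a JSON report.
scan_for_pii() {
    # $1 = data file to scan, $2 = report file to write
    truthound scan "$1" --format json -o "$2"
}

# Example call (placeholder file names):
# scan_for_pii customers.csv pii_report.json
```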
4. Data Masking
Mask sensitive columns before sharing:
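A sketch that writes the masked copy to a separate file so the original stays untouched. It assumes `mask` honors the shared `-o` option; any options for selecting columns or a masking strategy are left out because they are not described here:

```shell
# Sketch: write a masked copy of a file without touching the original.
# Assumption: `mask` accepts a file argument and the shared -o/--output option.
mask_copy() {
    # $1 = input file, $2 = masked output file
    if [ "$1" = "$2" ]; then
        echo "refusing to overwrite input file: $1" >&2
        return 2
    fi
    truthound mask "$1" -o "$2"
}

# Example call (placeholder file names):
# mask_copy customers.csv customers_masked.csv
```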
5. Data Profiling
Generate a statistical profile for analysis:
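A sketch that profiles several files and stores one JSON profile next to each of them. It assumes `profile` supports the shared `--format` and `-o` options; the `.profile.json` suffix is a naming choice made for this example:

```shell
# Sketch: profile each file and write <name>.profile.json beside it.
# Assumption: `profile` supports the shared --format and -o options.
profile_each() {
    for f in "$@"; do
        # ${f%.*} drops the last extension, e.g. sales.csv -> sales
        truthound profile "$f" --format json -o "${f%.*}.profile.json"
    done
}

# Example call (placeholder file names):
# profile_each sales.csv returns.csv
```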
6. Drift Detection
Compare datasets to detect distribution changes:
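A sketch of a drift comparison. It assumes `compare` takes the baseline file and the current file as two positional arguments, in that order; the argument form is not shown on this page, so treat it as a guess to verify:

```shell
# Sketch: compare a current dataset against a baseline.
# Assumption: `compare` takes two file arguments, baseline first (unconfirmed).
compare_datasets() {
    # $1 = baseline dataset, $2 = current dataset
    truthound compare "$1" "$2"
}

# Example call (placeholder file names):
# compare_datasets train_2023.csv train_2024.csv
```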
Common Options
All core commands share the following options:
Output Format (-f, --format)
# Console output (default)
truthound check data.csv
# JSON output
truthound check data.csv --format json
# HTML report
truthound check data.csv --format html -o report.html
Output File (-o, --output)
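The `-o` option pairs naturally with `--format`, as in the HTML report example above. A small sketch that picks the format from the output file's extension, limited to the two file formats shown on this page (the helper name is hypothetical):

```shell
# Sketch: choose --format from the extension of the output file.
# Only the json and html formats shown on this page are handled.
check_to_file() {
    # $1 = data file, $2 = output file (.json or .html)
    case "$2" in
        *.json) fmt=json ;;
        *.html) fmt=html ;;
        *) echo "unsupported output extension: $2" >&2; return 2 ;;
    esac
    truthound check "$1" --format "$fmt" -o "$2"
}

# Example call:
# check_to_file data.csv report.html
```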
Strict Mode (--strict)
Exit with code 1 if issues are found (useful for CI/CD):
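A sketch of how a script might act on that exit status. The `quality_gate` name is made up for this example; the only truthound behavior it relies on is the non-zero exit under `--strict` described above:

```shell
# Sketch: turn the --strict exit status into a pass/fail gate.
quality_gate() {
    # $1 = data file to validate
    if truthound check "$1" --strict; then
        echo "PASS: $1"
    else
        echo "FAIL: $1"
        return 1
    fi
}

# Example call: stop a deployment script when validation fails.
# quality_gate data.csv || exit 1
```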
Data Source Options
All core commands accept data source options for reading directly from databases instead of files. When using these options, the file argument becomes optional.
| Option | Short | Description |
|---|---|---|
| --connection | --conn | Database connection string (e.g., postgresql://user:pass@host/db) |
| --table | | Database table name |
| --query | | SQL query (alternative to --table) |
| --source-config | --sc | Path to a data source config file (JSON/YAML) |
| --source-name | | Custom label for the data source |
# Validate a database table directly
truthound check --connection "postgresql://user:pass@host/db" --table users --strict
# Profile from a source config file
truthound profile --source-config prod_db.yaml
# Read and preview database data
truthound read --connection "sqlite:///data.db" --table orders --head 20
For full details on connection string formats, config files, and security best practices, see the CLI Data Source Guide.
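Connection strings usually embed credentials, so one common pattern is to read them from an environment variable instead of hard-coding them in scripts. A sketch using the `--connection`, `--table`, and `--strict` options shown above; `TRUTHOUND_DB_URL` is a variable name invented for this example, not something truthound reads on its own:

```shell
# Sketch: validate a table using a connection string from the environment.
# TRUTHOUND_DB_URL is a hypothetical variable name chosen for this example.
check_table() {
    # $1 = table name
    if [ -z "${TRUTHOUND_DB_URL:-}" ]; then
        echo "TRUTHOUND_DB_URL is not set" >&2
        return 2
    fi
    truthound check --connection "$TRUTHOUND_DB_URL" --table "$1" --strict
}

# Example call:
# export TRUTHOUND_DB_URL="postgresql://user:pass@host/db"
# check_table users
```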
CI/CD Integration
Use core commands in your CI/CD pipeline:
# GitHub Actions example
- name: Validate Data Quality
  run: truthound check data/*.csv --strict
- name: Check for PII
  run: truthound scan data/*.csv --format json -o pii_report.json
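A sketch of a per-file variant that a `run:` step could call, useful when every file should be checked and reported separately rather than stopping at the first failure. The `check_all` helper is hypothetical and relies only on the `--strict` exit status:

```shell
# Sketch: check files one at a time and fail if any of them has issues.
check_all() {
    status=0
    for f in "$@"; do
        # Keep going after a failure so every file gets checked.
        truthound check "$f" --strict || status=1
    done
    return "$status"
}

# Example call from a CI step:
# check_all data/*.csv
```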