# Core Commands
The core commands provide essential data quality operations for validation, profiling, and data protection.
## Overview
| Command | Description | Primary Use Case |
|---|---|---|
| `learn` | Learn schema from data | Schema inference |
| `check` | Validate data quality | Data validation |
| `scan` | Scan for PII | Privacy compliance |
| `mask` | Mask sensitive data | Data anonymization |
| `profile` | Generate data profile | Data exploration |
| `compare` | Detect data drift | Model monitoring |
## Typical Workflow
```mermaid
graph LR
    A[Raw Data] --> B[learn]
    B --> C[schema.yaml]
    A --> D[check]
    C --> D
    D --> E{Issues?}
    E -->|Yes| F[Fix Data]
    E -->|No| G[scan]
    G --> H{PII Found?}
    H -->|Yes| I[mask]
    H -->|No| J[Ready]
    I --> J
```
## 1. Schema Learning
First, learn a schema from your reference data:
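A minimal sketch, assuming the shared `-o` output option (see Common Options below) also applies to `learn`; verify the exact flags with your installation's help output:

```bash
# Infer a schema from reference data and save it for later validation runs.
# The -o destination flag is an assumption; only the learn command itself is documented above.
truthound learn reference_data.csv -o schema.yaml
```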
## 2. Data Validation
Validate new data against the schema:
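A sketch assuming a `--schema` option for pointing `check` at the learned schema file; that flag name is illustrative and not documented on this page:

```bash
# Validate new data against the schema learned in the previous step.
# The --schema flag is an assumption; --format, -o, and --strict are the options shown elsewhere on this page.
truthound check new_data.csv --schema schema.yaml
```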
## 3. PII Detection
Scan for personally identifiable information:
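For example, writing the findings to a JSON report using the shared format and output options:

```bash
# Scan a dataset for personally identifiable information and save the findings.
truthound scan data.csv --format json -o pii_report.json
```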
## 4. Data Masking
Mask sensitive columns before sharing:
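A sketch of masking selected columns; the `--columns` selector is an assumed flag, so check the command's help for the actual interface:

```bash
# Mask sensitive columns and write an anonymized copy of the data.
# The --columns flag is illustrative; only the mask command itself is documented above.
truthound mask data.csv --columns email,ssn -o masked_data.csv
```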
## 5. Data Profiling
Generate statistical profile for analysis:
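A sketch assuming the shared format and output options also apply to `profile`:

```bash
# Generate a statistical profile and render it as an HTML report.
# Whether profile supports --format html is an assumption based on the shared options below.
truthound profile data.csv --format html -o profile.html
```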
## 6. Drift Detection
Compare datasets to detect distribution changes:
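A sketch; the baseline-then-current argument order is an assumption, since only the `compare` command name appears in the overview table:

```bash
# Compare the current dataset against a baseline to detect distribution drift.
# Argument order (baseline first, current second) is assumed.
truthound compare baseline.csv current.csv
```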
## Common Options
All core commands share these common patterns:
### Output Format (`-f`, `--format`)
```bash
# Console output (default)
truthound check data.csv

# JSON output
truthound check data.csv --format json

# HTML report
truthound check data.csv --format html -o report.html
```
### Output File (`-o`, `--output`)
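Write results to a file instead of printing to the console, as in the format examples above:

```bash
# Save validation results as JSON
truthound check data.csv --format json -o results.json
```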
### Strict Mode (`--strict`)
Exit with code 1 if issues are found (useful for CI/CD):
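```bash
# Fails with exit code 1 when any quality issue is detected
truthound check data.csv --strict
```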
## CI/CD Integration
Use core commands in your CI/CD pipeline:
```yaml
# GitHub Actions example
- name: Validate Data Quality
  run: truthound check data/*.csv --strict

- name: Check for PII
  run: truthound scan data/*.csv --format json -o pii_report.json
```