Profiler Commands
Advanced data profiling and validation rule generation commands.
Overview
| Command | Description | Primary Use Case |
|---|---|---|
| `auto-profile` | Advanced data profiling | Deep data analysis |
| `generate-suite` | Generate validation rules from profile | Rule automation |
| `quick-suite` | Profile and generate rules in one step | Quick setup |
| `list-formats` | List supported output formats | Reference |
| `list-presets` | List available presets | Reference |
| `list-categories` | List rule categories | Reference |
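The three reference commands can be used to check what an installed version supports. The sketch below assumes each one runs without further arguments, which this page does not confirm; the guard skips the calls when `truthound` is not on the PATH.

```shell
# Reference commands named in the overview table.
# Assumption: each one runs without additional arguments.
REFERENCE_COMMANDS="list-formats list-presets list-categories"

if command -v truthound >/dev/null 2>&1; then
  for subcommand in $REFERENCE_COMMANDS; do
    echo "== truthound $subcommand =="
    truthound "$subcommand"
  done
else
  echo "truthound not found on PATH; skipping" >&2
fi
```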
Workflow
```mermaid
graph LR
    A[Data File] --> B[auto-profile]
    B --> C[profile.json]
    C --> D[generate-suite]
    D --> E[validation_suite.yaml]
    E --> F[truthound check]
    A --> G[quick-suite]
    G --> E
```
Two-Step Workflow
- Profile first, then generate rules:
```shell
# Step 1: Generate detailed profile
truthound auto-profile data.csv -o profile.json --format json

# Step 2: Generate validation suite from profile
truthound generate-suite profile.json -o suite.yaml --strictness medium
```
One-Step Workflow
- Profile and generate in one command:
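A minimal sketch of the one-step form. It assumes `quick-suite` accepts `-o` for the output path in the same way as `auto-profile` and `generate-suite` in the two-step workflow; the file names are placeholders.

```shell
# Hypothetical file names; replace with your own paths.
DATA_FILE="data.csv"
SUITE_FILE="suite.yaml"

# Assumption: quick-suite accepts -o for the output path,
# mirroring auto-profile and generate-suite.
if command -v truthound >/dev/null 2>&1; then
  truthound quick-suite "$DATA_FILE" -o "$SUITE_FILE"
else
  echo "truthound not found on PATH; skipping" >&2
fi
```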
Available Presets
Presets provide pre-configured settings for common use cases:
| Preset | Description |
|---|---|
| `default` | Balanced settings (medium strictness, all categories) |
| `strict` | Strict validation rules with high confidence |
| `loose` | Relaxed validation for flexible data |
| `minimal` | Only high-confidence schema rules |
| `comprehensive` | All generators with detailed output |
| `schema_only` | Schema and completeness rules only |
| `format_only` | Format and pattern rules only |
| `ci_cd` | Optimized for CI/CD pipelines (checkpoint format) |
| `development` | Development-friendly (Python code output) |
| `production` | Production-ready (strict, high confidence) |
Rule Categories
Categories determine which types of validation rules are generated:
| Category | Description |
|---|---|
| `completeness` | Null checks, missing data validation |
| `uniqueness` | Duplicate detection, primary key validation |
| `format` | Pattern matching, format validation |
| `range` | Numeric range validation |
| `consistency` | Cross-column consistency checks |
| `schema` | Data type and structure validation |
Output Formats
Profile Formats (auto-profile)
| Format | Extension | Description |
|---|---|---|
| `console` | - | Human-readable terminal output |
| `json` | `.json` | Machine-readable JSON |
| `yaml` | `.yaml` | YAML format |
Suite Formats (generate-suite, quick-suite)
| Format | Extension | Description |
|---|---|---|
| `yaml` | `.yaml` | YAML configuration |
| `json` | `.json` | JSON configuration |
| `python` | `.py` | Python code |
| `toml` | `.toml` | TOML configuration |
| `checkpoint` | `.yaml` | Checkpoint-compatible YAML |
Use Cases
1. Data Discovery
Profile unknown data to understand its structure:
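One way this might look, using the `console` profile format from the table above. The file name is a placeholder, and running `auto-profile` without `-o` for terminal output is an assumption.

```shell
# Hypothetical input file; replace with the data set you want to explore.
DATA_FILE="unknown_data.csv"

# Assumption: the console format prints to the terminal, so no -o is given.
if command -v truthound >/dev/null 2>&1; then
  truthound auto-profile "$DATA_FILE" --format console
else
  echo "truthound not found on PATH; skipping" >&2
fi
```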
2. Automated Rule Generation
Generate validation rules from existing data:
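As a sketch, the two-step workflow can be looped over a directory of existing files. The `data` directory and the two path helpers are hypothetical; the `truthound` flags are the ones shown in the two-step workflow.

```shell
# Hypothetical layout: CSV files live in ./data.
DATA_DIR="data"

# Hypothetical helpers: derive output paths next to each data file.
profile_path_for() { echo "$(dirname "$1")/$(basename "$1" .csv).profile.json"; }
suite_path_for()   { echo "$(dirname "$1")/$(basename "$1" .csv).suite.yaml"; }

if command -v truthound >/dev/null 2>&1; then
  for data_file in "$DATA_DIR"/*.csv; do
    [ -e "$data_file" ] || continue   # the glob matched no files
    profile=$(profile_path_for "$data_file")
    # Step 1: profile the file; Step 2: turn the profile into a suite.
    truthound auto-profile "$data_file" -o "$profile" --format json
    truthound generate-suite "$profile" -o "$(suite_path_for "$data_file")" --strictness medium
  done
else
  echo "truthound not found on PATH; skipping" >&2
fi
```

Keeping each profile next to its source file makes it possible to regenerate a suite later without profiling the data again.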
3. CI/CD Pipeline Setup
Create checkpoint-compatible rules for CI/CD:
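A possible invocation, using the `checkpoint` suite format listed above. That `quick-suite` accepts `-o` and `--format` is an assumption, and the file names are placeholders. The `ci_cd` preset may be the more direct route, but the option that selects a preset is not shown on this page, so it is left out here.

```shell
# Hypothetical file names; replace with your own paths.
DATA_FILE="data.csv"
CHECKPOINT_FILE="checkpoint.yaml"
SUITE_FORMAT="checkpoint"

# Assumption: quick-suite accepts -o and --format.
if command -v truthound >/dev/null 2>&1; then
  truthound quick-suite "$DATA_FILE" -o "$CHECKPOINT_FILE" --format "$SUITE_FORMAT"
else
  echo "truthound not found on PATH; skipping" >&2
fi
```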
4. Development Environment
Generate Python validation code:
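A sketch using the `python` suite format with its `.py` extension from the table above. As before, `-o` and `--format` on `quick-suite` are assumed, and `validators.py` is a placeholder name.

```shell
# Hypothetical file names; replace with your own paths.
DATA_FILE="data.csv"
PYTHON_FILE="validators.py"

# Assumption: quick-suite accepts -o and --format.
if command -v truthound >/dev/null 2>&1; then
  truthound quick-suite "$DATA_FILE" -o "$PYTHON_FILE" --format python
else
  echo "truthound not found on PATH; skipping" >&2
fi
```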
Next Steps
- auto-profile - Advanced data profiling
- generate-suite - Generate validation rules
- quick-suite - Quick profile + rules generation