truthound quick-suite¶
Profile data and generate a validation suite in one step. This is a convenience command that combines auto-profile and generate-suite.
Synopsis¶
truthound quick-suite <file> [OPTIONS]
Arguments¶
| Argument | Required | Description |
|---|---|---|
| `file` | Yes | Path to the data file (CSV, JSON, Parquet, NDJSON, JSONL) |
Options¶
| Option | Short | Default | Description |
|---|---|---|---|
| `--output` | `-o` | None | Output file path |
| `--format` | `-f` | `yaml` | Output format (yaml, json, python, toml, checkpoint) |
| `--strictness` | `-s` | `medium` | Rule strictness (loose, medium, strict) |
| `--include` | `-i` | All | Rule categories to include. Categories: schema, completeness, uniqueness, format, distribution, pattern, temporal, relationship, anomaly |
| `--exclude` | `-e` | None | Rule categories to exclude. Categories: schema, completeness, uniqueness, format, distribution, pattern, temporal, relationship, anomaly |
| `--min-confidence` | | None | Minimum confidence level (low, medium, high) |
| `--name` | `-n` | Auto | Suite name |
| `--preset` | `-p` | None | Use preset configuration |
| `--sample-size` | | None | Sample size for large datasets |
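When a script needs to call the command with varying settings, the options can be assembled programmatically. The following is a minimal Python sketch and not part of Truthound: `build_quick_suite_argv` is a hypothetical helper, and the only flags and allowed values it uses are the ones listed in the options table. It builds an argument list and does not run anything.

```python
from typing import List, Optional, Sequence

# Allowed values copied from the options table; nothing else is assumed.
FORMATS = {"yaml", "json", "python", "toml", "checkpoint"}
STRICTNESS = {"loose", "medium", "strict"}
CONFIDENCE = {"low", "medium", "high"}


def _check(value: Optional[str], allowed: set, label: str) -> None:
    """Reject values that the options table does not list."""
    if value is not None and value not in allowed:
        raise ValueError(f"unknown {label}: {value!r}")


def build_quick_suite_argv(
    file: str,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
    strictness: Optional[str] = None,
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    min_confidence: Optional[str] = None,
    name: Optional[str] = None,
    preset: Optional[str] = None,
    sample_size: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build an argv list for `truthound quick-suite`."""
    _check(fmt, FORMATS, "format")
    _check(strictness, STRICTNESS, "strictness")
    _check(min_confidence, CONFIDENCE, "confidence level")

    argv = ["truthound", "quick-suite", file]
    # Flags are added only when set, so the CLI defaults stay in effect.
    flags = [
        ("--output", output),
        ("--format", fmt),
        ("--strictness", strictness),
        ("--include", ",".join(include) or None),
        ("--exclude", ",".join(exclude) or None),
        ("--min-confidence", min_confidence),
        ("--name", name),
        ("--preset", preset),
        ("--sample-size", None if sample_size is None else str(sample_size)),
    ]
    for flag, value in flags:
        if value is not None:
            argv += [flag, value]
    return argv


print(build_quick_suite_argv("data.csv", output="suite.yaml", strictness="strict"))
```

On a machine where the `truthound` CLI is installed, the resulting list could be passed to `subprocess.run(argv, check=True)`.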
Description¶
The `quick-suite` command is a one-step workflow that:
- Profiles the data file (like `auto-profile`)
- Generates validation rules (like `generate-suite`)
- Outputs a ready-to-use validation suite
This is equivalent to running:
truthound auto-profile data.csv -o /tmp/profile.json --format json
truthound generate-suite /tmp/profile.json -o suite.yaml
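The two-step equivalent can also be scripted. The sketch below is an illustration under stated assumptions, not Truthound code: `run_two_step` is a hypothetical wrapper that writes the intermediate profile to a temporary directory and issues the same two commands shown above. The `runner` parameter is an assumption added for illustration so that the function can be exercised without the CLI installed; by default it calls `subprocess.run`.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def run_two_step(
    data_file: str,
    suite_path: str,
    runner: Optional[Callable[[Sequence[str]], object]] = None,
) -> List[List[str]]:
    """Hypothetical wrapper: run auto-profile, then generate-suite."""
    if runner is None:
        # Default: execute each command and fail on a non-zero exit code.
        runner = lambda argv: subprocess.run(argv, check=True)

    executed: List[List[str]] = []
    # The intermediate profile lives in a temporary directory and is
    # removed once both commands have finished.
    with tempfile.TemporaryDirectory() as tmp:
        profile_path = str(Path(tmp) / "profile.json")
        commands = [
            ["truthound", "auto-profile", data_file,
             "-o", profile_path, "--format", "json"],
            ["truthound", "generate-suite", profile_path, "-o", suite_path],
        ]
        for argv in commands:
            runner(argv)
            executed.append(argv)
    return executed


# Dry run: collect the commands instead of executing them.
planned: List[Sequence[str]] = []
run_two_step("data.csv", "suite.yaml", runner=planned.append)
for argv in planned:
    print(" ".join(argv))
```

With `quick-suite`, the temporary profile file and the second command are not needed.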
Examples¶
Basic Usage¶
truthound quick-suite data.csv -o suite.yaml
Output:
Quick Suite Generation
======================
Profiling: data.csv
Rows: 10,000
Columns: 8
Patterns detected: 3
Generating suite...
Rules generated: 15
Categories: completeness, uniqueness, range, format
Suite saved to: suite.yaml
With Preset¶
# Production-ready suite
truthound quick-suite data.csv -o suite.yaml --preset production
# CI/CD optimized
truthound quick-suite data.csv -o checkpoint.yaml --preset ci_cd
# Minimal rules
truthound quick-suite data.csv -o suite.yaml --preset minimal
Strictness Levels¶
# Loose: Relaxed thresholds
truthound quick-suite data.csv -o suite.yaml --strictness loose
# Medium (default): Balanced
truthound quick-suite data.csv -o suite.yaml --strictness medium
# Strict: Tight thresholds
truthound quick-suite data.csv -o suite.yaml --strictness strict
Output Formats¶
# YAML (default)
truthound quick-suite data.csv -o suite.yaml --format yaml
# JSON
truthound quick-suite data.csv -o suite.json --format json
# Python code
truthound quick-suite data.csv -o validators.py --format python
# TOML
truthound quick-suite data.csv -o suite.toml --format toml
# Checkpoint format
truthound quick-suite data.csv -o checkpoint.yaml --format checkpoint
Category Filtering¶
# Only schema and completeness rules
truthound quick-suite data.csv -o suite.yaml --include schema,completeness
# Exclude format rules
truthound quick-suite data.csv -o suite.yaml --exclude format
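To reason about which categories remain after filtering, the selection can be modeled in a few lines of Python. This is a sketch under an explicit assumption: it starts from the included categories (or all of them when `--include` is not given) and then removes the excluded ones. This page does not state how Truthound combines the two flags when both are passed, so treat `effective_categories` as a hypothetical model rather than the tool's behavior.

```python
from typing import Iterable, List, Optional

# Category names as listed for --include and --exclude.
CATEGORIES = (
    "schema", "completeness", "uniqueness", "format", "distribution",
    "pattern", "temporal", "relationship", "anomaly",
)


def effective_categories(
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Assumed semantics: start from include (or all), then drop exclude."""
    include_set = set(include or CATEGORIES)
    exclude_set = set(exclude or ())
    unknown = (include_set | exclude_set) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    # Keep the documented category order in the result.
    return [c for c in CATEGORIES if c in include_set and c not in exclude_set]


print(effective_categories(include=["schema", "completeness"]))
print(effective_categories(exclude=["format"]))
```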
Confidence Filtering¶
# Only keep rules with high confidence
truthound quick-suite data.csv -o suite.yaml --min-confidence high
Large Dataset Sampling¶
# Sample 100,000 rows for profiling
truthound quick-suite large_data.parquet -o suite.yaml --sample-size 100000
Custom Suite Name¶
# Set an explicit suite name (my_suite is a placeholder)
truthound quick-suite data.csv -o suite.yaml --name my_suite
Output Examples¶
YAML Output¶
name: data_validation_suite
version: "1.0"
generated_at: "2024-01-15T10:30:00Z"
source_file: data.csv
profile_summary:
  rows: 10000
  columns: 8
validators:
  - type: not_null
    columns: [id, created_at]
    severity: high
    confidence: 0.99
  - type: unique
    columns: [id]
    severity: critical
    confidence: 1.0
  - type: range
    column: age
    min_value: 18
    max_value: 85
    severity: medium
    confidence: 0.95
  - type: pattern
    column: email
    pattern: email
    severity: high
    confidence: 0.98
  - type: allowed_values
    column: status
    values: [active, inactive, pending]
    severity: medium
    confidence: 1.0
JSON Output¶
{
  "name": "data_validation_suite",
  "version": "1.0",
  "generated_at": "2024-01-15T10:30:00Z",
  "source_file": "data.csv",
  "validators": [
    {
      "type": "not_null",
      "columns": ["id", "created_at"],
      "severity": "high",
      "confidence": 0.99
    },
    {
      "type": "unique",
      "columns": ["id"],
      "severity": "critical",
      "confidence": 1.0
    }
  ]
}
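Because the JSON output is plain data, a generated suite can be inspected with the standard library before it is put to use. The sketch below embeds the example above and counts validators by severity, optionally keeping only those at or above a numeric confidence threshold. `summarize` is a hypothetical helper; the field names (`validators`, `severity`, `confidence`) are taken from the example, and a real suite may carry more fields.

```python
import json
from collections import Counter

# The JSON example from this section, embedded so the sketch is self-contained.
SUITE_JSON = """
{
  "name": "data_validation_suite",
  "version": "1.0",
  "generated_at": "2024-01-15T10:30:00Z",
  "source_file": "data.csv",
  "validators": [
    {"type": "not_null", "columns": ["id", "created_at"],
     "severity": "high", "confidence": 0.99},
    {"type": "unique", "columns": ["id"],
     "severity": "critical", "confidence": 1.0}
  ]
}
"""


def summarize(suite: dict, min_confidence: float = 0.0) -> Counter:
    """Count validators per severity, keeping confidence >= min_confidence."""
    kept = [
        v for v in suite["validators"]
        if v.get("confidence", 0.0) >= min_confidence
    ]
    return Counter(v["severity"] for v in kept)


suite = json.loads(SUITE_JSON)
print(summarize(suite))
print(summarize(suite, min_confidence=1.0))
```

For a file on disk, `json.loads(SUITE_JSON)` would be replaced by `json.load` on the opened suite file.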
Python Output¶
"""
Validation suite generated by Truthound
Source: data.csv
Generated: 2024-01-15T10:30:00Z
"""
import truthound as th

# Validation rules
validators = [
    th.validators.NotNullValidator(columns=["id", "created_at"]),
    th.validators.UniqueValidator(columns=["id"]),
    th.validators.RangeValidator(column="age", min_value=18, max_value=85),
    th.validators.PatternValidator(column="email", pattern="email"),
    th.validators.AllowedValuesValidator(
        column="status",
        values=["active", "inactive", "pending"]
    ),
]


def validate(data_path: str) -> th.ValidationReport:
    """Validate data file."""
    return th.check(data_path, validators=validators)


if __name__ == "__main__":
    import sys

    report = validate(sys.argv[1] if len(sys.argv) > 1 else "data.csv")
    print(report)
Available Presets¶
| Preset | Strictness | Confidence | Format | Description |
|---|---|---|---|---|
| `default` | medium | medium | yaml | Balanced settings |
| `strict` | strict | high | yaml | Tight validation |
| `loose` | loose | low | yaml | Relaxed validation |
| `minimal` | medium | high | yaml | Essential rules only |
| `comprehensive` | strict | low | yaml | All possible rules |
| `schema_only` | medium | high | yaml | Schema rules only |
| `format_only` | medium | medium | yaml | Format rules only |
| `ci_cd` | medium | medium | checkpoint | CI/CD optimized |
| `development` | loose | medium | python | Dev-friendly code |
| `production` | strict | high | yaml | Production-ready |
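For choosing a preset in a script, the table can be mirrored as a small lookup. The Python sketch below only restates the three setting columns of the table; `preset_settings` and `presets_matching` are hypothetical helpers, not Truthound functions. The table does not capture which rule categories each preset selects (for example, `minimal` and `schema_only` share the same three settings), so the lookup cannot tell such presets apart.

```python
from typing import Dict, List, Tuple

# (strictness, confidence, format) per preset, copied from the table above.
PRESETS: Dict[str, Tuple[str, str, str]] = {
    "default": ("medium", "medium", "yaml"),
    "strict": ("strict", "high", "yaml"),
    "loose": ("loose", "low", "yaml"),
    "minimal": ("medium", "high", "yaml"),
    "comprehensive": ("strict", "low", "yaml"),
    "schema_only": ("medium", "high", "yaml"),
    "format_only": ("medium", "medium", "yaml"),
    "ci_cd": ("medium", "medium", "checkpoint"),
    "development": ("loose", "medium", "python"),
    "production": ("strict", "high", "yaml"),
}


def preset_settings(name: str) -> Dict[str, str]:
    """Return the documented settings for one preset as a dict."""
    strictness, confidence, fmt = PRESETS[name]
    return {"strictness": strictness, "confidence": confidence, "format": fmt}


def presets_matching(**wanted: str) -> List[str]:
    """List presets whose documented settings equal every requested value."""
    return [
        name for name in PRESETS
        if all(preset_settings(name)[key] == value for key, value in wanted.items())
    ]


print(preset_settings("ci_cd"))
print(presets_matching(format="python"))
print(presets_matching(strictness="strict", confidence="high"))
```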
Use Cases¶
1. Quick Pipeline Setup¶
# Generate validation rules for new data pipeline
truthound quick-suite incoming_data.csv -o pipeline_rules.yaml --preset ci_cd
2. Data Quality Baseline¶
# Create baseline rules from reference data
truthound quick-suite reference_data.csv -o baseline.yaml --strictness strict
3. Development Testing¶
# Generate test validators
truthound quick-suite test_data.csv -o test_validators.py --preset development
4. Production Deployment¶
# Production-ready validation suite
truthound quick-suite prod_data.parquet -o prod_suite.yaml --preset production
5. CI/CD Integration¶
# GitHub Actions
- name: Generate Validation Suite
  run: truthound quick-suite data.csv -o validation.yaml --preset ci_cd
- name: Validate Data
  run: truthound check data.csv --schema validation.yaml --strict
Comparison: quick-suite vs auto-profile + generate-suite¶
| Aspect | quick-suite | auto-profile + generate-suite |
|---|---|---|
| Commands | 1 | 2 |
| Intermediate files | No | Yes (profile.json) |
| Profile customization | Limited | Full |
| Suite customization | Full | Full |
| Best for | Quick setup | Complex workflows |
Related Commands¶
- `auto-profile` - Advanced profiling only
- `generate-suite` - Generate rules from profile
- `check` - Run validation with suite
- `profile` - Basic profiling