
truthound quick-suite

Profile a data file and generate a validation suite in one step. This convenience command combines auto-profile and generate-suite into a single invocation.

Synopsis

truthound quick-suite <file> [OPTIONS]

Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| file | Yes | Path to the data file (CSV, JSON, Parquet, NDJSON, JSONL) |

Options

| Option | Short | Default | Description |
|--------|-------|---------|-------------|
| --output | -o | None | Output file path |
| --format | -f | yaml | Output format (yaml, json, python, toml, checkpoint) |
| --strictness | -s | medium | Rule strictness (loose, medium, strict) |
| --include | -i | All | Comma-separated rule categories to include. Categories: schema, completeness, uniqueness, format, distribution, pattern, temporal, relationship, anomaly |
| --exclude | -e | None | Comma-separated rule categories to exclude. Categories: schema, completeness, uniqueness, format, distribution, pattern, temporal, relationship, anomaly |
| --min-confidence | | None | Minimum confidence level for generated rules (low, medium, high) |
| --name | -n | Auto | Suite name |
| --preset | -p | None | Preset configuration to apply (see Available Presets) |
| --sample-size | | None | Number of rows to sample when profiling large datasets |

Description

The quick-suite command is a one-step workflow that:

  1. Profiles the data file (like auto-profile)
  2. Generates validation rules (like generate-suite)
  3. Outputs a ready-to-use validation suite

This is equivalent to running:

truthound auto-profile data.csv -o /tmp/profile.json --format json
truthound generate-suite /tmp/profile.json -o suite.yaml
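The same two-step pipeline can be scripted from Python. The sketch below is a hypothetical helper, not part of Truthound: it only assembles and runs the two commands shown above, using exactly the flags they use, and writes the intermediate profile to a temporary directory instead of a fixed /tmp path. The names build_commands and run_two_step are illustrative.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path


def build_commands(data_file: str, suite_file: str, profile_path: str) -> list[list[str]]:
    """Return argv lists for the two-step equivalent of quick-suite (hypothetical helper)."""
    return [
        ["truthound", "auto-profile", data_file, "-o", profile_path, "--format", "json"],
        ["truthound", "generate-suite", profile_path, "-o", suite_file],
    ]


def run_two_step(data_file: str, suite_file: str) -> None:
    """Run both commands in order; the intermediate profile is deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        profile_path = str(Path(tmp) / "profile.json")
        for argv in build_commands(data_file, suite_file, profile_path):
            # check=True raises CalledProcessError if a step exits non-zero
            subprocess.run(argv, check=True)
```

Calling run_two_step("data.csv", "suite.yaml") produces the suite without leaving profile.json behind, which is the main practical difference from quick-suite besides the extra process.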

Examples

Basic Usage

truthound quick-suite data.csv -o suite.yaml

Output:

Quick Suite Generation
======================
Profiling: data.csv
  Rows: 10,000
  Columns: 8
  Patterns detected: 3

Generating suite...
  Rules generated: 15
  Categories: completeness, uniqueness, range, format

Suite saved to: suite.yaml

With Preset

# Production-ready suite
truthound quick-suite data.csv -o suite.yaml --preset production

# CI/CD optimized
truthound quick-suite data.csv -o checkpoint.yaml --preset ci_cd

# Minimal rules
truthound quick-suite data.csv -o suite.yaml --preset minimal

Strictness Levels

# Loose: Relaxed thresholds
truthound quick-suite data.csv -o suite.yaml --strictness loose

# Medium (default): Balanced
truthound quick-suite data.csv -o suite.yaml --strictness medium

# Strict: Tight thresholds
truthound quick-suite data.csv -o suite.yaml --strictness strict
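This page does not state how each level changes a threshold. Purely to illustrate "relaxed" versus "tight", the sketch below widens an observed numeric range by a per-level margin before it would become a range rule. The margin values and the function name range_bounds are invented for this example and are not Truthound's actual rules.

```python
from __future__ import annotations

# Hypothetical margins: fraction of the observed span added on each side.
# Illustrative only; Truthound's real thresholds are not documented here.
MARGINS = {"loose": 0.20, "medium": 0.10, "strict": 0.0}


def range_bounds(values: list[float], strictness: str = "medium") -> tuple[float, float]:
    """Derive (min, max) bounds for a range rule from observed values."""
    if strictness not in MARGINS:
        raise ValueError(f"unknown strictness: {strictness!r}")
    lo, hi = min(values), max(values)
    pad = (hi - lo) * MARGINS[strictness]
    # Strict keeps the observed extremes; looser levels tolerate more drift.
    return lo - pad, hi + pad
```

With observed ages between 18 and 85, strict yields exactly (18, 85), while loose would accept roughly 4.6 to 98.4.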

Output Formats

# YAML (default)
truthound quick-suite data.csv -o suite.yaml --format yaml

# JSON
truthound quick-suite data.csv -o suite.json --format json

# Python code
truthound quick-suite data.csv -o validators.py --format python

# TOML
truthound quick-suite data.csv -o suite.toml --format toml

# Checkpoint format
truthound quick-suite data.csv -o checkpoint.yaml --format checkpoint
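Each example above passes --format explicitly alongside a matching file extension. If you script these calls, a small helper can keep the two in sync. The mapping below is a hypothetical convenience, not a Truthound feature; checkpoint output is written to a .yaml file, so it cannot be inferred from the extension and must be requested explicitly.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical mapping from output extension to a quick-suite --format value.
EXTENSION_FORMATS = {
    ".yaml": "yaml",
    ".yml": "yaml",
    ".json": "json",
    ".py": "python",
    ".toml": "toml",
}


def quick_suite_argv(data_file: str, output: str, fmt: str | None = None) -> list[str]:
    """Build a quick-suite command, inferring --format from the output extension."""
    if fmt is None:
        suffix = Path(output).suffix.lower()
        if suffix not in EXTENSION_FORMATS:
            raise ValueError(f"cannot infer format from {output!r}; pass fmt explicitly")
        fmt = EXTENSION_FORMATS[suffix]
    return ["truthound", "quick-suite", data_file, "-o", output, "--format", fmt]
```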

Category Filtering

# Only schema and completeness rules
truthound quick-suite data.csv -o suite.yaml --include schema,completeness

# Exclude format rules
truthound quick-suite data.csv -o suite.yaml --exclude format
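Conceptually, --include narrows the rule set to the listed categories and --exclude removes categories from it. The sketch below models that on plain dictionaries, assuming comma-separated option values (as in the --include example) and that exclusion is applied after inclusion. The rule dictionaries, parse_categories, and filter_rules are hypothetical, not Truthound's internal types.

```python
from __future__ import annotations

# The rule categories documented for --include and --exclude.
CATEGORIES = {
    "schema", "completeness", "uniqueness", "format", "distribution",
    "pattern", "temporal", "relationship", "anomaly",
}


def parse_categories(value: str | None) -> set[str] | None:
    """Parse a comma-separated option value, rejecting unknown categories."""
    if value is None:
        return None
    names = {part.strip() for part in value.split(",") if part.strip()}
    unknown = names - CATEGORIES
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return names


def filter_rules(rules: list[dict], include: str | None = None,
                 exclude: str | None = None) -> list[dict]:
    """Keep rules whose category is included (default: all) and not excluded."""
    inc = parse_categories(include) or CATEGORIES
    exc = parse_categories(exclude) or set()
    return [r for r in rules if r["category"] in inc and r["category"] not in exc]
```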

Confidence Filtering

# Only high-confidence rules
truthound quick-suite data.csv -o suite.yaml --min-confidence high
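The --min-confidence levels are ordered (low < medium < high), so a threshold keeps every rule at or above the chosen level. The sketch assumes each candidate rule already carries one of these level labels; how Truthound maps its numeric confidence scores (such as 0.95 in the output examples) onto the levels is not documented here, so the sketch does not attempt it. meets_min_confidence is a hypothetical name.

```python
from __future__ import annotations

# Ordinal ranking of the documented --min-confidence levels.
LEVELS = {"low": 0, "medium": 1, "high": 2}


def meets_min_confidence(rule_level: str, min_level: str | None) -> bool:
    """Return True if a rule's level is at or above the threshold (None keeps all)."""
    if min_level is None:
        return True
    return LEVELS[rule_level] >= LEVELS[min_level]


# Illustrative candidates: (rule type, confidence level)
candidates = [("unique", "high"), ("range", "medium"), ("pattern", "low")]
kept = [name for name, level in candidates if meets_min_confidence(level, "high")]
```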

Large Dataset Sampling

# Sample 100,000 rows for profiling
truthound quick-suite large_data.parquet -o suite.yaml --sample-size 100000
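This page does not say which sampling method --sample-size uses. For intuition, reservoir sampling (Algorithm R) is one standard way to draw a uniform sample of k rows in a single pass without knowing the row count in advance. The sketch below applies it to CSV rows using only the standard library; it is a generic technique, not Truthound's implementation.

```python
from __future__ import annotations

import csv
import random
from typing import Iterable


def reservoir_sample(rows: Iterable[list[str]], k: int,
                     seed: int | None = None) -> list[list[str]]:
    """Return a uniform random sample of up to k rows (Algorithm R)."""
    rng = random.Random(seed)
    sample: list[list[str]] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Keep row i with probability k / (i + 1), evicting a random slot
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample


def sample_csv(path: str, k: int, seed: int | None = None) -> tuple[list[str], list[list[str]]]:
    """Read the header, then sample k data rows from a CSV file in one pass."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, reservoir_sample(reader, k, seed)
```

Memory stays proportional to k rather than to the file size, which is the property that matters for large inputs.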

Custom Suite Name

truthound quick-suite customers.csv -o suite.yaml --name "Customer Validation Suite"
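When --name is omitted, the name is generated automatically. Judging only from the YAML example below, where data.csv yields data_validation_suite, the default appears to follow a "<file stem>_validation_suite" pattern. The sketch reproduces that guess; default_suite_name is hypothetical and the real rule may differ.

```python
from pathlib import Path


def default_suite_name(data_file: str) -> str:
    """Guess the auto-generated suite name from the input file's stem (hypothetical)."""
    return f"{Path(data_file).stem}_validation_suite"
```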

Output Examples

YAML Output

name: data_validation_suite
version: "1.0"
generated_at: "2024-01-15T10:30:00Z"
source_file: data.csv
profile_summary:
  rows: 10000
  columns: 8

validators:
  - type: not_null
    columns: [id, created_at]
    severity: high
    confidence: 0.99

  - type: unique
    columns: [id]
    severity: critical
    confidence: 1.0

  - type: range
    column: age
    min_value: 18
    max_value: 85
    severity: medium
    confidence: 0.95

  - type: pattern
    column: email
    pattern: email
    severity: high
    confidence: 0.98

  - type: allowed_values
    column: status
    values: [active, inactive, pending]
    severity: medium
    confidence: 1.0

JSON Output

{
  "name": "data_validation_suite",
  "version": "1.0",
  "generated_at": "2024-01-15T10:30:00Z",
  "source_file": "data.csv",
  "validators": [
    {
      "type": "not_null",
      "columns": ["id", "created_at"],
      "severity": "high",
      "confidence": 0.99
    },
    {
      "type": "unique",
      "columns": ["id"],
      "severity": "critical",
      "confidence": 1.0
    }
  ]
}
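Because the JSON output is plain data, it can be inspected with the standard library before it is committed or reviewed. The sketch below relies only on the top-level name and validators keys and the per-rule type and severity fields shown in the example, and counts rules by type and by severity. summarize_suite is a hypothetical helper.

```python
from __future__ import annotations

import json
from collections import Counter


def summarize_suite(text: str) -> dict:
    """Summarize a JSON suite: name, rule count, and counts by type and severity."""
    suite = json.loads(text)
    validators = suite.get("validators", [])
    return {
        "name": suite.get("name"),
        "rules": len(validators),
        "by_type": dict(Counter(v["type"] for v in validators)),
        "by_severity": dict(Counter(v.get("severity", "unknown") for v in validators)),
    }


# Usage: with open("suite.json") as f: print(summarize_suite(f.read()))
```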

Python Output

"""
Validation suite generated by Truthound
Source: data.csv
Generated: 2024-01-15T10:30:00Z
"""
import truthound as th

# Validation rules
validators = [
    th.validators.NotNullValidator(columns=["id", "created_at"]),
    th.validators.UniqueValidator(columns=["id"]),
    th.validators.RangeValidator(column="age", min_value=18, max_value=85),
    th.validators.PatternValidator(column="email", pattern="email"),
    th.validators.AllowedValuesValidator(
        column="status",
        values=["active", "inactive", "pending"]
    ),
]

def validate(data_path: str) -> th.ValidationReport:
    """Validate data file."""
    return th.check(data_path, validators=validators)

if __name__ == "__main__":
    import sys
    report = validate(sys.argv[1] if len(sys.argv) > 1 else "data.csv")
    print(report)

Available Presets

| Preset | Strictness | Confidence | Format | Description |
|--------|------------|------------|--------|-------------|
| default | medium | medium | yaml | Balanced settings |
| strict | strict | high | yaml | Tight validation |
| loose | loose | low | yaml | Relaxed validation |
| minimal | medium | high | yaml | Essential rules only |
| comprehensive | strict | low | yaml | All possible rules |
| schema_only | medium | high | yaml | Schema rules only |
| format_only | medium | medium | yaml | Format rules only |
| ci_cd | medium | medium | checkpoint | CI/CD optimized |
| development | loose | medium | python | Dev-friendly code |
| production | strict | high | yaml | Production-ready |
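The preset table can be expressed as data, which makes it easy to see what a preset implies before running it. The sketch copies the strictness, confidence, and format columns from the table (the category restrictions of schema_only and format_only are not modeled) and assumes that an explicitly passed option overrides the preset's value; the page does not state this precedence, and resolve_options is hypothetical.

```python
from __future__ import annotations

# Values copied from the preset table: (strictness, min confidence, format).
PRESETS = {
    "default": ("medium", "medium", "yaml"),
    "strict": ("strict", "high", "yaml"),
    "loose": ("loose", "low", "yaml"),
    "minimal": ("medium", "high", "yaml"),
    "comprehensive": ("strict", "low", "yaml"),
    "schema_only": ("medium", "high", "yaml"),
    "format_only": ("medium", "medium", "yaml"),
    "ci_cd": ("medium", "medium", "checkpoint"),
    "development": ("loose", "medium", "python"),
    "production": ("strict", "high", "yaml"),
}


def resolve_options(preset: str = "default", **overrides: str | None) -> dict[str, str]:
    """Start from a preset's settings, then apply explicit (non-None) overrides."""
    strictness, confidence, fmt = PRESETS[preset]
    options = {"strictness": strictness, "min_confidence": confidence, "format": fmt}
    for key, value in overrides.items():
        if key not in options:
            raise KeyError(f"unknown option: {key}")
        if value is not None:
            options[key] = value
    return options
```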

Use Cases

1. Quick Pipeline Setup

# Generate validation rules for new data pipeline
truthound quick-suite incoming_data.csv -o pipeline_rules.yaml --preset ci_cd

2. Data Quality Baseline

# Create baseline rules from reference data
truthound quick-suite reference_data.csv -o baseline.yaml --strictness strict

3. Development Testing

# Generate test validators
truthound quick-suite test_data.csv -o test_validators.py --preset development

4. Production Deployment

# Production-ready validation suite
truthound quick-suite prod_data.parquet -o prod_suite.yaml --preset production

5. CI/CD Integration

# GitHub Actions
- name: Generate Validation Suite
  run: truthound quick-suite data.csv -o validation.yaml --preset ci_cd

- name: Validate Data
  run: truthound check data.csv --schema validation.yaml --strict

Comparison: quick-suite vs auto-profile + generate-suite

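| Aspect | quick-suite | auto-profile + generate-suite |
|--------|-------------|-------------------------------|
| Commands | 1 | 2 |
| Intermediate files | No | Yes (profile.json) |
| Profile customization | Limited | Full |
| Suite customization | Full | Full |
| Best for | Quick setup | Complex workflows |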

See Also