Python API Guides
This section provides comprehensive guides for using Truthound through the Python API. Each guide includes practical examples, common patterns, and best practices for production environments.
Looking for CLI documentation? See CLI Reference for command-line usage.
Looking for the API reference? See Python API Reference for function signatures and parameters.
Quick Start
```python
import truthound as th

# Read data from various sources
df = th.read("data.csv")                                # File path
df = th.read({"a": [1, 2, 3], "b": ["x", "y", "z"]})    # Dict data
df = th.read("large_data.parquet", sample_size=10000)   # With sampling

# Basic validation
report = th.check("data.csv")
print(f"Found {len(report.issues)} issues")

# With specific validators
report = th.check(df, validators=["null", "duplicate", "range"])

# Schema-based validation
schema = th.learn("baseline.csv")
report = th.check("new_data.csv", schema=schema)

# Database validation
from truthound.datasources import PostgreSQLDataSource

source = PostgreSQLDataSource(table="users", host="localhost", database="mydb")
report = th.check(source=source)

# Data drift detection (14 methods available)
drift = th.compare("baseline.csv", "current.csv", method="auto")         # Auto-select
drift = th.compare("baseline.csv", "current.csv", method="ks")           # Kolmogorov-Smirnov
drift = th.compare("baseline.csv", "current.csv", method="wasserstein")  # Earth Mover's Distance
drift = th.compare("baseline.csv", "current.csv", method="anderson")     # Anderson-Darling
drift = th.compare("baseline.csv", "current.csv", method="hellinger")    # Hellinger distance
drift = th.compare("baseline.csv", "current.csv", method="mmd")          # Maximum Mean Discrepancy
```
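To give a feel for what a drift method such as `ks` measures, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic. It is an illustration only and not Truthound's implementation; the function name `ks_statistic` and the sample data are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(baseline, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    max_gap = 0.0
    # The empirical CDFs only change at observed values, so checking those is enough.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


baseline = [1.0, 2.0, 3.0, 4.0, 5.0]
shifted = [6.0, 7.0, 8.0, 9.0, 10.0]
print(ks_statistic(baseline, baseline))  # identical samples give no gap
print(ks_statistic(baseline, shifted))   # fully separated samples give the maximum gap
```

A statistic near 0 suggests the two distributions are similar, while a value near 1 suggests strong drift. A real drift check would also convert the statistic into a p-value or compare it against a threshold.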
Guide Categories
Core Functionality
| Guide | Description | Key Topics |
|---|---|---|
| Validators | Data validation patterns | 289 validators, custom validators, error handling |
| Data Sources | Database and file connections | SQL, Cloud DW, Spark, streaming |
| Profiling | Automatic data analysis | Schema inference, rule generation, scheduling |
Output and Reporting
| Guide | Description | Key Topics |
|---|---|---|
| Data Docs | HTML report generation | Themes, charts, PDF export, templates |
| Reporters | Output formats | JSON, Console, JUnit, custom reporters |
| Storage | Result persistence | S3, GCS, Azure, versioning, caching |
Operations
| Guide | Description | Key Topics |
|---|---|---|
| Configuration | Environment setup | Logging, metrics, encryption, resilience |
| CI/CD | Pipeline integration | Checkpoints, notifications, routing |
| Performance | Optimization | Parallel execution, pushdown, memory |
Enterprise Features
| Guide | Description | Key Topics |
|---|---|---|
| Advanced | Enterprise capabilities | ML anomaly, lineage, realtime streaming |
Common Workflows
Workflow 1: Basic Data Validation
```python
from pathlib import Path

import truthound as th
from truthound.datadocs import generate_html_report

# 1. Validate data
report = th.check("data.csv")

# 2. Filter critical issues
critical = [i for i in report.issues if i.severity == "critical"]

# 3. Generate an HTML report only when critical issues were found
if critical:
    html = generate_html_report(report)
    Path("report.html").write_text(html)
```
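In a CI job, the severity filter from Workflow 1 is often turned into a pass/fail exit code. The sketch below shows one way to do that. It uses a stand-in `Issue` dataclass so it runs without Truthound: only the `severity` attribute and the `"critical"` label come from the workflow above, while the `column` field, the `"low"` label, and the helper names `summarize` and `exit_code_for` are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Issue:
    """Stand-in for a report issue; only `severity` is taken from the workflow."""
    column: str      # hypothetical field
    severity: str


def summarize(issues):
    """Count issues per severity label."""
    return Counter(issue.severity for issue in issues)


def exit_code_for(issues, fail_on=("critical",)):
    """Return 1 if any issue has a severity listed in `fail_on`, else 0."""
    return 1 if any(issue.severity in fail_on for issue in issues) else 0


issues = [
    Issue("email", "critical"),
    Issue("age", "low"),
    Issue("name", "low"),
]
print(dict(summarize(issues)))
print(exit_code_for(issues))
```

Passing the result to `sys.exit` would fail the pipeline step whenever a critical issue is present, while lower-severity findings are still reported.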
Workflow 2: Schema-Based Validation
```python
import truthound as th

# 1. Learn schema from baseline data
schema = th.learn("baseline.csv")
schema.save("schema.yaml")

# 2. Validate new data against schema
report = th.check("new_data.csv", schema="schema.yaml")

# 3. Check for schema violations
schema_issues = [i for i in report.issues if i.validator == "schema"]
```
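The learn-then-check idea behind Workflow 2 can be sketched in plain Python: record what the baseline looks like, then flag anything in new data that departs from it. This is a conceptual illustration, not Truthound's schema format or inference logic; `learn_schema` and `check_schema` are hypothetical helpers, and the data is made up.

```python
def learn_schema(columns):
    """Record the Python type names seen in each column, ignoring None."""
    return {
        name: sorted({type(v).__name__ for v in values if v is not None})
        for name, values in columns.items()
    }


def check_schema(columns, schema):
    """Return human-readable violations of `schema` found in `columns`."""
    violations = []
    for name in schema:
        if name not in columns:
            violations.append(f"missing column: {name}")
    for name, values in columns.items():
        if name not in schema:
            violations.append(f"unexpected column: {name}")
            continue
        seen = {type(v).__name__ for v in values if v is not None}
        extra = sorted(seen - set(schema[name]))
        if extra:
            violations.append(f"type mismatch in {name}: {', '.join(extra)}")
    return violations


baseline = {"id": [1, 2, 3], "name": ["a", "b", None]}
schema = learn_schema(baseline)

new_data = {"id": [4, "5"], "email": ["x@example.com"]}
for violation in check_schema(new_data, schema):
    print(violation)
```

Here the new data is missing `name`, mixes strings into `id`, and adds an `email` column the baseline never had, so all three kinds of violation are reported.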
Workflow 3: Database Validation with Pushdown
```python
from datetime import date

import truthound as th
from truthound.datasources import PostgreSQLDataSource
from truthound.stores import S3Store

# 1. Connect to database
source = PostgreSQLDataSource(
    table="transactions",
    host="db.example.com",
    database="analytics",
    user="readonly",
)

# 2. Validate with query pushdown (runs on database server)
report = th.check(source=source, pushdown=True)

# 3. Save results under a date-stamped key
store = S3Store(bucket="validation-results", prefix="daily/")
store.save(report, key=f"transactions_{date.today()}")
```
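Pushdown means the check is computed inside the database, so only a small result travels back to the client instead of every row. The sketch below contrasts the two approaches using the standard library's in-memory SQLite. It does not show how Truthound generates SQL; the helper names and the sample table contents are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 25.5), (3, 25.5)],
)


def _check_identifiers(*identifiers):
    """Identifiers cannot be bound as SQL parameters, so validate them first."""
    for ident in identifiers:
        if not ident.isidentifier():
            raise ValueError(f"unsafe identifier: {ident!r}")


def null_count_pushdown(conn, table, column):
    """Count NULLs inside the database; only one number is returned."""
    _check_identifiers(table, column)
    query = f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    return conn.execute(query).fetchone()[0]


def null_count_client_side(conn, table, column):
    """Same check without pushdown: fetch every row, then count in Python."""
    _check_identifiers(table, column)
    rows = conn.execute(f"SELECT {column} FROM {table}").fetchall()
    return sum(1 for (value,) in rows if value is None)


print(null_count_pushdown(conn, "transactions", "amount"))
print(null_count_client_side(conn, "transactions", "amount"))
```

Both functions return the same count, but the pushdown version transfers a single integer regardless of table size, which is why it matters for large tables.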
Workflow 4: Profiling and Rule Generation
```python
import truthound as th
from truthound.profiler import generate_suite

# 1. Profile data
profile = th.profile("data.csv")

# 2. Generate validation suite from profile
suite = generate_suite(profile)

# 3. Execute suite on new data (file name is illustrative)
new_data = th.read("new_data.csv")
report = suite.execute(new_data)
```
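The profile-then-generate pattern in Workflow 4 can be illustrated without Truthound: summarize a baseline column, derive a rule from the summary, and apply the rule to new values. This is a conceptual sketch under stated assumptions, not the library's profiler; `profile_column` and `make_range_rule` are hypothetical names, and a real suite would cover many more rule types than a range check.

```python
def profile_column(values):
    """Summarize a numeric column: observed min, max, and share of missing values."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-null values to profile")
    return {
        "min": min(present),
        "max": max(present),
        "null_ratio": (len(values) - len(present)) / len(values),
    }


def make_range_rule(profile):
    """Turn a profile into a rule that returns values outside the observed range."""
    low, high = profile["min"], profile["max"]

    def rule(values):
        return [v for v in values if v is not None and not (low <= v <= high)]

    return rule


baseline_ages = [23, 35, 41, None, 58]
profile = profile_column(baseline_ages)
age_rule = make_range_rule(profile)

print(profile)
print(age_rule([30, 17, 64, None]))
```

Rules derived this way are only as good as the baseline, so generated rules are usually reviewed and loosened or tightened before they gate production data.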
Document Structure
Each guide follows a consistent structure:
- Overview - Purpose and scope
- Quick Start - Minimal working example
- Core Concepts - Key classes and patterns
- Practical Examples - Real-world use cases
- Configuration Options - Available settings
- Best Practices - Production recommendations
- Troubleshooting - Common issues and solutions
See Also
- Getting Started - Installation and first steps
- Tutorials - Step-by-step learning paths
- Python API Reference - Complete API documentation
- CLI Reference - Command-line interface