Truthound¶
Zero-Configuration Data Quality Framework Powered by Polars
Enterprise-grade data validation with zero setup
What is Truthound?¶
Truthound is a high-performance data quality framework built on Polars. It provides data validation, profiling, and monitoring, with an emphasis on ease of use.
Key Features¶
- Zero Configuration: Start validating data immediately without complex setup
- 289 Validators: Built-in validators spanning 28 categories of data quality checks
- High Performance: Polars-native implementation for efficient validation
- Schema Inference: Automatically learn schemas from your data
- PII Detection: Built-in scanning for personally identifiable information
- CI/CD Integration: Seamlessly integrate with your deployment pipeline
- Extensible: Create custom validators with the SDK
Quick Start¶
# Install
pip install truthound
# Learn schema from data
truthound learn data.csv
# Validate data
truthound check data.csv
# Scan for PII
truthound scan data.csv
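The three commands above can also be driven from a script, for example as a gate in a CI job. The following is a minimal sketch, assuming the `truthound` executable is on `PATH` and that it exits with a non-zero status when a command fails. The exit-code behaviour is an assumption rather than something stated on this page, and `build_argv` and `run_truthound` are hypothetical helper names.

```python
import subprocess
from typing import Callable

# Subcommands shown in the Quick Start; nothing else is assumed to exist.
KNOWN_COMMANDS = ("learn", "check", "scan")


def build_argv(command: str, path: str) -> list:
    """Build the argument list for `truthound <command> <path>`."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    return ["truthound", command, path]


def run_truthound(
    command: str,
    path: str,
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Run one command and report whether it succeeded.

    Assumption: a non-zero exit status means the command failed.
    The runner is injectable so the helper can be tested without the CLI.
    """
    result = runner(build_argv(command, path), capture_output=True, text=True)
    return result.returncode == 0
```

A CI step could then call `run_truthound("check", "data.csv")` and fail the job when it returns `False`.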
Python API¶
import truthound as th
# Check data quality
report = th.check("data.csv")
print(report)
# Profile data
profile = th.profile("data.csv")
print(profile)
# Learn schema
schema = th.learn("data.csv")
schema.save("schema.yaml")
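To try the calls above without an existing dataset, a small CSV can be generated first. The following is a minimal sketch that writes the file with the standard library only. The column names and values are made up for illustration, and the only Truthound call used is `th.check(path)` as shown above. The import is guarded so the script still runs where Truthound is not installed.

```python
import csv
import os
import tempfile


def write_sample_csv(path: str) -> int:
    """Write a tiny CSV with one malformed email and one missing age; return the row count."""
    rows = [
        {"id": "1", "email": "alice@example.com", "age": "34"},
        {"id": "2", "email": "not-an-email", "age": "29"},
        {"id": "3", "email": "carol@example.com", "age": ""},
    ]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "email", "age"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Illustrative file location; any writable path works.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
row_count = write_sample_csv(sample_path)

try:
    import truthound as th
except ImportError:
    th = None  # Truthound is not installed; skip the check below

if th is not None:
    report = th.check(sample_path)
    print(report)
```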
Documentation Sections¶
- Getting Started: Installation, quick start guide, and your first validation
- User Guide: Comprehensive guide to CLI commands, validators, and configuration
- API Reference: Complete API documentation with examples
- Tutorials: Step-by-step tutorials for common use cases
Why Truthound?¶
Performance¶
Built on Polars, Truthound leverages:
- Lazy evaluation for query optimization
- Columnar memory layout for cache efficiency
- SIMD vectorized operations
- Multi-threaded execution
Actual performance depends on hardware, data characteristics, and Polars version. Run your own benchmarks for accurate measurements.
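One way to follow that advice is a small timing harness. The following is a minimal sketch built on `time.perf_counter`. `time_call` is a hypothetical helper, and the workload is a stand-in so the snippet runs without any data file.

```python
import statistics
import time
from typing import Callable


def time_call(fn: Callable[[], object], repeats: int = 5, warmup: int = 1) -> dict:
    """Call fn() repeatedly and return min/median/max wall-clock seconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # warm-up runs are not measured
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
        "runs": len(samples),
    }


# Stand-in workload; swap in the call you want to measure.
stats = time_call(lambda: sum(range(100_000)))
print(f"median: {stats['median']:.6f}s over {stats['runs']} runs")
```

To time a validation on your own data, replace the stand-in with something like `lambda: th.check("data.csv")`. Reporting the median rather than a single run reduces the influence of one-off slow runs.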
Comprehensive Validation¶
The built-in validators span multiple categories, including:
- Schema validation
- Format validation (email, phone, URL, etc.)
- Statistical validation
- PII detection
- Data drift detection
- Anomaly detection
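To make two of these categories concrete, the sketch below shows what a format check and a simple statistical check do conceptually, in plain Python. It is not Truthound's implementation or API: the function names, the deliberately loose email pattern, and the z-score threshold are all assumptions chosen for illustration.

```python
import re
import statistics

# Deliberately loose pattern for illustration; real email validation is stricter.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def invalid_emails(values):
    """Format validation: return the values that do not look like an email address."""
    return [v for v in values if not EMAIL_RE.match(v)]


def zscore_outliers(values, threshold=3.0):
    """Statistical validation: return values more than `threshold`
    population standard deviations away from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mean) / stdev > threshold]


emails = ["a@b.com", "bad", "x@y"]
ages = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100]

print(invalid_emails(emails))
# With only ten values a population z-score cannot exceed 3, so a lower threshold is used here.
print(zscore_outliers(ages, threshold=2.0))
```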
Enterprise Ready¶
- Security: ReDoS protection, SQL injection prevention
- i18n: English and Korean language support
- Storage: S3, GCS, Azure Blob, and database backends
- CI/CD: 12 platform integrations
- Notifications: 9 providers (Slack, Teams, PagerDuty, etc.)
License¶
Truthound is open source under the Apache License 2.0.