truthound learn¶
Learn schema from a data file. This command analyzes your data and generates a YAML schema file with inferred types and constraints.
Synopsis¶
Arguments¶
| Argument | Required | Description |
|---|---|---|
file |
Yes | Path to the data file (CSV, JSON, Parquet, NDJSON, JSONL) |
Options¶
| Option | Short | Default | Description |
|---|---|---|---|
--output |
-o |
schema.yaml |
Output schema file path |
--no-constraints |
false |
Don't infer constraints from data |
Description¶
The learn command performs automatic schema inference by analyzing your data:
- Data Type Detection: Identifies column types (Int64, Float64, String, Date, etc.)
- Constraint Inference: Detects value ranges, allowed values, and patterns
- Nullability Detection: Determines which columns allow null values
- Uniqueness Detection: Identifies potential primary key columns
The generated schema can be used with truthound check to validate new data.
Examples¶
Basic Usage¶
Learn schema with default output:
Output:
Custom Output Path¶
Specify a custom output file:
Without Constraint Inference¶
Learn only data types without inferring min/max, allowed values, etc.:
This is useful when you want to define constraints manually.
From Different File Formats¶
# From CSV
truthound learn users.csv
# From Parquet
truthound learn transactions.parquet
# From JSON
truthound learn events.json
# From NDJSON/JSONL
truthound learn logs.ndjson
Output Format¶
The generated schema is a YAML file:
name: data
version: "1.0"
columns:
- name: id
dtype: Int64
nullable: false
unique: true
- name: email
dtype: String
nullable: true
patterns:
- email
- name: age
dtype: Int64
nullable: true
min_value: 0
max_value: 120
- name: status
dtype: String
nullable: false
allowed_values:
- active
- inactive
- pending
- name: created_at
dtype: Date
nullable: false
Schema Fields¶
| Field | Description |
|---|---|
name |
Column name |
dtype |
Data type (Int64, Float64, String, Date, Datetime, Boolean) |
nullable |
Whether null values are allowed |
unique |
Whether values must be unique |
min_value |
Minimum value (numeric columns) |
max_value |
Maximum value (numeric columns) |
allowed_values |
List of valid values (categorical columns) |
patterns |
Data patterns (email, phone, url, etc.) |
Use Cases¶
1. Schema-Based Validation Pipeline¶
# Step 1: Learn schema from reference data
truthound learn reference_data.csv -o schema.yaml
# Step 2: Validate new data against schema
truthound check new_data.csv --schema schema.yaml --strict
2. CI/CD Integration¶
# .github/workflows/data-quality.yml
jobs:
validate:
steps:
- name: Learn baseline schema
run: truthound learn baseline/data.csv -o schema.yaml
- name: Validate production data
run: truthound check production/data.csv --schema schema.yaml --strict
3. Manual Schema Refinement¶
# Generate base schema without constraints
truthound learn data.csv --no-constraints -o base_schema.yaml
# Edit manually to add business rules
# Then use for validation
truthound check new_data.csv --schema base_schema.yaml