Skip to content

PanderaAdapter

The Pandera adapter supports hybrid schema-based and rules-based validation.

Basic Usage

from common.engines import PanderaAdapter

engine = PanderaAdapter()

result = engine.check(
    data=df,
    rules=[
        {"type": "not_null", "column": "id"},
        {"type": "dtype", "column": "value", "dtype": "float64"},
        {"type": "in_range", "column": "percentage", "min": 0, "max": 100},
        {"type": "regex", "column": "email", "pattern": r"^[\w\.-]+@[\w\.-]+\.\w+$"},
    ],
)

Rule Type Conversion

Common rule types are automatically converted to Pandera Checks:

Common Rule Type Pandera Check
not_null nullable=False
unique pa.Check.unique()
in_set pa.Check.isin(values)
in_range pa.Check.in_range(min, max)
regex pa.Check.str_matches(pattern)
dtype pa.Column(dtype=...)
greater_than pa.Check.greater_than(value)
less_than pa.Check.less_than(value)

Pandera-Specific Parameters

result = engine.check(
    data=df,
    rules=rules,
    lazy=True,          # Collect all errors (True) vs stop at first error (False)
    fail_on_error=True,
)

Dtype Mapping

Common Dtype Pandera Dtype
int, int32, int64 pa.Int, pa.Int32, pa.Int64
float, float32, float64 pa.Float, pa.Float32, pa.Float64
str, string pa.String
bool, boolean pa.Bool
datetime pa.DateTime

Profiling

profile = engine.profile(df)

for col in profile.columns:
    print(f"{col.column_name}: {col.dtype}")
    print(f"  Null: {col.null_percentage}%")
    print(f"  Unique: {col.unique_count}")

Schema Learning

learn_result = engine.learn(df)

for rule in learn_result.rules:
    print(f"{rule.column}: {rule.rule_type}")

Lifecycle Management

with PanderaAdapter() as engine:
    result = engine.check(df, rules)
    health = engine.health_check()

Configuration

from common.engines import PanderaConfig

config = PanderaConfig(
    lazy=True,                  # Collect all errors
    strict=False,               # Strict mode
    coerce=False,               # Type coercion
    unique_column_names=False,  # Column name uniqueness check
    report_duplicates="all",    # Duplicate reporting mode
)

engine = PanderaAdapter(config=config)

Supported Data Types

Data Type Support
Pandas DataFrame Native
Polars DataFrame Auto-conversion