PanderaAdapter

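The Pandera adapter supports hybrid validation. Rules written in the common rule format are translated into a Pandera schema, so schema-level settings (such as column dtypes) and rule-based checks can be combined in a single `check()` call.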

Basic Usage

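import pandas as pd

from common.engines import PanderaAdapter

# Sample data for illustration
df = pd.DataFrame({
    "id": [1, 2, 3],
    "value": [1.5, 2.0, 3.25],
    "percentage": [10, 55, 99],
    "email": ["a@example.com", "b@example.org", "c@example.net"],
})

engine = PanderaAdapter()

result = engine.check(
    data=df,
    rules=[
        {"type": "not_null", "column": "id"},
        {"type": "dtype", "column": "value", "dtype": "float64"},
        {"type": "in_range", "column": "percentage", "min": 0, "max": 100},
        {"type": "regex", "column": "email", "pattern": r"^[\w\.-]+@[\w\.-]+\.\w+$"},
    ],
)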

Rule Type Conversion

Common rule types are converted automatically into Pandera checks or column-level settings:

Common rule type	Pandera equivalent
`not_null`	`pa.Column(nullable=False)`
`unique`	`pa.Column(unique=True)`
`in_set`	`pa.Check.isin(values)`
`in_range`	`pa.Check.in_range(min, max)`
`regex`	`pa.Check.str_matches(pattern)`
`dtype`	`pa.Column(dtype=...)`
`greater_than`	`pa.Check.greater_than(value)`
`less_than`	`pa.Check.less_than(value)`
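To make the per-value semantics of these rules concrete, here is a minimal pure-Python sketch that turns a common rule dict into a predicate and lists the violating values. It is illustrative only, not the adapter's implementation: `rule_to_predicate` and `failing_values` are hypothetical helpers, and the `values` key for `in_set` and the `value` key for `greater_than`/`less_than` are assumptions inferred from the table. `unique` and `dtype` are omitted because they constrain a column as a whole rather than individual values.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical sketch: each per-value rule type becomes a plain-Python
# predicate. This is NOT the adapter's actual implementation.
Predicate = Callable[[Any], bool]


def rule_to_predicate(rule: Dict[str, Any]) -> Predicate:
    kind = rule["type"]
    if kind == "not_null":
        return lambda v: v is not None
    if kind == "in_set":
        allowed = set(rule["values"])  # assumed key name
        return lambda v: v in allowed
    if kind == "in_range":
        low, high = rule["min"], rule["max"]
        # Inclusive bounds, like pa.Check.in_range's defaults
        return lambda v: low <= v <= high
    if kind == "regex":
        pattern = re.compile(rule["pattern"])
        # re.match anchors at the start only; the example pattern uses ^...$
        return lambda v: pattern.match(v) is not None
    if kind == "greater_than":
        bound = rule["value"]  # assumed key name
        return lambda v: v > bound
    if kind == "less_than":
        bound = rule["value"]  # assumed key name
        return lambda v: v < bound
    raise ValueError(f"{kind!r} is not a per-value rule")


def failing_values(values: List[Any], rule: Dict[str, Any]) -> List[Any]:
    """Return the values that violate a per-value rule."""
    check = rule_to_predicate(rule)
    # Pandera checks skip nulls by default (ignore_na=True), so only
    # not_null inspects them here.
    skip_nulls = rule["type"] != "not_null"
    return [v for v in values if not (skip_nulls and v is None) and not check(v)]


rule = {"type": "in_range", "column": "percentage", "min": 0, "max": 100}
print(failing_values([5, 150, -1, None], rule))  # [150, -1]
```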

Pandera-Specific Parameters

result = engine.check(
    data=df,
    rules=rules,
    lazy=True,          # Collect all errors (True) vs stop at first error (False)
    fail_on_error=True,
)
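In Pandera itself, `lazy=True` makes `validate()` collect every failure and raise them together as a `SchemaErrors` exception, while `lazy=False` raises a `SchemaError` at the first failing check. The sketch below reproduces that control flow in plain Python. `run_checks` and `ValidationError` are hypothetical names, and the sketch makes no claim about how the adapter's `fail_on_error` option behaves.

```python
from typing import Callable, List, Optional

# A check returns an error message, or None when it passes.
Check = Callable[[], Optional[str]]


class ValidationError(Exception):
    """Hypothetical error carrying every collected failure message."""

    def __init__(self, errors: List[str]) -> None:
        super().__init__("; ".join(errors))
        self.errors = errors


def run_checks(checks: List[Check], lazy: bool) -> None:
    errors: List[str] = []
    for check in checks:
        message = check()
        if message is None:
            continue
        if not lazy:
            # Eager mode: stop at the first failure.
            raise ValidationError([message])
        errors.append(message)
    if errors:
        # Lazy mode: report all failures together.
        raise ValidationError(errors)


checks: List[Check] = [
    lambda: None,  # passes
    lambda: "id: contains nulls",
    lambda: "percentage: value 150 out of range",
]
for lazy in (True, False):
    try:
        run_checks(checks, lazy=lazy)
    except ValidationError as exc:
        print(lazy, exc.errors)
# True ['id: contains nulls', 'percentage: value 150 out of range']
# False ['id: contains nulls']
```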

Dtype Mapping

Common dtype	Pandera dtype
`int`	`pa.Int`
`int32`	`pa.Int32`
`int64`	`pa.Int64`
`float`	`pa.Float`
`float32`	`pa.Float32`
`float64`	`pa.Float64`
`str`, `string`	`pa.String`
`bool`, `boolean`	`pa.Bool`
`datetime`	`pa.DateTime`
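The dtype table can be read as a lookup from common dtype names (including aliases such as `str`/`string`) to Pandera dtypes. A minimal sketch, assuming names are matched case-insensitively and returning the Pandera names as strings so it runs without pandera installed; `DTYPE_ALIASES` and `to_pandera_dtype` are hypothetical.

```python
from typing import Dict

# Hypothetical lookup mirroring the dtype table. Values are Pandera dtype
# names as strings so this sketch runs without pandera installed.
DTYPE_ALIASES: Dict[str, str] = {
    "int": "pa.Int",
    "int32": "pa.Int32",
    "int64": "pa.Int64",
    "float": "pa.Float",
    "float32": "pa.Float32",
    "float64": "pa.Float64",
    "str": "pa.String",
    "string": "pa.String",
    "bool": "pa.Bool",
    "boolean": "pa.Bool",
    "datetime": "pa.DateTime",
}


def to_pandera_dtype(name: str) -> str:
    """Resolve a common dtype name (assumed case-insensitive) to a Pandera dtype name."""
    key = name.strip().lower()
    try:
        return DTYPE_ALIASES[key]
    except KeyError:
        raise ValueError(f"unsupported dtype: {name!r}") from None


print(to_pandera_dtype("float64"))  # pa.Float64
print(to_pandera_dtype("Boolean"))  # pa.Bool
```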

Profiling

profile = engine.profile(df)

for col in profile.columns:
    print(f"{col.column_name}: {col.dtype}")
    print(f"  Null: {col.null_percentage}%")
    print(f"  Unique: {col.unique_count}")
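To show what the profile fields mean, here is a plain-Python sketch that computes them for a single column. `ColumnProfile` and `profile_column` are hypothetical stand-ins: only the attribute names come from the example above. The sketch assumes `null_percentage` is on a 0-100 scale (as the `%` suffix suggests) and that `unique_count` counts distinct non-null values.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ColumnProfile:
    # Hypothetical stand-in; attribute names match the example above.
    column_name: str
    dtype: str
    null_percentage: float
    unique_count: int


def profile_column(name: str, values: List[Any]) -> ColumnProfile:
    non_null = [v for v in values if v is not None]
    # Assumption: percentage on a 0-100 scale; empty columns report 0.0.
    null_pct = 100.0 * (len(values) - len(non_null)) / len(values) if values else 0.0
    # Use the Python type name when all non-null values share one type.
    kinds = {type(v).__name__ for v in non_null}
    dtype = kinds.pop() if len(kinds) == 1 else "object"
    # Assumption: unique_count counts distinct non-null values.
    return ColumnProfile(name, dtype, round(null_pct, 2), len(set(non_null)))


col = profile_column("id", [1, 2, 2, None])
print(f"{col.column_name}: {col.dtype}")  # id: int
print(f"  Null: {col.null_percentage}%")  # Null: 25.0%
print(f"  Unique: {col.unique_count}")    # Unique: 2
```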

Schema Learning

learn_result = engine.learn(df)

for rule in learn_result.rules:
    print(f"{rule.column}: {rule.rule_type}")
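`learn()` infers candidate rules from the data. How the adapter does this is not described here, so the sketch below is just one plausible heuristic: propose `not_null` when a column has no nulls, `unique` when all non-null values are distinct, and `in_range` from the observed minimum and maximum of numeric columns. `LearnedRule` and `learn_rules` are hypothetical, and the input is a plain dict of column lists rather than a DataFrame.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LearnedRule:
    # Hypothetical: only .column and .rule_type appear in the example above.
    column: str
    rule_type: str
    params: Dict[str, Any] = field(default_factory=dict)


def learn_rules(data: Dict[str, List[Any]]) -> List[LearnedRule]:
    """Infer candidate rules from observed column values (illustrative heuristic)."""
    rules: List[LearnedRule] = []
    for column, values in data.items():
        non_null = [v for v in values if v is not None]
        if values and len(non_null) == len(values):
            rules.append(LearnedRule(column, "not_null"))
        if non_null and len(set(non_null)) == len(non_null):
            rules.append(LearnedRule(column, "unique"))
        # bool is a subclass of int, so exclude it from numeric columns.
        numeric = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if non_null and len(numeric) == len(non_null):
            rules.append(LearnedRule(column, "in_range", {"min": min(numeric), "max": max(numeric)}))
    return rules


sample = {"id": [1, 2, 3], "percentage": [10.0, None, 10.0]}
for rule in learn_rules(sample):
    print(f"{rule.column}: {rule.rule_type}")
# id: not_null
# id: unique
# id: in_range
# percentage: in_range
```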

Lifecycle Management

with PanderaAdapter() as engine:
    result = engine.check(df, rules)
    health = engine.health_check()

Configuration

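from common.engines import PanderaAdapter, PanderaConfig

config = PanderaConfig(
    lazy=True,                  # Collect all errors instead of stopping at the first
    strict=False,               # If True, fail on DataFrame columns not defined in the schema
    coerce=False,               # If True, coerce columns to their declared dtype before checking
    unique_column_names=False,  # If True, require column names to be unique
    report_duplicates="all",    # Duplicates to report: "all", "exclude_first", or "exclude_last"
)

engine = PanderaAdapter(config=config)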
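The `report_duplicates` options come from Pandera, where they control which rows are reported when a uniqueness check fails, following the same rules as pandas `Series.duplicated(keep=...)`. This sketch, assuming the adapter forwards the value to Pandera unchanged, lists the flagged positions for each mode; `duplicate_positions` is a hypothetical helper.

```python
from typing import Dict, Hashable, List, Sequence

MODES = ("all", "exclude_first", "exclude_last")


def duplicate_positions(values: Sequence[Hashable], report_duplicates: str = "all") -> List[int]:
    """Return indices of duplicated values under a reporting mode.

    "all"           -> every occurrence of a repeated value (keep=False)
    "exclude_first" -> all but the first occurrence (keep="first")
    "exclude_last"  -> all but the last occurrence (keep="last")
    """
    if report_duplicates not in MODES:
        raise ValueError(f"unknown mode: {report_duplicates!r}")
    # Group the positions of each distinct value.
    positions: Dict[Hashable, List[int]] = {}
    for i, v in enumerate(values):
        positions.setdefault(v, []).append(i)
    flagged: List[int] = []
    for idxs in positions.values():
        if len(idxs) < 2:
            continue  # not duplicated
        if report_duplicates == "all":
            flagged.extend(idxs)
        elif report_duplicates == "exclude_first":
            flagged.extend(idxs[1:])
        else:  # "exclude_last"
            flagged.extend(idxs[:-1])
    return sorted(flagged)


values = ["a", "b", "a", "c", "a"]
print(duplicate_positions(values, "all"))            # [0, 2, 4]
print(duplicate_positions(values, "exclude_first"))  # [2, 4]
print(duplicate_positions(values, "exclude_last"))   # [0, 2]
```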

Supported Data Types

Data type	Support
pandas DataFrame	Native
Polars DataFrame	Converted to pandas automatically