Upstream Source
This page is part of Truthound Orchestration 3.x.
Source repository: seadonggyun4/truthound-orchestration
Upstream docs path: docs/engines/batch.md
Batch Processing
Large validations often need chunking, aggregation, and worker control instead of a single eager call. The engine layer provides batch executors for that use case.
Main Components
- BatchExecutor
- AsyncBatchExecutor
- BatchConfig
- chunkers such as RowCountChunker, PolarsChunker, and DatasetListChunker
- hooks such as LoggingBatchHook and MetricsBatchHook
Basic Usage
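```python
from common.engines import BatchConfig, BatchExecutor, TruthoundEngine

engine = TruthoundEngine()
# batch_size and max_workers set the chunk size and the worker count.
executor = BatchExecutor(engine, BatchConfig(batch_size=10000, max_workers=4))
# large_dataframe is a placeholder for a DataFrame loaded elsewhere.
result = executor.check_batch(large_dataframe, auto_schema=True)
```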
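To get a feel for what the two settings in the example imply, the arithmetic can be sketched in plain Python. This is only an illustration: `plan_batches` is a hypothetical helper, not part of Truthound, and it assumes that `batch_size` counts rows per chunk and that at most `max_workers` chunks run at the same time.

```python
import math


def plan_batches(total_rows: int, batch_size: int, max_workers: int) -> tuple[int, int]:
    """Hypothetical helper: estimate chunk count and the number of worker 'waves'."""
    if batch_size <= 0 or max_workers <= 0:
        raise ValueError("batch_size and max_workers must be positive")
    # The last chunk may be partial, so round up.
    chunks = math.ceil(total_rows / batch_size)
    # With max_workers chunks in flight at once, chunks are processed in waves.
    waves = math.ceil(chunks / max_workers)
    return chunks, waves


print(plan_batches(1_000_000, 10_000, 4))  # (100, 25)
print(plan_batches(25_001, 10_000, 4))     # (3, 1)
```

Under these assumptions, a one-million-row dataset with the settings above is split into 100 chunks, processed roughly four at a time.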
When To Use Batch Execution
Use batch execution when:
- datasets are too large for a single in-memory pass
- you need explicit worker and chunk-size control
- operators want aggregated results across many chunks
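The chunk-then-aggregate idea behind these cases can be sketched without any Truthound imports. Everything below is hypothetical and for illustration only: `ChunkResult`, `iter_row_chunks`, `check_chunk`, and `aggregate` are stand-ins, and the real chunkers and result types may look quite different.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass
class ChunkResult:
    """Hypothetical per-chunk outcome; not a Truthound type."""
    rows: int
    failures: int


def iter_row_chunks(rows: Sequence[dict], batch_size: int) -> Iterator[Sequence[dict]]:
    """Yield consecutive slices of at most batch_size rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def check_chunk(chunk: Sequence[dict]) -> ChunkResult:
    """Stand-in validation: count rows whose 'amount' is missing or negative."""
    failures = sum(
        1 for row in chunk if row.get("amount") is None or row["amount"] < 0
    )
    return ChunkResult(rows=len(chunk), failures=failures)


def aggregate(results: Sequence[ChunkResult]) -> ChunkResult:
    """Combine per-chunk results into one summary."""
    return ChunkResult(
        rows=sum(r.rows for r in results),
        failures=sum(r.failures for r in results),
    )


rows = [{"amount": i - 2} for i in range(10)]  # amounts -2..7, two of them negative
chunk_results = [check_chunk(c) for c in iter_row_chunks(rows, batch_size=4)]
total = aggregate(chunk_results)
print(len(chunk_results), total.rows, total.failures)  # 3 10 2
```

Only one chunk needs to be held by the validation step at a time, and the aggregated summary is what an operator would read at the end of the run.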
Operational Choices
Choose:
- sequential execution for simpler failure analysis or tight resource limits
- parallel execution for large workloads on capable runners
- fail-fast behavior when the first serious violation should stop the job
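The trade-offs above can be illustrated with a small standard-library sketch. It does not use or describe the Truthound executors: `run_chunks` and its `max_workers` and `fail_fast` parameters are hypothetical names chosen for this example, and the thread pool is just one possible way to run chunks in parallel.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Sequence


def run_chunks(
    chunks: Sequence[Sequence[int]],
    check: Callable[[Sequence[int]], int],
    max_workers: int = 1,
    fail_fast: bool = False,
) -> List[int]:
    """Return per-chunk failure counts (hypothetical helper).

    max_workers=1 runs chunks one by one, in order. Larger values use a
    thread pool and return results in completion order. With fail_fast=True,
    processing stops after the first chunk that reports failures.
    """
    results: List[int] = []
    if max_workers <= 1:
        for chunk in chunks:
            failures = check(chunk)
            results.append(failures)
            if fail_fast and failures:
                break
        return results

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(check, chunk) for chunk in chunks]
        for future in as_completed(futures):
            failures = future.result()
            results.append(failures)
            if fail_fast and failures:
                for pending in futures:
                    pending.cancel()  # only stops chunks that have not started
                break
    return results


def count_negatives(chunk: Sequence[int]) -> int:
    """Stand-in check: a negative value counts as a violation."""
    return sum(1 for value in chunk if value < 0)


chunks = [[1, 2], [-1, 3], [4, 5], [6, -7]]
sequential = run_chunks(chunks, count_negatives, fail_fast=True)
parallel = run_chunks(chunks, count_negatives, max_workers=4)
print(sequential)     # [0, 1]
print(sum(parallel))  # 2
```

The sequential fail-fast run stops at the second chunk, which makes the failing chunk easy to pinpoint. The parallel run checks every chunk, but its results arrive in completion order, so per-chunk ordering should not be relied on.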