# Benchmark Methodology

## Purpose
The Truthound benchmark methodology exists to answer one question honestly:

> Can the public zero-config Truthound path outperform Great Expectations on comparable release-grade workloads while preserving correctness?
## Principles
- Measure the public API, not internal helpers.
- Use repo-tracked workload manifests and fixtures.
- Separate first-run baseline cost from warm-run steady state.
- Run each framework in its own child process.
- Reject performance wins that do not also preserve correctness.
## Measurement Model
Each framework/workload observation records:
- framework name and framework version
- workload id and dataset fingerprint
- backend and exactness class
- cold start seconds
- warm median seconds
- peak RSS bytes
- expected issue count
- observed issue count
- artifact paths for the per-workload run
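As a sketch, an observation with these fields might be modeled like this (the field names mirror the list above but are illustrative, not the real Truthound schema):

```python
from dataclasses import dataclass, asdict


# Hypothetical per-workload observation record; one is emitted per
# framework/workload pair, covering both the cold and warm measurements.
@dataclass
class Observation:
    framework: str
    framework_version: str
    workload_id: str
    dataset_fingerprint: str
    backend: str                  # e.g. "local" or "sqlite"
    exactness_class: str          # e.g. "exact"
    cold_start_seconds: float     # includes baseline creation cost
    warm_median_seconds: float    # reflects baseline reuse
    peak_rss_bytes: int
    expected_issue_count: int
    observed_issue_count: int
    artifact_paths: list[str]


obs = Observation(
    framework="truthound",
    framework_version="0.0.0",
    workload_id="tier1-local-01",
    dataset_fingerprint="sha256:example",
    backend="local",
    exactness_class="exact",
    cold_start_seconds=1.8,
    warm_median_seconds=0.4,
    peak_rss_bytes=512 * 1024 * 1024,
    expected_issue_count=12,
    observed_issue_count=12,
    artifact_paths=[".truthound/benchmarks/results/tier1-local-01.json"],
)
```

Correctness parity falls out of the record directly: `expected_issue_count` must equal `observed_issue_count`.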
Truthound records cold and warm runs in the same zero-config workspace so that baseline creation cost is visible in cold start and baseline reuse is visible in warm median.
## Child Process Isolation
Framework timing and memory are collected in child processes rather than the parent runner. This reduces contamination from:
- already-imported modules
- allocator state from previous workloads
- mixed framework caches
- shared process RSS inflation
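The isolation idea can be sketched in plain Python: spawn a fresh interpreter per measurement and have the child report its own timing and peak RSS. This is a hypothetical harness, not the actual runner:

```python
import json
import subprocess
import sys

# Child script run in a fresh interpreter, so the parent's imports,
# allocator state, and framework caches cannot leak into the numbers.
CHILD = r"""
import json, resource, sys, time

t0 = time.perf_counter()
# ... run the framework workload here ...
elapsed = time.perf_counter() - t0

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_maxrss is KiB on Linux but bytes on macOS.
peak = usage.ru_maxrss * (1 if sys.platform == "darwin" else 1024)
json.dump({"seconds": elapsed, "peak_rss_bytes": peak}, sys.stdout)
"""

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, check=True,
)
sample = json.loads(proc.stdout)
```

Each workload gets its own `sample`, so a slow or leaky framework in one observation cannot inflate the next one's RSS.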
## Runner Policy
Truthound uses a hybrid runner policy:
- GitHub-hosted nightly runners provide trend visibility and early warning
- fixed self-hosted runners provide the authoritative release-grade benchmark verification
Nightly artifacts are informative. Fixed-runner release artifacts are authoritative.
For the authoritative release verdict, the runner must record:
- CPU model
- logical core count
- RAM
- OS and Python minor version
- storage class
The release workflow reads these from the fixed host plus the following environment contract:

```text
TRUTHOUND_BENCHMARK_RUNNER_CLASS=self-hosted-fixed
TRUTHOUND_BENCHMARK_RUNNER_LABELS=self-hosted,benchmark-fixed
TRUTHOUND_BENCHMARK_RELEASE_VERDICT=true
TRUTHOUND_BENCHMARK_STORAGE_CLASS
TRUTHOUND_BENCHMARK_CPU_MODEL
TRUTHOUND_BENCHMARK_RAM_BYTES
TRUTHOUND_BENCHMARK_CPU_PHYSICAL_CORES
```
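A minimal sketch of how a runner script might validate this contract before claiming an authoritative verdict (`release_contract` is an illustrative helper, not part of Truthound):

```python
# Keys the fixed host must provide before a release verdict is authoritative.
REQUIRED = [
    "TRUTHOUND_BENCHMARK_RUNNER_CLASS",
    "TRUTHOUND_BENCHMARK_RUNNER_LABELS",
    "TRUTHOUND_BENCHMARK_RELEASE_VERDICT",
    "TRUTHOUND_BENCHMARK_STORAGE_CLASS",
    "TRUTHOUND_BENCHMARK_CPU_MODEL",
    "TRUTHOUND_BENCHMARK_RAM_BYTES",
    "TRUTHOUND_BENCHMARK_CPU_PHYSICAL_CORES",
]


def release_contract(env: dict[str, str]) -> dict[str, str]:
    """Return the contract subset of env, or raise if it is incomplete."""
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise RuntimeError(f"not authoritative: missing {missing}")
    if env["TRUTHOUND_BENCHMARK_RUNNER_CLASS"] != "self-hosted-fixed":
        raise RuntimeError("not a fixed self-hosted runner")
    return {key: env[key] for key in REQUIRED}
```

In practice the dict would come from `os.environ`; passing it explicitly keeps the check testable.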
For self-hosted macOS runners, the release workflow avoids `actions/setup-python` and instead uses `uv` to install a managed Python 3.11 runtime plus a dedicated `.release-venv`. This sidesteps hosted-toolcache assumptions that often point at `/Users/runner` on macOS.
## Thresholds
The current release-grade thresholds are:
- local exact: Truthound >= `1.5x` Great Expectations
- SQL exact: Truthound >= `1.0x` Great Expectations, with `1.2x` as the target
- local memory: Truthound <= `60%` of Great Expectations peak RSS
If correctness parity fails, the performance comparison is treated as failed even when timing looks better.
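The gate described above can be sketched as a single predicate (a hypothetical helper; the real ratio definitions live in the benchmark runner):

```python
def release_verdict(
    local_speedup: float,      # GE warm median / Truthound warm median, local
    sql_speedup: float,        # same ratio on the SQL backend
    memory_ratio: float,       # Truthound peak RSS / GE peak RSS, local
    correctness_parity: bool,  # observed issue counts match expected
) -> bool:
    if not correctness_parity:
        return False           # fast-but-wrong still fails
    return (
        local_speedup >= 1.5
        and sql_speedup >= 1.0   # 1.2x is the target, 1.0x the floor
        and memory_ratio <= 0.60
    )
```

Note that correctness parity is checked first: no timing or memory numbers are even consulted when parity fails.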
## Commands
```bash
truthound benchmark parity --suite pr-fast --frameworks truthound --backend local
truthound benchmark parity --suite nightly-core --frameworks both --backend local
truthound benchmark parity --suite nightly-sql --frameworks both --backend sqlite
truthound benchmark parity --suite release-ga --frameworks both --strict
```
`release-ga` is intentionally stricter than the nightly suites:

- it must run `--frameworks both`
- it must not use `--backend`
- it must execute all eight tier-1 local and SQLite workloads
## Artifact Layout
The benchmark artifact root is always `.truthound/benchmarks/` in the active project context:

- `results/`
- `baselines/`
- `artifacts/release/`
When a parity run writes an external output file, the canonical artifact set beside that output is:

- `release-ga.json`
- `release-ga.md`
- `release-ga.html`
- `env-manifest.json`
- `latest-benchmark-summary.md`
To publish the docs summary page from an approved release artifact set:
```bash
python docs/scripts/publish_benchmark_summary.py \
  --json benchmark-artifacts/release-ga.json \
  --artifact-base-url .. \
  --output docs/releases/latest-benchmark-summary.md
```