# Benchmark Methodology

## Purpose
The Truthound benchmark methodology exists to answer one question honestly:

> Can the public zero-config Truthound path outperform Great Expectations on comparable release-grade workloads while preserving correctness?
## Principles
- Measure the public API, not internal helpers.
- Use repo-tracked workload manifests and fixtures.
- Separate first-run baseline cost from warm-run steady state.
- Run each framework in its own child process.
- Reject performance wins that do not also preserve correctness.
## Measurement Model
Each framework/workload observation records:
- framework name and framework version
- workload id and dataset fingerprint
- backend and exactness class
- cold start seconds
- warm median seconds
- peak RSS bytes
- expected issue count
- observed issue count
- artifact paths for the per-workload run
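As a sketch, an observation with these fields might be modeled like this (the field names mirror the list above but are illustrative, not the real Truthound schema):

```python
from dataclasses import dataclass, asdict


# Hypothetical per-workload observation record; one is emitted per
# framework/workload pair, covering both the cold and warm measurements.
@dataclass
class Observation:
    framework: str
    framework_version: str
    workload_id: str
    dataset_fingerprint: str
    backend: str                  # e.g. "local" or "sqlite"
    exactness_class: str          # e.g. "exact"
    cold_start_seconds: float     # includes baseline creation cost
    warm_median_seconds: float    # reflects baseline reuse
    peak_rss_bytes: int
    expected_issue_count: int
    observed_issue_count: int
    artifact_paths: list[str]


obs = Observation(
    framework="truthound",
    framework_version="0.0.0",
    workload_id="tier1-local-01",
    dataset_fingerprint="sha256:example",
    backend="local",
    exactness_class="exact",
    cold_start_seconds=1.8,
    warm_median_seconds=0.4,
    peak_rss_bytes=512 * 1024 * 1024,
    expected_issue_count=12,
    observed_issue_count=12,
    artifact_paths=[".truthound/benchmarks/results/tier1-local-01.json"],
)
```

Correctness parity falls out of the record directly: `expected_issue_count` must equal `observed_issue_count`.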
Truthound records cold and warm runs in the same zero-config workspace so that baseline creation cost is visible in cold start and baseline reuse is visible in warm median.
## Child Process Isolation
Framework timing and memory are collected in child processes rather than the parent runner. This reduces contamination from:
- already-imported modules
- allocator state from previous workloads
- mixed framework caches
- shared process RSS inflation
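The isolation idea can be sketched in plain Python: spawn a fresh interpreter per measurement and have the child report its own timing and peak RSS. This is a hypothetical harness, not the actual runner:

```python
import json
import subprocess
import sys

# Child script run in a fresh interpreter, so the parent's imports,
# allocator state, and framework caches cannot leak into the numbers.
CHILD = r"""
import json, resource, sys, time

t0 = time.perf_counter()
# ... run the framework workload here ...
elapsed = time.perf_counter() - t0

usage = resource.getrusage(resource.RUSAGE_SELF)
# ru_maxrss is KiB on Linux but bytes on macOS.
peak = usage.ru_maxrss * (1 if sys.platform == "darwin" else 1024)
json.dump({"seconds": elapsed, "peak_rss_bytes": peak}, sys.stdout)
"""

proc = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, check=True,
)
sample = json.loads(proc.stdout)
```

Each workload gets its own `sample`, so a slow or leaky framework in one observation cannot inflate the next one's RSS.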
## Runner Policy
Truthound uses a hybrid runner policy:
- GitHub-hosted nightly runners provide trend visibility and early warning
- fixed self-hosted runners provide the authoritative release-grade benchmark verification
Nightly artifacts are informative. Fixed-runner release artifacts are authoritative.
For the authoritative release verdict, the runner must record:
- CPU model
- logical core count
- RAM
- OS and Python minor version
- storage class
The release workflow reads these from the fixed host plus the following environment contract:

```text
TRUTHOUND_BENCHMARK_RUNNER_CLASS=self-hosted-fixed
TRUTHOUND_BENCHMARK_RUNNER_LABELS=self-hosted,benchmark-fixed
TRUTHOUND_BENCHMARK_RELEASE_VERDICT=true
TRUTHOUND_BENCHMARK_STORAGE_CLASS
TRUTHOUND_BENCHMARK_CPU_MODEL
TRUTHOUND_BENCHMARK_RAM_BYTES
TRUTHOUND_BENCHMARK_CPU_PHYSICAL_CORES
```
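A minimal sketch of how a runner script might validate this contract before claiming an authoritative verdict (`release_contract` is an illustrative helper, not part of Truthound):

```python
# Keys the fixed host must provide before a release verdict is authoritative.
REQUIRED = [
    "TRUTHOUND_BENCHMARK_RUNNER_CLASS",
    "TRUTHOUND_BENCHMARK_RUNNER_LABELS",
    "TRUTHOUND_BENCHMARK_RELEASE_VERDICT",
    "TRUTHOUND_BENCHMARK_STORAGE_CLASS",
    "TRUTHOUND_BENCHMARK_CPU_MODEL",
    "TRUTHOUND_BENCHMARK_RAM_BYTES",
    "TRUTHOUND_BENCHMARK_CPU_PHYSICAL_CORES",
]


def release_contract(env: dict[str, str]) -> dict[str, str]:
    """Return the contract subset of env, or raise if it is incomplete."""
    missing = [key for key in REQUIRED if key not in env]
    if missing:
        raise RuntimeError(f"not authoritative: missing {missing}")
    if env["TRUTHOUND_BENCHMARK_RUNNER_CLASS"] != "self-hosted-fixed":
        raise RuntimeError("not a fixed self-hosted runner")
    return {key: env[key] for key in REQUIRED}
```

In practice the dict would come from `os.environ`; passing it explicitly keeps the check testable.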
For self-hosted macOS runners, the release workflow avoids `actions/setup-python` and instead uses `uv` to install a managed Python 3.11 runtime plus a dedicated `.release-venv`. This sidesteps hosted-toolcache assumptions that often point at `/Users/runner` on macOS.
## Thresholds
The current release-grade thresholds are:
- local exact: Truthound >= `1.5x` Great Expectations
- SQL exact: Truthound >= `1.0x` Great Expectations, with `1.2x` as the target
- local memory: Truthound <= `60%` of Great Expectations peak RSS
If correctness parity fails, the performance comparison is treated as failed even when timing looks better.
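The gate described above can be sketched as a single predicate (a hypothetical helper; the real ratio definitions live in the benchmark runner):

```python
def release_verdict(
    local_speedup: float,      # GE warm median / Truthound warm median, local
    sql_speedup: float,        # same ratio on the SQL backend
    memory_ratio: float,       # Truthound peak RSS / GE peak RSS, local
    correctness_parity: bool,  # observed issue counts match expected
) -> bool:
    if not correctness_parity:
        return False           # fast-but-wrong still fails
    return (
        local_speedup >= 1.5
        and sql_speedup >= 1.0   # 1.2x is the target, 1.0x the floor
        and memory_ratio <= 0.60
    )
```

Note that correctness parity is checked first: no timing or memory numbers are even consulted when parity fails.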
## Commands
```bash
truthound benchmark parity --suite pr-fast --frameworks truthound --backend local
truthound benchmark parity --suite nightly-core --frameworks both --backend local
truthound benchmark parity --suite nightly-sql --frameworks both --backend sqlite
truthound benchmark parity --suite release-ga --frameworks both --strict
```
`release-ga` is intentionally stricter than the nightly suites:

- it must run `--frameworks both`
- it must not use `--backend`
- it must execute all eight tier-1 local and SQLite workloads
## Artifact Layout
The benchmark artifact root is always `.truthound/benchmarks/` in the active project context:

- `results/`
- `baselines/`
- `artifacts/release/`
When a parity run writes an external output file, the canonical artifact set beside that output is:

- `release-ga.json`
- `release-ga.md`
- `release-ga.html`
- `env-manifest.json`
- `latest-benchmark-summary.md`
To publish the docs summary page from an approved release artifact set:
```bash
python docs/scripts/publish_benchmark_summary.py \
  --json benchmark-artifacts/release-ga.json \
  --artifact-base-url .. \
  --output docs/releases/latest-benchmark-summary.md
```