CI Regression Testing
Save a baseline
After merging to main, save benchmark results:
cargo bench -- --save-baseline=main
This writes .zenbench/baselines/main.json — a complete snapshot of all benchmark results.
Check for regressions on PRs
cargo bench -- --baseline=main
Exit codes:
- 0 — pass, no regressions exceed threshold
- 1 — fail, regressions detected
- 2 — error (baseline not found, etc.)
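A wrapper (for example an xtask) can branch on these codes. The following is a minimal sketch that assumes only the three exit codes listed above; the `Outcome` enum and the `classify` function are hypothetical names for illustration and are not part of zenbench.

```rust
use std::process::Command;

/// Hypothetical classification of the exit codes documented above.
#[derive(Debug, PartialEq)]
enum Outcome {
    Pass,       // 0: no regressions exceed the threshold
    Regression, // 1: regressions detected
    Error,      // 2: baseline not found, etc.
    Unknown,    // anything else, including termination by a signal
}

fn classify(code: Option<i32>) -> Outcome {
    match code {
        Some(0) => Outcome::Pass,
        Some(1) => Outcome::Regression,
        Some(2) => Outcome::Error,
        _ => Outcome::Unknown,
    }
}

fn main() {
    // Same invocation as shown above.
    let status = Command::new("cargo")
        .args(["bench", "--", "--baseline=main"])
        .status()
        .expect("failed to launch cargo");

    match classify(status.code()) {
        Outcome::Pass => println!("benchmarks within threshold"),
        Outcome::Regression => {
            eprintln!("performance regression detected");
            std::process::exit(1);
        }
        Outcome::Error => {
            eprintln!("comparison could not run (missing baseline?)");
            std::process::exit(2);
        }
        Outcome::Unknown => {
            eprintln!("unexpected exit status: {status}");
            std::process::exit(2);
        }
    }
}
```

Keeping code 1 and code 2 distinct lets a pipeline treat a missing baseline differently from a genuine regression.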
Output:
Baseline comparison
───────────────────
⚠ git hash differs: baseline=abc12345 current=def67890
compress::level_1 16.2µs → 16.4µs +1.2% unchanged
compress::level_6 15.1µs → 15.3µs +1.3% unchanged
compress::mixed 401.0µs → 425.3µs +6.1% ▲ REGRESSION
Summary: 1 regressions, 0 improvements, 2 unchanged
[zenbench] FAIL: 1 regression(s) exceed 5% threshold
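The percentage column in the sample output is consistent with a plain relative change of the current time against the baseline time. A minimal sketch of that arithmetic, using the three rows above as inputs; `pct_change` is a hypothetical helper, not a zenbench function.

```rust
/// Relative change of `current` versus `baseline`, in percent.
/// Positive values mean the benchmark got slower.
fn pct_change(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}

fn main() {
    // (name, baseline µs, current µs) taken from the sample output.
    let rows = [
        ("compress::level_1", 16.2, 16.4),
        ("compress::level_6", 15.1, 15.3),
        ("compress::mixed", 401.0, 425.3),
    ];
    for (name, base, cur) in rows {
        // `{:+.1}` prints an explicit sign and one decimal place.
        println!("{name} {:+.1}%", pct_change(base, cur));
    }
}
```

Only `compress::mixed` moves by more than the 5% threshold reported in the FAIL line.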
Configure the threshold
# Fail if any benchmark regresses more than 3%
cargo bench -- --baseline=main --max-regression=3
# Auto-update baseline when the run passes
cargo bench -- --baseline=main --update-on-pass
GitHub Actions workflow
name: Benchmarks
on:
  pull_request:
  push:
    branches: [main]  # required so the "main only" step below can run
jobs:
  bench:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Restore baseline
        uses: actions/cache@v4
        with:
          path: .zenbench/baselines
          # base_ref is empty on push events, so fall back to the branch name
          key: bench-baselines-${{ github.base_ref || github.ref_name }}
      - name: Check for regressions
        if: github.event_name == 'pull_request'
        run: cargo bench -- --baseline=main --max-regression=5
      - name: Update baseline (main only)
        if: github.ref == 'refs/heads/main'
        run: cargo bench -- --save-baseline=main
How comparison works
When comparing against a baseline, zenbench:
- Runs the full benchmark suite (interleaved, converging)
- Loads the baseline JSON
- Matches benchmarks by group::name
- Computes percentage change for each
- Applies both a percentage threshold AND a statistical t-test
- Only flags a regression if it exceeds the threshold AND is statistically significant
The t-test reduces false positives from noisy CI runners: if the mean shifted but the variance is high, the difference is not statistically significant and the benchmark is not flagged.
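The two-part rule can be sketched as follows. This is an illustration under stated assumptions, not zenbench's actual implementation: the text only says "a statistical t-test", so the sketch assumes Welch's t statistic and uses a fixed cutoff of 2.0 in place of a proper p-value lookup. All function names are hypothetical.

```rust
/// Sample mean.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Unbiased sample variance (divides by n - 1).
fn variance(xs: &[f64], m: f64) -> f64 {
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() as f64 - 1.0)
}

/// Welch's t statistic (assumed variant): positive when `cur` is slower.
fn welch_t(base: &[f64], cur: &[f64]) -> f64 {
    let (mb, mc) = (mean(base), mean(cur));
    let se = (variance(base, mb) / base.len() as f64
        + variance(cur, mc) / cur.len() as f64)
        .sqrt();
    (mc - mb) / se
}

/// Flags a regression only if the slowdown exceeds the percentage
/// threshold AND the t statistic exceeds the (assumed) cutoff.
fn is_regression(base: &[f64], cur: &[f64], max_regression_pct: f64) -> bool {
    const T_CUTOFF: f64 = 2.0; // assumption: stand-in for a significance test
    let pct = (mean(cur) - mean(base)) / mean(base) * 100.0;
    pct > max_regression_pct && welch_t(base, cur) > T_CUTOFF
}

fn main() {
    // Made-up timings in µs, for illustration only.
    let base = [400.0, 401.0, 402.0, 401.0, 401.0];
    let steady = [424.0, 425.0, 426.0, 425.0, 426.0]; // about +6%, low noise
    let noisy = [300.0, 560.0, 380.0, 520.0, 366.0]; // about +6%, high noise

    println!("steady flagged: {}", is_regression(&base, &steady, 5.0));
    println!("noisy flagged:  {}", is_regression(&base, &noisy, 5.0));
}
```

Both samples have the same mean shift, but only the low-noise one is flagged, which is the behavior the paragraph above describes.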
Hardware fingerprinting
Baselines include a hardware fingerprint (CPU model, arch, OS, core count). When comparing, zenbench warns if the hardware changed:
⚠ CPU changed: baseline='AMD EPYC 7763' current='Intel Xeon E5-2686'
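A fingerprint check of this kind might look like the sketch below. The `Fingerprint` struct, its field names, and the `hardware_warnings` function are hypothetical; only the four fields and the CPU warning format come from the text above, and the wording of the other warnings is an assumption.

```rust
/// Hypothetical fingerprint holding the four fields named in the text.
#[derive(Debug, Clone, PartialEq)]
struct Fingerprint {
    cpu_model: String,
    arch: String,
    os: String,
    core_count: usize,
}

/// Returns one warning line per field that differs from the baseline.
fn hardware_warnings(baseline: &Fingerprint, current: &Fingerprint) -> Vec<String> {
    let mut warnings = Vec::new();
    if baseline.cpu_model != current.cpu_model {
        // Format taken from the sample warning above.
        warnings.push(format!(
            "⚠ CPU changed: baseline='{}' current='{}'",
            baseline.cpu_model, current.cpu_model
        ));
    }
    // The remaining messages are assumed, modeled on the CPU one.
    if baseline.arch != current.arch {
        warnings.push(format!(
            "⚠ arch changed: baseline='{}' current='{}'",
            baseline.arch, current.arch
        ));
    }
    if baseline.os != current.os {
        warnings.push(format!(
            "⚠ OS changed: baseline='{}' current='{}'",
            baseline.os, current.os
        ));
    }
    if baseline.core_count != current.core_count {
        warnings.push(format!(
            "⚠ core count changed: baseline={} current={}",
            baseline.core_count, current.core_count
        ));
    }
    warnings
}

fn main() {
    let baseline = Fingerprint {
        cpu_model: "AMD EPYC 7763".to_string(),
        arch: "x86_64".to_string(),
        os: "linux".to_string(),
        core_count: 4,
    };
    // Same machine description, different CPU model.
    let current = Fingerprint {
        cpu_model: "Intel Xeon E5-2686".to_string(),
        ..baseline.clone()
    };
    for line in hardware_warnings(&baseline, &current) {
        println!("{line}");
    }
}
```

A warning rather than a hard failure fits shared CI runners, where the host CPU can vary between runs.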
Comparing against git tags
# Save baseline at release time
cargo bench -- --save-baseline=v0.3.0
# Later, check for regressions vs release
cargo bench -- --baseline=v0.3.0
# Or use worktree-based comparison (builds both versions, interleaves)
zenbench self-compare --bench my_bench --ref v0.3.0