Interactive per-band SROCC graphs for our V_X champion bakes versus the SSIMULACRA 2 reference and the CID22 paper's published numbers. Shipping bar: match or exceed fast-ssim2 in all four quality bands (B0–B3, per Table 5 of the paper).
67482691) was trained on a clean CSV
whose perceptual-overlap cleanup ran at a looser threshold than
d≤16. As a result, 11,629 rows (7.43 %) were near-duplicates of
22 of the 49 CID22 held-out references. V0_8's CID22 SROCC of 0.8948 was
therefore inflated by 0.0034: V0_15, the honest result, scores 0.8914.
V0_8 is preserved at
zensim/weights/archive/v0_8_tainted_2026-05-11.bin for
reference but is no longer the runtime weight.
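The overlap check described above can be sketched as follows. This is a minimal illustration, not the project's actual cleanup script: it assumes that `d` is a Hamming distance between integer perceptual hashes, which the text does not state, and the function names `hamming` and `split_near_duplicates` are hypothetical.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def split_near_duplicates(train_hashes, heldout_hashes, max_distance=16):
    """Partition training rows into (kept, dropped) index lists.

    A row is dropped when its hash lies within max_distance bits of any
    held-out reference hash. ASSUMPTION: d in the text is a Hamming
    distance over perceptual hashes; the real metric may differ.
    """
    kept, dropped = [], []
    for idx, h in enumerate(train_hashes):
        if any(hamming(h, ref) <= max_distance for ref in heldout_hashes):
            dropped.append(idx)
        else:
            kept.append(idx)
    return kept, dropped


def leak_rate(train_hashes, heldout_hashes, max_distance=16):
    """Fraction of training rows that overlap the held-out set."""
    _, dropped = split_near_duplicates(train_hashes, heldout_hashes, max_distance)
    return len(dropped) / len(train_hashes) if train_hashes else 0.0
```

Under that assumption, a cleanup run with a looser threshold than the evaluation uses would leave rows in the training set that this check still flags, which is the kind of residual leak reported above.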
Independent sanity check that our fast-ssim2 and
butter implementations reproduce the CID22 paper's
headline KonJND-1k numbers (paper p. 20, Table 4). Our
measurements match to 3–4 significant figures, which
validates the pipeline.
Paper-published SROCC values vs our reproduction. The target is SSIMULACRA 2's result on the 49-ref held-out subset: KRCC 0.7033 / SRCC 0.88541 / PCC 0.87448 / MAE 4.97. All of our numbers use that 49-ref held-out subset only.
Each point is one CID22 (reference, distorted) pair, plotted by V_X's quality score (−distance, y-axis) against a reference metric (x-axis). Color = MCOS band (red=B0 below-medium, orange=B1 medium, green=B2 high, blue=B3 visually-lossless). With a perfect rank match, the cloud collapses onto a monotonic curve. dssim is not yet plotted: only ssim2 + butter are in the current eval pipeline (dssim integration is queued).
Within-bin SROCC at step-5 MCOS granularity (vs the 4 broader bands
above). Each point is the rank correlation of a metric's score vs
human MOS for pairs whose MCOS falls in that 5-point window. Tail
bins have small n — see annotations.
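The within-bin computation can be sketched like this. It is a stdlib-only illustration under stated assumptions: bins are taken as half-open windows `[lo, lo + 5)` aligned to multiples of 5, and bins with fewer than `min_n` pairs report NaN. Both choices, and all function names, are hypothetical rather than taken from the eval pipeline.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def srocc(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / (vx * vy) ** 0.5


def within_bin_srocc(scores, mcos, width=5, min_n=3):
    """Map each bin's lower edge to (n, SROCC of score vs MCOS in that bin)."""
    bins = defaultdict(list)
    for s, m in zip(scores, mcos):
        bins[int(m // width) * width].append((s, m))
    out = {}
    for lo in sorted(bins):
        pairs = bins[lo]
        n = len(pairs)
        if n >= min_n:
            rho = srocc([p[0] for p in pairs], [p[1] for p in pairs])
        else:
            rho = float("nan")  # too few pairs for a meaningful rank correlation
        out[lo] = (n, rho)
    return out
```

Returning `n` alongside each correlation keeps the small-sample tail bins visible, since a rank correlation over a handful of pairs is very noisy.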
The smoothness gate for the V0_16 ship (≤ 4.86 % non-monotonic q-step rate) holds across all major codec families. AVIF and JXL show no non-monotonic steps at all; WebP (0.50 %) and JPEG (2.30 %) are well below the gate.
| Codec | Non-mono q-step rate | Notes |
|---|---|---|
| AVIF | 0.00 % | Perfect smoothness |
| JPEG XL | 0.00 % | Perfect smoothness |
| WebP | 0.50 % | Near-perfect |
| JPEG | 2.30 % | Within 4.86 % target; coarsest q steps |
| PNG | n/a | Lossless — no quality curves |
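One plausible way to compute a non-monotonic q-step rate is sketched below. The exact definition used for the table is not given here, so this assumes that each curve lists one image's scores in order of increasing encoder quality, that higher scores mean better quality, and that a step counts as non-monotonic when the score drops. The names `non_monotonic_rate` and `passes_gate` are hypothetical.

```python
SMOOTHNESS_GATE = 0.0486  # the 4.86 % gate quoted in the text


def non_monotonic_rate(curves, tolerance=0.0):
    """Fraction of adjacent q-steps where the score drops as q rises.

    curves: iterable of score lists, each ordered by increasing encoder
    quality, with higher scores meaning better quality (ASSUMPTION).
    tolerance: drops no larger than this are not counted.
    """
    bad = total = 0
    for scores in curves:
        for prev, cur in zip(scores, scores[1:]):
            total += 1
            if cur < prev - tolerance:
                bad += 1
    return bad / total if total else 0.0


def passes_gate(curves, gate=SMOOTHNESS_GATE):
    """True when the pooled non-monotonic rate is at or below the gate."""
    return non_monotonic_rate(curves) <= gate
```

Pooling steps across all curves, as done here, weights long quality ladders more heavily than short ones; averaging per-curve rates would be an equally defensible alternative.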
Both axes are SROCC vs human MOS; higher = better. Dashed lines mark fast-ssim2 levels. Target = upper-right quadrant (beats ssim2 on both). Orange = ensemble subsets (recipe-diversity). Green = current ship V0_16. Purple = V0_21 butter-clean single bake.
Each bake is a point: CID22 SROCC on the Y axis, non-monotonic q-step rate on the X axis. The X axis is reversed so smoother bakes appear on the right. The dashed line marks fast-ssim2's CID22 SROCC (0.8895). Goal #1 is to be both above the line AND to the right of fast-ssim2's non-mono 5.08 %.
| Bake | CID22 SROCC | vs ssim2 | Non-mono | AIC-3 SROCC | Status |
|---|---|---|---|---|---|
| V0_5 leaked | 0.8900 | +0.0005 | 5.36 % | — | archived (training leak 11.77 %) |
| V0_6 clean (seed=42) | 0.8839 | −0.0056 | 5.94 % | — | honest baseline |
| V0_7 seed=0 (initial) | 0.8912 | +0.0017 | 5.67 % | — | archived (non-mono over) |
| V0_7 seed=1 TV=10 | 0.8933 | +0.0038 | 5.46 % | — | archived (B1 -0.027) |
| V0_8 TV=15 seed=1 (TAINTED, ARCHIVED) | 0.8948 | +0.0053 | 5.87 % | 0.8043 (+0.0078) | archived 2026-05-12 (inflation +0.0034) |
| V0_10 [15,25,15,15] | 0.8877 | −0.0018 | 2.40 % | 0.7945 (−0.0020) | tainted-data smoothness specialist |
| V0_11 flat TV=20 | 0.8897 | +0.0002 | 2.33 % | 0.8056 (+0.0091) | tainted-data exp; AIC-3 leader at the time |
| V0_12 B1 oversample | 0.8895 | ±0.0000 | 1.68 % | 0.7972 (+0.0007) | tainted-data exp; B1 oversample disproved |
| V0_15 honest TV=15 (ARCHIVED same day) | 0.8914 | +0.0019 | 2.51 % | 0.8019 (+0.0054) | superseded by V0_16 (better B0/B1) |
| V0_16 honest TV=20 (SHIPPED 2026-05-12) | 0.8919 | +0.0024 | 2.30 % | 0.7990 (+0.0025) | current ship · honest B1 closure |
| V0_17 honest TV=25 (NOT SHIPPED) | 0.8849 | −0.0046 | 2.44 % | 0.7995 (+0.0030) | over-regularized; B2/B3 collapse; AIC-3 marginal +0.0005 vs V0_16 |
| V0_18 seed=42 (NOT SHIPPED) | 0.8847 | −0.0048 | 2.01 % | 0.7899 (−0.0066) | seed sweep; best non-mono of sweep |
| V0_19 seed=7 (NOT SHIPPED) | 0.8848 | −0.0047 | 2.84 % | 0.7986 (+0.0021) | seed sweep; best per-band B0/B1 (0.452/0.476) |
| V0_20 seed=123 (NOT SHIPPED) | 0.8872 | −0.0023 | 2.65 % | 0.8097 (+0.0132) | seed sweep; best AIC-3 single-seed |
| 4-seed ensemble (V0_16/18/19/20) | 0.8892 | −0.0003 | — | 0.7998 (+0.0033) | CID22 tied; AIC-3 beats ssim2 |
| V0_21 butter-clean (NOT SHIPPED) | 0.8874 | −0.0021 | 2.91 % | 0.8060 (+0.0095) | single-bake trade-off; ensemble member |
| 5-bake ensemble (V0_16/18/19/20/21) | 0.8896 | +0.0001 | — | 0.8012 (+0.0047) | diluted; subset search found better below |
| {V0_16, V0_20} 2-bake | 0.8910 | +0.0015 | — | 0.8050 (+0.0085) | OPTIMUM · runtime ensemble recommendation (after exhaustive 7-bake search) |
| {V0_16, V0_20, V0_21} 3-bake | 0.8908 | +0.0013 | — | 0.8051 (+0.0086) | virtually tied with 2-bake: combined delta +0.0099 vs +0.0100 |
| V0_22 konjnd_w=1.0 (specialty) | 0.8870 | −0.0025 | 1.96 % | 0.7906 (−0.0059) | best smoothness + Near-PJND; not shipped |
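The delta columns in the table can be reproduced with a few lines of arithmetic. This sketch makes two assumptions: the AIC-3 baseline of 0.7965 is inferred from the V0_8 row (0.8043 minus 0.0078) rather than quoted directly, and "combined" is read as the sum of the CID22 and AIC-3 deltas, restricted to candidates that beat ssim2 on both. The selection rule and all names are illustrative, not the project's search code.

```python
CID22_BASELINE = 0.8895  # fast-ssim2 CID22 SROCC, from the text
AIC3_BASELINE = 0.7965   # ASSUMPTION: inferred as 0.8043 - 0.0078 (V0_8 row)

# (CID22 SROCC, AIC-3 SROCC) copied from the table above.
ROWS = {
    "V0_16": (0.8919, 0.7990),
    "V0_20": (0.8872, 0.8097),
    "V0_21": (0.8874, 0.8060),
    "4-seed ensemble": (0.8892, 0.7998),
    "5-bake ensemble": (0.8896, 0.8012),
    "{V0_16, V0_20}": (0.8910, 0.8050),
    "{V0_16, V0_20, V0_21}": (0.8908, 0.8051),
}


def deltas(cid22, aic3):
    """Return (CID22 delta, AIC-3 delta, their sum), rounded to 4 places."""
    d_cid = round(cid22 - CID22_BASELINE, 4)
    d_aic = round(aic3 - AIC3_BASELINE, 4)
    return d_cid, d_aic, round(d_cid + d_aic, 4)


def rank_upper_right(rows):
    """Names that beat ssim2 on both corpora, best combined delta first."""
    eligible = [
        name for name, (cid, aic) in rows.items()
        if deltas(cid, aic)[0] > 0 and deltas(cid, aic)[1] > 0
    ]
    return sorted(eligible, key=lambda n: deltas(*rows[n])[2], reverse=True)


if __name__ == "__main__":
    for name in rank_upper_right(ROWS):
        d_cid, d_aic, total = deltas(*ROWS[name])
        print(f"{name}: CID22 {d_cid:+.4f}, AIC-3 {d_aic:+.4f}, combined {total:+.4f}")
```

Under these assumptions the both-positive filter matters: V0_20 alone has the largest summed delta, but it falls below ssim2 on CID22 and so is excluded from the upper-right quadrant.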
Site source: site/index.html. Data builder: scripts/v_next/build_site_data.py. See the parity plan for the full audit + methodology pipeline.
Caveat: the paper used the ssim2 from libjxl 0.8; we use our own fast-ssim2 Rust port. The KonJND-1k Table 4 reproduction matches to 3–4 significant figures; full reproduction of CID22 Table 3 is pending Goal 3 work.