zensim — V_X progression vs CID22 paper

Interactive per-band SROCC graphs for our V_X champion bakes versus the SSIMULACRA 2 reference and the CID22 paper's published numbers. Shipping bar: match-or-exceed fast-ssim2 across all four quality bands (B0–B3 per paper Table 5).

Current ship: V0_16 (TV=20, seed=1, 2026-05-12), successor to V0_15 (same day). V0_16 raises TV from 15 to 20 on the same purged training data, recovering V0_8's B1 closure honestly, without training-set leakage. CID22 SROCC is 0.8919 (+0.0024 vs fast-ssim2's 0.8895); AIC-3 CTC SROCC is 0.7990 (+0.0025 vs ssim2's 0.7965); the non-mono q-step rate is 2.30 % (the lowest of any shipped bake, and roughly 1/2.5 of V0_8's 5.87 %). Per-band B1 = 0.4559 vs ssim2's 0.4694 (−0.014, matching the tainted V0_8's −0.014 gap, but on honest data). B0 = 0.4214 vs ssim2's 0.4418 (−0.020; V0_15 was at −0.049). Affine calibration: α=28.0366, β=−5.0738, R²=0.7423.
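
The affine calibration reported above has two parameters and a goodness-of-fit value. As a minimal sketch, assuming α and β come from an ordinary least-squares fit of target ≈ α·raw + β (the fitting procedure, the direction of the mapping, and the name `fit_affine` are assumptions for illustration, not taken from the zensim source), the fit and its R² could be computed like this:

```python
def fit_affine(raw, target):
    """Least-squares fit of target ~ alpha * raw + beta.

    Returns (alpha, beta, r2). Hypothetical helper; the real calibration
    procedure is not described in this page.
    """
    n = len(raw)
    if n < 2 or n != len(target):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(raw) / n
    mean_y = sum(target) / n
    sxx = sum((x - mean_x) ** 2 for x in raw)
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("constant input; slope or R^2 is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(raw, target))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (alpha * x + beta)) ** 2 for x, y in zip(raw, target))
    return alpha, beta, 1.0 - ss_res / ss_tot
```

An affine map preserves rank order when α > 0, so calibration of this kind would leave every SROCC figure on this page unchanged.
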
V0_8 (2026-05-11), archived 2026-05-12: the previously shipped V0_8 (md5 67482691) was trained on a nominally clean CSV whose perceptual-overlap cleanup ran at a looser threshold than d≤16. As a result, 11,629 rows (7.43 %) were near-duplicates of 22 of the 49 CID22 held-out references. V0_8's CID22 SROCC of 0.8948 was inflated by +0.0034 (confirmed by V0_15's honest result of 0.8914). V0_8 is preserved at zensim/weights/archive/v0_8_tainted_2026-05-11.bin for reference but is no longer the runtime weight.
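
To make the d≤16 purge concrete, here is a rough sketch. It assumes that d is the Hamming distance between 64-bit perceptual hashes; the hash type, the row layout, and both function names are hypothetical and are not confirmed by this page.

```python
def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit hashes."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")


def purge_near_duplicates(train_rows, heldout_hashes, max_distance=16):
    """Split training rows into (kept, purged) row ids.

    train_rows: iterable of (row_id, hash) pairs (assumed layout).
    A row is purged when its hash is within max_distance bits of
    any held-out reference hash.
    """
    kept, purged = [], []
    for row_id, h in train_rows:
        if any(hamming64(h, ref) <= max_distance for ref in heldout_hashes):
            purged.append(row_id)
        else:
            kept.append(row_id)
    return kept, purged
```

Under this reading, a looser cleanup means a smaller `max_distance`, which lets more near-duplicates of the held-out references stay in the training set.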

Goal 3 — KonJND-1k PJND validation vs paper Table 4

Independent sanity check that our fast-ssim2 + butter implementations reproduce the CID22 paper's headline KonJND-1k numbers (paper p. 20 Table 4). Our measurements match to 3–4 significant figures — pipeline validated.

Aggregate SROCC per dataset

CID22 is the only true held-out test; KADID/TID/KonJND are in training.
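
For readers unfamiliar with SROCC, it is the Pearson correlation of the two rank vectors. The sketch below is a plain-Python illustration with average ranks for ties; it is not the code used by the eval pipeline, and the function names are chosen here for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srocc(scores, mos):
    """Spearman rank-order correlation between metric scores and human MOS."""
    if len(scores) != len(mos) or len(scores) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    rx, ry = average_ranks(scores), average_ranks(mos)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks matter, any monotonic rescaling of a metric leaves its SROCC unchanged.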

Per-band SROCC (CID22 49-ref held-out)

Bands per CID22 paper Table 5, by MCOS: B0 < 50; B1 in [50, 65); B2 in [65, 90); B3 ≥ 90.
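
The band boundaries translate directly into a lookup. This is a small sketch of how pairs could be bucketed before computing a per-band rank correlation; `band_of` and `group_by_band` are illustrative names, and the (mcos, score) tuple layout is an assumption.

```python
import bisect

BAND_EDGES = [50, 65, 90]              # lower bounds of B1, B2, B3
BAND_NAMES = ["B0", "B1", "B2", "B3"]


def band_of(mcos: float) -> str:
    """Map an MCOS value to its band; lower bounds are inclusive."""
    return BAND_NAMES[bisect.bisect_right(BAND_EDGES, mcos)]


def group_by_band(pairs):
    """pairs: iterable of (mcos, metric_score); returns band -> list of pairs."""
    groups = {name: [] for name in BAND_NAMES}
    for mcos, score in pairs:
        groups[band_of(mcos)].append((mcos, score))
    return groups
```

Each group can then be passed to any rank-correlation function to obtain that band's SROCC.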

Paper-parity table (Table 3 — full 250-ref SROCC; CID22 paper)

Paper-published SROCC values vs our reproduction. SSIMULACRA 2's KRCC 0.7033 / SRCC 0.88541 / PCC 0.87448 / MAE 4.97 are the targets for the 49-ref held-out set. We use the 49-ref held-out subset only.
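
Besides SRCC, the statistics named here are KRCC, PCC, and MAE. The sketch below shows one common way to compute them in plain Python. Two points are assumptions: the Kendall variant (tau-b, which corrects for ties) and whether the paper applies a fitted mapping before PCC and MAE. The functions simply operate on whatever paired values they are given.

```python
import math


def kendall_tau_b(x, y):
    """Kendall rank correlation, tau-b variant (O(n^2) pair scan)."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                tied_x += 1
            if dy == 0:
                tied_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - tied_x) * (n0 - tied_y))
    return (concordant - discordant) / denom if denom else float("nan")


def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y) if var_x and var_y else float("nan")


def mae(predicted, actual):
    """Mean absolute error between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```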

Scatter plots: V_X vs reference metrics (CID22)

Each point is one CID22 (reference, distorted) pair, plotted by V_X's quality score (−distance, y-axis) against a reference metric (x-axis). Color = MCOS band (red=B0 below-medium, orange=B1 medium, green=B2 high, blue=B3 visually-lossless). A perfect rank match would appear as a monotonic cloud. dssim is not yet plotted: only ssim2 + butter are in the current eval pipeline (dssim integration is queued).

Compare V0_16 (current ship), V0_15 (archived same-day), and V0_8 (tainted, archived).

V0_16 vs fast-ssim2

V0_16 vs butteraugli (3-norm, negated)

V0_16 vs human MCOS (gold standard)

Step-5 per-band SROCC on CID22

Within-bin SROCC at step-5 MCOS granularity (vs the 4 broader bands above). Each point is the rank correlation of a metric's score vs human MOS for pairs whose MCOS falls in that 5-point window. Tail bins have small n — see annotations.
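
The step-5 windows can be formed by flooring MCOS to a multiple of 5. The following sketch groups pair indices per window and flags windows with few samples. The `min_n` cutoff of 10 and the function name are hypothetical; the page does not say what counts as "small n".

```python
from collections import defaultdict


def step5_bins(mcos_values, width=5, min_n=10):
    """Return [(lower_edge, pair_indices, is_small_n), ...] sorted by lower edge.

    Each window is the half-open interval [lower_edge, lower_edge + width).
    """
    bins = defaultdict(list)
    for idx, mcos in enumerate(mcos_values):
        lower = int(mcos // width) * width
        bins[lower].append(idx)
    return [(lower, idxs, len(idxs) < min_n) for lower, idxs in sorted(bins.items())]
```

A rank correlation computed on a flagged window rests on very few pairs, which is why the tail bins carry annotations.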

V0_16 non-mono q-step rate across codecs

The V0_16 ship's smoothness gate (≤ 4.86 % non-monotonic q-step rate) holds across all major codec families. AVIF and JXL produce perfectly smooth quality curves; WebP and JPEG are well below target.

| Codec | Non-mono q-step rate | Notes |
|---|---|---|
| AVIF | 0.00 % | Perfect smoothness |
| JPEG XL | 0.00 % | Perfect smoothness |
| WebP | 0.50 % | Near-perfect |
| JPEG | 2.30 % | Within 4.86 % target; coarsest q steps |
| PNG | n/a | Lossless; no quality curves |
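
This page does not spell out how the non-mono q-step rate is measured. One plausible reading, used in the sketch below, is the fraction of adjacent encoder-quality steps at which the metric's quality score drops although q went up. That definition, the input layout, and the function name are all assumptions.

```python
def non_mono_rate(ladders):
    """Fraction of adjacent quality steps where the score drops as q rises.

    ladders: iterable of lists of (q, quality_score), one list per
    (image, codec) combination. Higher score is assumed to mean better.
    """
    drops = steps = 0
    for ladder in ladders:
        ordered = sorted(ladder, key=lambda point: point[0])  # ascending q
        for (_, lower_q_score), (_, higher_q_score) in zip(ordered, ordered[1:]):
            steps += 1
            if higher_q_score < lower_q_score:
                drops += 1
    return drops / steps if steps else 0.0


SMOOTHNESS_GATE = 0.0486  # the 4.86 % target quoted above
```

With this definition, a codec passes the gate when `non_mono_rate(...) <= SMOOTHNESS_GATE`.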

Pareto: CID22 SROCC vs AIC-3 SROCC (held-out 2D)

Both axes are SROCC vs human MOS; higher = better. Dashed lines mark fast-ssim2 levels. Target = upper-right quadrant (beats ssim2 on both). Orange = ensemble subsets (recipe-diversity). Green = current ship V0_16. Purple = V0_21 butter-clean single bake.
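
The quadrant test and the Pareto front can be expressed in a few lines. The sketch below uses a handful of (CID22 SROCC, AIC-3 SROCC) values copied from the bake history table and the fast-ssim2 reference levels quoted earlier on this page; the function names are illustrative only.

```python
SSIM2_REF = (0.8895, 0.7965)  # fast-ssim2 (CID22 SROCC, AIC-3 SROCC)

BAKES = {
    "V0_16": (0.8919, 0.7990),
    "V0_17": (0.8849, 0.7995),
    "V0_20": (0.8872, 0.8097),
    "V0_21": (0.8874, 0.8060),
    "{V0_16, V0_20}": (0.8910, 0.8050),
}


def beats_reference(point, ref=SSIM2_REF):
    """True when a bake is strictly better than the reference on both axes."""
    return point[0] > ref[0] and point[1] > ref[1]


def pareto_front(points):
    """Names of points that no other point matches or beats on both axes."""
    front = []
    for name, p in points.items():
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p
            for other, q in points.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front
```

On this subset, V0_17 is the only dominated point, and only V0_16 and the {V0_16, V0_20} pair land in the upper-right quadrant.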

Pareto: CID22 SROCC vs non-mono q-step rate

Each bake is a point. The X axis is reversed so smoother bakes appear on the right. The dashed line marks fast-ssim2's CID22 SROCC (0.8895). Goal #1 is to be both above the line AND to the right of fast-ssim2's non-mono 5.08 %.

Bake history

| Bake | CID22 SROCC | vs ssim2 | Non-mono | AIC-3 SROCC | Status |
|---|---|---|---|---|---|
| V0_5 leaked | 0.8900 | +0.0005 | 5.36 % | | archived (training leak 11.77 %) |
| V0_6 clean (seed=42) | 0.8839 | −0.0056 | 5.94 % | | honest baseline |
| V0_7 seed=0 (initial) | 0.8912 | +0.0017 | 5.67 % | | archived (non-mono over) |
| V0_7 seed=1 TV=10 | 0.8933 | +0.0038 | 5.46 % | | archived (B1 −0.027) |
| V0_8 TV=15 seed=1 (TAINTED, ARCHIVED) | 0.8948 | +0.0053 | 5.87 % | 0.8043 (+0.0078) | archived 2026-05-12 (inflation +0.0034) |
| V0_10 [15,25,15,15] | 0.8877 | −0.0018 | 2.40 % | 0.7945 (−0.0020) | tainted-data smoothness specialist |
| V0_11 flat TV=20 | 0.8897 | +0.0002 | 2.33 % | 0.8056 (+0.0091) | tainted-data exp; AIC-3 leader at the time |
| V0_12 B1 oversample | 0.8895 | ±0.0000 | 1.68 % | 0.7972 (+0.0007) | tainted-data exp; B1 oversample disproved |
| V0_15 honest TV=15 (ARCHIVED same day) | 0.8914 | +0.0019 | 2.51 % | 0.8019 (+0.0054) | superseded by V0_16 (better B0/B1) |
| V0_16 honest TV=20 (SHIPPED 2026-05-12) | 0.8919 | +0.0024 | 2.30 % | 0.7990 (+0.0025) | current ship · honest B1 closure |
| V0_17 honest TV=25 (NOT SHIPPED) | 0.8849 | −0.0046 | 2.44 % | 0.7995 (+0.0030) | over-regularized; B2/B3 collapse; AIC-3 marginal +0.0005 vs V0_16 |
| V0_18 seed=42 (NOT SHIPPED) | 0.8847 | −0.0048 | 2.01 % | 0.7899 (−0.0066) | seed sweep; best non-mono of sweep |
| V0_19 seed=7 (NOT SHIPPED) | 0.8848 | −0.0047 | 2.84 % | 0.7986 (+0.0021) | seed sweep; best per-band B0/B1 (0.452/0.476) |
| V0_20 seed=123 (NOT SHIPPED) | 0.8872 | −0.0023 | 2.65 % | 0.8097 (+0.0132) | seed sweep; best AIC-3 single-seed |
| 4-seed ensemble (V0_16/18/19/20) | 0.8892 | −0.0003 | | 0.7998 (+0.0033) | CID22 tied; AIC-3 beats ssim2 |
| V0_21 butter-clean (NOT SHIPPED) | 0.8874 | −0.0021 | 2.91 % | 0.8060 (+0.0095) | single-bake trade-off; ensemble member |
| 5-bake ensemble (V0_16/18/19/20/21) | 0.8896 | +0.0001 | | 0.8012 (+0.0047) | diluted; subset search found better below |
| {V0_16, V0_20} 2-bake | 0.8910 | +0.0015 | | 0.8050 (+0.0085) | OPTIMUM · runtime ensemble recommendation (after exhaustive 7-bake search) |
| {V0_16, V0_20, V0_21} 3-bake | 0.8908 | +0.0013 | | 0.8051 (+0.0086) | virtually tied with 2-bake at +0.0099 combined |
| V0_22 konjnd_w=1.0 (specialty) | 0.8870 | −0.0025 | 1.96 % | 0.7906 (−0.0059) | best smoothness + Near-PJND; not shipped |
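
The ensemble rows come from a search over bake subsets. The sketch below shows what an exhaustive search of that kind could look like. It assumes an ensemble's output is the element-wise mean of its members' per-pair scores, and it leaves the selection criterion as a caller-supplied `objective` (for example, the combined CID22 + AIC-3 margin over ssim2). Both the averaging rule and the function names are assumptions, not the documented zensim procedure.

```python
from itertools import combinations


def ensemble_scores(member_scores):
    """Element-wise mean of the members' per-pair scores (assumed rule)."""
    return [sum(column) / len(column) for column in zip(*member_scores)]


def best_subset(scores_by_bake, objective):
    """Evaluate every non-empty subset of bakes; return (best_value, names).

    scores_by_bake: dict of bake name -> list of per-pair scores.
    objective: callable taking the ensemble's scores, higher is better.
    """
    names = sorted(scores_by_bake)
    best = None
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            merged = ensemble_scores([scores_by_bake[n] for n in subset])
            value = objective(merged)
            if best is None or value > best[0]:
                best = (value, subset)
    return best
```

Seven bakes give 2^7 − 1 = 127 candidate subsets, so brute force is cheap at this scale.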

Notes

Site source: site/index.html. Data builder: scripts/v_next/build_site_data.py. See the parity plan for the full audit + methodology pipeline.

Caveat: paper used libjxl 0.8's ssim2; we use our own fast-ssim2 Rust port. KonJND-1k Table 4 reproduction matches to 3-4 sig figs; full CID22 Table 3 reproduction is pending Goal 3 work.