zensim — V_X progression vs CID22 paper


Interactive per-band SROCC graphs for our V_X champion bakes versus the SSIMULACRA 2 reference and the CID22 paper's published numbers. Shipping bar: match-or-exceed fast-ssim2 across all four quality bands (B0–B3 per paper Table 5).

Current ship: V0_16 (TV=20, seed=1, 2026-05-12), successor to V0_15 (same day). V0_16 raises TV from 15 to 20 on the same purged training data, recovering V0_8's B1 closure without training-set leakage. CID22 SROCC 0.8919 (+0.0024 vs fast-ssim2's 0.8895); AIC-3 CTC SROCC 0.7990 (+0.0025 vs ssim2's 0.7965); non-mono q-step rate 2.30 % (lowest of any shipped bake; about 2.5× lower than V0_8's 5.87 %). Per-band B1 = 0.4559 vs ssim2 0.4694 (−0.014, matching tainted V0_8's −0.014 gap, this time on clean data). B0 = 0.4214 vs ssim2 0.4418 (−0.020; V0_15 was −0.049). Affine-calibrated α=28.0366, β=−5.0738, R²=0.7423.
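For readers unfamiliar with affine calibration, the following is a minimal sketch of how an (α, β, R²) triple like the one above can be obtained by ordinary least squares. Which quantity plays the role of x (raw metric output) and y (human score) is an assumption here, as are the function names and the toy data; the page does not describe the actual fitting procedure.

```python
# Hypothetical sketch: fit y ≈ alpha * x + beta by least squares and report R².
# Assumption: x is a raw metric output and y a human score; toy data only.

def fit_affine(x, y):
    """Return (alpha, beta, r2) for the least-squares line y = alpha*x + beta."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    alpha = sxy / sxx
    beta = my - alpha * mx
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return alpha, beta, r2

def calibrate(raw, alpha, beta):
    """Map a raw value onto the calibrated scale."""
    return alpha * raw + beta

xs = [1.0, 2.0, 3.0, 4.0]          # toy raw values
ys = [3.1, 4.9, 7.2, 8.8]          # toy target values
alpha, beta, r2 = fit_affine(xs, ys)
print(alpha, beta, r2, calibrate(2.5, alpha, beta))
```

An affine map is monotonic whenever α > 0, so it leaves rank statistics such as SROCC unchanged and only affects scale-dependent measures.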

V0_8 (2026-05-11) ARCHIVED 2026-05-12: the previously shipped V0_8 (md5 67482691) was trained on a nominally clean CSV whose perceptual-overlap cleanup ran at a looser threshold than d≤16. As a result, 11,629 rows (7.43 %) were near-duplicates of 22 of the 49 CID22 held-out references. V0_8's CID22 SROCC of 0.8948 was inflated by +0.0034 (confirmed by V0_15's honest 0.8914). V0_8 is preserved at zensim/weights/archive/v0_8_tainted_2026-05-11.bin for reference but is no longer the runtime weight.
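A purge of this kind can be sketched as follows. This is an illustration only: it assumes d is a Hamming distance between 64-bit perceptual hashes, and the row layout and function names are hypothetical rather than taken from the zensim pipeline.

```python
# Hypothetical sketch: drop training rows whose perceptual hash lies within
# Hamming distance d <= 16 of any held-out reference hash.
# Assumption: hashes are 64-bit integers; the real hash type is not stated.

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")

def purge_near_duplicates(train_rows, heldout_hashes, max_dist=16):
    """Split (row_id, hash) pairs into kept and dropped row ids."""
    kept, dropped = [], []
    for row_id, h in train_rows:
        if any(hamming(h, ref) <= max_dist for ref in heldout_hashes):
            dropped.append(row_id)   # too close to a held-out reference
        else:
            kept.append(row_id)
    return kept, dropped

heldout = [0x0F0F_0F0F_0F0F_0F0F]
rows = [
    ("a", 0x0F0F_0F0F_0F0F_0F0F),  # identical to the reference
    ("b", 0x0F0F_0F0F_0F0F_0FFF),  # differs in 4 bits
    ("c", 0xF0F0_F0F0_F0F0_F0F0),  # differs in all 64 bits
]
kept, dropped = purge_near_duplicates(rows, heldout)
print(kept, dropped)
```

Running the check at the final threshold against every held-out reference, rather than trusting an upstream cleanup flag, is what catches a cleanup that was run too loosely.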

Goal 3 — KonJND-1k PJND validation vs paper Table 4

Independent sanity check that our fast-ssim2 + butter implementations reproduce the CID22 paper's headline KonJND-1k numbers (paper p. 20 Table 4). Our measurements match to 3–4 significant figures — pipeline validated.

Aggregate SROCC per dataset

CID22 is the only true held-out test set; KADID, TID, and KonJND are in the training data.

Per-band SROCC (CID22 49-ref held-out)

Bands per CID22 paper Table 5: B0 < 50, B1 [50, 65), B2 [65, 90), B3 ≥ 90.
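As a rough illustration of the per-band statistic, the sketch below assigns each pair to a band using the thresholds above and computes Spearman rank correlation within each band. The function names and toy data are hypothetical; the real evaluation code may handle ties or empty bands differently.

```python
# Hypothetical sketch: per-band SROCC using the B0..B3 thresholds above.

def band(mcos: float) -> str:
    """Map an MCOS value to its quality band."""
    if mcos < 50:
        return "B0"
    if mcos < 65:
        return "B1"
    if mcos < 90:
        return "B2"
    return "B3"

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def srocc(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one side is constant
    return sxy / (sxx * syy) ** 0.5

def per_band_srocc(mcos, scores):
    """SROCC of scores vs MCOS, computed separately inside each band."""
    groups = {}
    for m, s in zip(mcos, scores):
        groups.setdefault(band(m), []).append((m, s))
    return {b: srocc([m for m, _ in g], [s for _, s in g])
            for b, g in sorted(groups.items()) if len(g) >= 2}

toy_mcos = [30, 40, 45, 55, 60, 62, 70, 80, 85, 92, 95, 97]
toy_scores = [1, 2, 3, 6, 5, 4, 7, 8, 9, 12, 10, 11]
result = per_band_srocc(toy_mcos, toy_scores)
print(result)
```

Within-band correlations are typically much lower than the aggregate SROCC because each band spans a narrow quality range, which is why the per-band values on this page sit near 0.4 to 0.5 while the aggregate is near 0.89.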

Paper-parity table (Table 3 — full 250-ref SROCC; CID22 paper)

Paper-published SROCC values vs our reproduction. SSIMULACRA 2's KRCC 0.7033 / SRCC 0.88541 / PCC 0.87448 / MAE 4.97 is the 49-ref held-out target. We use the 49-ref held-out subset only.

Scatter plots: V_X vs reference metrics (CID22)

Each point is one CID22 (reference, distorted) pair, plotted by V_X's quality score (−distance, y-axis) against a reference metric (x-axis). Color = MCOS band (red=B0 below-medium, orange=B1 medium, green=B2 high, blue=B3 visually-lossless). With perfect rank agreement, the points would fall on a monotonic curve. dssim is not yet plotted: only ssim2 and butter are in the current eval pipeline (dssim integration is queued).

Bake: Compare V0_16 (current ship), V0_15 (archived same-day), and V0_8 (tainted, archived).

V0_16 vs fast-ssim2

V0_16 vs butteraugli (3-norm, negated)

V0_16 vs human MCOS (gold standard)

Step-5 per-band SROCC on CID22

Within-bin SROCC at step-5 MCOS granularity (vs the 4 broader bands above). Each point is the rank correlation between a metric's score and human MCOS for pairs whose MCOS falls in that 5-point window. Tail bins have small n; see annotations.
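The binning itself can be sketched as below. This only shows how pairs might be grouped into 5-point windows and how small bins could be flagged; the `min_n` cutoff is a hypothetical choice, not a value taken from this page.

```python
# Hypothetical sketch: group MCOS values into 5-point windows and flag
# windows with too few pairs for a stable rank correlation.
from collections import defaultdict

def step5_bins(mcos_values, min_n=3):
    """Return {window_start: (n, is_small)} for 5-point MCOS windows."""
    counts = defaultdict(int)
    for m in mcos_values:
        counts[int(m // 5) * 5] += 1   # e.g. 52.3 falls in the window [50, 55)
    return {lo: (n, n < min_n) for lo, n in sorted(counts.items())}

bins = step5_bins([12, 48, 51, 52, 53, 54, 97])
print(bins)
```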

V0_16 non-mono q-step rate across codecs

The V0_16 ship's smoothness gate (≤ 4.86 % non-monotonic q-step rate) holds across all major codec families. AVIF and JXL produce perfectly smooth quality curves; WebP and JPEG are well below target.

Codec	Non-mono q-step rate	Notes
AVIF	0.00 %	Perfect smoothness
JPEG XL	0.00 %	Perfect smoothness
WebP	0.50 %	Near-perfect
JPEG	2.30 %	Within 4.86 % target; coarsest q steps
PNG	n/a	Lossless — no quality curves
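One plausible way to compute a non-monotonic q-step rate is sketched below. The definition used here is an assumption: for each encoder quality sweep, count adjacent steps where the metric score drops as q rises, then divide by the total number of steps. The page does not spell out the exact definition (for example, how ties are treated), and the names and toy data are hypothetical.

```python
# Hypothetical sketch: share of adjacent q steps where the score moves the
# wrong way. Assumption: higher score means higher quality; ties are not
# counted as violations.

def non_mono_rate(sweeps):
    """sweeps: list of sweeps, each a list of (q, score) pairs."""
    bad = total = 0
    for sweep in sweeps:
        ordered = sorted(sweep, key=lambda p: p[0])   # ascending encoder q
        for (_, s_lo), (_, s_hi) in zip(ordered, ordered[1:]):
            total += 1
            if s_hi < s_lo:       # quality setting rose but the score fell
                bad += 1
    return bad / total if total else 0.0

toy_sweeps = [
    [(10, 20.0), (30, 45.0), (50, 44.0), (70, 60.0), (90, 80.0)],  # one dip
    [(10, 15.0), (30, 30.0), (50, 50.0), (70, 70.0), (90, 85.0)],  # monotonic
]
rate = non_mono_rate(toy_sweeps)
passes_gate = rate <= 0.0486      # the 4.86 % gate quoted above
print(rate, passes_gate)
```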

Pareto: CID22 SROCC vs AIC-3 SROCC (held-out 2D)

Both axes are SROCC vs human MOS; higher = better. Dashed lines mark fast-ssim2 levels. Target = upper-right quadrant (beats ssim2 on both). Orange = ensemble subsets (recipe-diversity). Green = current ship V0_16. Purple = V0_21 butter-clean single bake.
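The quadrant test and the Pareto front can be illustrated with a few rows from the bake-history table. The numbers below are copied from that table; the helper names are hypothetical and the selection of rows is only a sample.

```python
# Sketch using (CID22 SROCC, AIC-3 SROCC) values from the bake-history table.
SSIM2_CID22 = 0.8895
SSIM2_AIC3 = 0.7965

bakes = {
    "V0_15": (0.8914, 0.8019),
    "V0_16": (0.8919, 0.7990),
    "V0_17": (0.8849, 0.7995),
    "V0_20": (0.8872, 0.8097),
    "V0_21": (0.8874, 0.8060),
    "{V0_16, V0_20}": (0.8910, 0.8050),
}

def in_upper_right(point):
    """True when a bake beats fast-ssim2 on both axes."""
    cid22, aic3 = point
    return cid22 > SSIM2_CID22 and aic3 > SSIM2_AIC3

def dominates(a, b):
    """a is at least as good as b on both axes and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and a != b

def pareto_front(points):
    """Names of points that no other point dominates."""
    return [name for name, p in points.items()
            if not any(dominates(q, p) for q in points.values())]

upper_right = [name for name, p in bakes.items() if in_upper_right(p)]
front = pareto_front(bakes)
print(upper_right)
print(front)
```

In this sample, V0_20 and V0_21 sit on the front because of their AIC-3 scores but fall outside the target quadrant, since their CID22 SROCC is below fast-ssim2's.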

Pareto: CID22 SROCC vs non-mono q-step rate

Each bake is a point. The X axis is reversed so smoother bakes appear on the right. The dashed line marks fast-ssim2's CID22 SROCC (0.8895). Goal #1 is to be both above the line AND to the right of fast-ssim2's non-mono 5.08 %.

Bake history

Bake	CID22 SROCC	vs ssim2	Non-mono	AIC-3 SROCC	Status
V0_5 leaked	0.8900	+0.0005	5.36 %	—	archived (training leak 11.77 %)
V0_6 clean (seed=42)	0.8839	−0.0056	5.94 %	—	honest baseline
V0_7 seed=0 (initial)	0.8912	+0.0017	5.67 %	—	archived (non-mono over)
V0_7 seed=1 TV=10	0.8933	+0.0038	5.46 %	—	archived (B1 -0.027)
V0_8 TV=15 seed=1 (TAINTED, ARCHIVED)	0.8948	+0.0053	5.87 %	0.8043 (+0.0078)	archived 2026-05-12 (inflation +0.0034)
V0_10 [15,25,15,15]	0.8877	−0.0018	2.40 %	0.7945 (−0.0020)	tainted-data smoothness specialist
V0_11 flat TV=20	0.8897	+0.0002	2.33 %	0.8056 (+0.0091)	tainted-data exp; AIC-3 leader at the time
V0_12 B1 oversample	0.8895	±0.0000	1.68 %	0.7972 (+0.0007)	tainted-data exp; B1 oversample disproved
V0_15 honest TV=15 (ARCHIVED same day)	0.8914	+0.0019	2.51 %	0.8019 (+0.0054)	superseded by V0_16 (better B0/B1)
V0_16 honest TV=20 (SHIPPED 2026-05-12)	0.8919	+0.0024	2.30 %	0.7990 (+0.0025)	current ship · honest B1 closure
V0_17 honest TV=25 (NOT SHIPPED)	0.8849	−0.0046	2.44 %	0.7995 (+0.0030)	over-regularized; B2/B3 collapse; AIC-3 marginal +0.0005 vs V0_16
V0_18 seed=42 (NOT SHIPPED)	0.8847	−0.0048	2.01 %	0.7899 (−0.0066)	seed sweep; best non-mono of sweep
V0_19 seed=7 (NOT SHIPPED)	0.8848	−0.0047	2.84 %	0.7986 (+0.0021)	seed sweep; best per-band B0/B1 (0.452/0.476)
V0_20 seed=123 (NOT SHIPPED)	0.8872	−0.0023	2.65 %	0.8097 (+0.0132)	seed sweep; best AIC-3 single-seed
4-seed ensemble (V0_16/18/19/20)	0.8892	−0.0003	—	0.7998 (+0.0033)	CID22 tied; AIC-3 beats ssim2
V0_21 butter-clean (NOT SHIPPED)	0.8874	−0.0021	2.91 %	0.8060 (+0.0095)	single-bake trade-off; ensemble member
5-bake ensemble (V0_16/18/19/20/21)	0.8896	+0.0001	—	0.8012 (+0.0047)	diluted; subset search found better below
{V0_16, V0_20} 2-bake	0.8910	+0.0015	—	0.8050 (+0.0085)	OPTIMUM · runtime ensemble recommendation (after exhaustive 7-bake search)
{V0_16, V0_20, V0_21} 3-bake	0.8908	+0.0013	—	0.8051 (+0.0086)	virtually tied with 2-bake at +0.0099 combined
V0_22 konjnd_w=1.0 (specialty)	0.8870	−0.0025	1.96 %	0.7906 (−0.0059)	best smoothness + Near-PJND; not shipped
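The "combined" figure quoted for the 3-bake ensemble is consistent with summing the CID22 and AIC-3 deltas vs fast-ssim2. The sketch below reproduces that arithmetic from the table values; treating the sum as the ranking criterion for the subset search is an assumption, and the function name is hypothetical.

```python
# Sketch: combined delta = (CID22 SROCC - 0.8895) + (AIC-3 SROCC - 0.7965),
# using ensemble rows from the table above.
SSIM2_CID22 = 0.8895
SSIM2_AIC3 = 0.7965

ensembles = {
    "4-seed": (0.8892, 0.7998),
    "5-bake": (0.8896, 0.8012),
    "2-bake {V0_16, V0_20}": (0.8910, 0.8050),
    "3-bake {V0_16, V0_20, V0_21}": (0.8908, 0.8051),
}

def combined_delta(cid22, aic3):
    """Sum of the two deltas vs fast-ssim2, rounded to table precision."""
    return round((cid22 - SSIM2_CID22) + (aic3 - SSIM2_AIC3), 4)

combined = {name: combined_delta(*scores) for name, scores in ensembles.items()}
best = max(combined, key=combined.get)
for name, value in combined.items():
    print(f"{name}: {value:+.4f}")
print("best:", best)
```

On these numbers the 2-bake ensemble comes out at +0.0100 and the 3-bake at +0.0099, which matches the table's description of the two as virtually tied.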

Notes

Site source: site/index.html. Data builder: scripts/v_next/build_site_data.py. See the parity plan for the full audit + methodology pipeline.

Caveat: the paper used libjxl 0.8's ssim2; we use our own fast-ssim2 Rust port. The KonJND-1k Table 4 reproduction matches to 3–4 significant figures; the full CID22 Table 3 reproduction is pending Goal 3 work.