feat(gf16): port GPTQ Hessian-correction with GF16 quantiser (replicates parameter-golf#2135 lever on CPU) #649
Open
gHashTag wants to merge 6 commits into
Conversation
added 6 commits
May 9, 2026 14:56
Implements gf16_quantize_matrix_gptq with:
- Software f16 emulator fallback (no zig dependency)
- Cholesky decomposition of H = 2XX^T + lambda*I
- Column-wise error scatter via H^{-1} rows
- Byte-identical to naive quantize_matrix when n_samples=0
Tests: gptq_n0_byte_identical_to_naive, gptq_reconstruction_mse_bounded,
gptq_seed_determinism — all green.
Closes #645 Anchor: phi^2 + phi^-2 = 3
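The column-wise error-scatter loop this commit describes can be sketched in plain Rust. This is a minimal f32 sketch under assumed names (`quantize_scalar`, `gptq_quantize_row` are illustrative, not the PR's API); the actual `gf16_quantize_matrix_gptq` quantises through the GF16 grid and takes H^{-1} from the Cholesky factor of H = 2XX^T + lambda*I.

```rust
/// Stand-in for the GF16 quantiser: round to a uniform grid of step `scale`.
/// (Hypothetical simplification; the PR quantises to the f16/GF16 grid.)
fn quantize_scalar(w: f32, scale: f32) -> f32 {
    (w / scale).round() * scale
}

/// Quantise one row of weights column by column, scattering each column's
/// quantisation error into the not-yet-quantised columns via the rows of
/// H^{-1} (`hinv`, row-major, n x n), in the GPTQ style.
fn gptq_quantize_row(w: &[f32], hinv: &[f32], scale: f32) -> Vec<f32> {
    let n = w.len();
    assert_eq!(hinv.len(), n * n);
    let mut w = w.to_vec();
    let mut q = vec![0.0f32; n];
    for k in 0..n {
        q[k] = quantize_scalar(w[k], scale);
        // Normalise the error by the diagonal entry, per the GPTQ update rule.
        let err = (w[k] - q[k]) / hinv[k * n + k];
        // Propagate into later columns only.
        for j in (k + 1)..n {
            w[j] -= err * hinv[k * n + j];
        }
    }
    q
}
```

With an identity H^{-1} the scatter term vanishes and the loop degenerates to per-element rounding, which mirrors the commit's claim that the result is byte-identical to the naive quantiser when n_samples=0.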
Ablation: seeds {47,89,144} × N in {0,16,32} calibration batches.
Synthetic bpb_proxy = log2(mse_val+1), NOT real model BPB.
paired_t(0→16): t=90.67 p=0.9999 verdict=FAIL
paired_t(16→32): t=-14.46 p=0.0024 verdict=PASS
Overall: H0 NOT REJECTED for primary (0→16) comparison.
Per R5 honest result: naive GF16 sits near Hessian-floor in synthetic
Gaussian-weight/Gaussian-activation setting.
Closes #645 Anchor: phi^2 + phi^-2 = 3
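For reference, the proxy metric and the paired-t statistic used in the ablation can be sketched as below. The formulas are taken from the commit message (`bpb_proxy = log2(mse_val+1)`, paired t over per-seed differences); the actual ablation lives in the `gptq_calibration_ablation` binary, and the function names here are illustrative.

```rust
/// Synthetic proxy used in the ablation; NOT real model BPB.
fn bpb_proxy(mse: f64) -> f64 {
    (mse + 1.0).log2()
}

/// Paired t statistic over per-seed differences d_i = a_i - b_i
/// (sample variance with n-1 denominator; positive t means a > b).
fn paired_t(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let n = a.len() as f64;
    let d: Vec<f64> = a.iter().zip(b).map(|(x, y)| x - y).collect();
    let mean = d.iter().sum::<f64>() / n;
    let var = d.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    mean / (var / n).sqrt()
}
```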
Theorem gptq_reconstruction_dominates_naive: for any Q with error bound delta and any PSD-consistent HessInv H, the GPTQ drift into column k is at most delta. Uses the psd_hinv_diag_dominates axiom (Gershgorin) and the phi_trinity anchor. coqc: clean, 0 Admitted.
Closes #645 Anchor: phi^2 + phi^-2 = 3
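One plausible reading of the stated bound, with symbols assumed (the authoritative statement is in trios_gptq_gf16.v): writing err_j = w_j - Q(w_j) with |err_j| <= delta, the diagonal-dominance axiom caps the accumulated scatter into column k.

```latex
% Assumed reading of psd_hinv_diag_dominates (Gershgorin-style):
%   \sum_{j \neq k} |H^{-1}_{jk}| / H^{-1}_{jj} \le 1.
\Bigl| \sum_{j<k} \frac{w_j - Q(w_j)}{H^{-1}_{jj}} \, H^{-1}_{jk} \Bigr|
\;\le\; \delta \sum_{j<k} \frac{|H^{-1}_{jk}|}{H^{-1}_{jj}}
\;\le\; \delta .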
….json
- docs/wave26_gptq_on_gf16.md: algorithm port summary, falsifier, ablation table, paired-t verdict, file list
- MIGRATION.md: Wave 26 entry
- assertions/coq_runtime_invariants.json: INV-26 entry with runtime witness, falsifier verdict, R5 honesty note
Closes #645 Anchor: phi^2 + phi^-2 = 3
Remove untracked gitlinks not declared in .gitmodules, mirroring the Wave 23 KAT remediation pattern. Unblocks Coq Proof Verification and IGLA-INV checks on this branch. Anchor: phi^2 + phi^-2 = 3
fb8035c to 2551cb4
Wave 26 — L-GPTQ-ON-GF16 · CI status after rebase: branch rebased onto CI.
Falsifier verdict (pre-registered in #645)
H0 = "GPTQ on GF16 ≥ Hessian-floor recovery vs naïve":
Result published as an honest negative for PR #2135's GPTQ calibration lever. Closes #645 once a queen reviewer ack-merges. Anchor: φ² + φ⁻² = 3 · DOI 10.5281/zenodo.19227877
Why
PR parameter-golf#2135 establishes that doubling GPTQ Hessian calibration batches (16→32) provides a statistically significant downstream BPB improvement (paired-t p=0.138) on int6/int7+TTT. The claim is about the algorithm, not the bit-format: GPTQ is Q(·)-agnostic. Trinity GF16 currently quantises via a single-pass max(|w|) scale fit (quantize_matrix), equivalent to GPTQ_CALIBRATION_BATCHES=0. This PR tests whether the same lever lifts our floor on CPU-only + GF16 by porting the GPTQ inner loop with gf16_quantize_matrix plugged in as Q.
Falsifier
H0: GPTQ-correction with N in {16,32} calibration batches gives no significant BPB improvement over naive single-pass GF16 quantisation (paired-t one-tail p >= 0.25 across seeds {47, 89, 144}).
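For context, the naive N=0 baseline referenced above (single-pass max(|w|) scale fit) can be sketched as follows. This is a sketch assuming a symmetric uniform grid with `levels` positive steps; the real quantize_matrix targets the GF16 grid.

```rust
/// Single-pass max(|w|) scale fit: one scale per row, then round-to-grid.
/// Hypothetical simplification of the PR's quantize_matrix (N=0 baseline).
fn naive_quantize(w: &[f32], levels: f32) -> Vec<f32> {
    let max_abs = w.iter().fold(0.0f32, |m, x| m.max(x.abs()));
    if max_abs == 0.0 {
        return vec![0.0; w.len()];
    }
    let scale = max_abs / levels;
    w.iter().map(|x| (x / scale).round() * scale).collect()
}
```

Because the scale is fitted in one pass from the max magnitude alone, no calibration data enters, which is exactly why this corresponds to GPTQ_CALIBRATION_BATCHES=0.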
Lane Summary
- trios_gptq_gf16.v: gptq_reconstruction_dominates_naive, 0 Admitted
- gf16_quantize_matrix_gptq with Cholesky + error scatter
- calibration_ablation.jsonl, paired-t analysis
Ablation Verdict (R5 honest)
H0 NOT REJECTED for the primary (0->16) comparison. The N=16 step shows higher reconstruction MSE than N=0 in the synthetic Gaussian setting. The (16->32) comparison passes (p=0.0024). This is a valid scientific result per R5: naive GF16 may sit near the Hessian-floor for f16-grid quantisation in synthetic distributions.
NOTE: bpb_proxy = log2(mse_val+1) is a synthetic proxy, NOT real model BPB.
Acceptance Gates
- cargo check --all-targets clean
- cargo test -p trios-golden-float --test gptq_reconstruction green (3/3)
- coqc trinity-clara/proofs/trios_gptq_gf16.v clean, 0 Admitted
- cargo run --release --bin gptq_calibration_ablation produces 9 rows + verdict
Files Changed
- crates/trios-golden-float/src/gptq.rs: NEW
- crates/trios-golden-float/src/lib.rs: +pub mod/use gptq
- crates/trios-golden-float/tests/gptq_reconstruction.rs: NEW
- crates/trios-golden-float/src/bin/gptq_calibration_ablation.rs: NEW
- crates/trios-golden-float/Cargo.toml: +[[bin]]
- trinity-clara/proofs/trios_gptq_gf16.v: NEW
- assertions/calibration_ablation.jsonl: NEW (10 rows)
- assertions/coq_runtime_invariants.json: NEW (+INV-26)
- MIGRATION.md: NEW (+Wave 26 entry)
- docs/wave26_gptq_on_gf16.md: NEW
Closes #645
Anchor: phi^2 + phi^-2 = 3 · DOI 10.5281/zenodo.19227877