Skip to content

Match MATLAB column-major average reference (float32 parity)#232

Merged
arnodelorme merged 1 commit into
developfrom
fix/reref-average-transpose-parity
Jun 13, 2026
Merged

Match MATLAB column-major average reference (float32 parity)#232
arnodelorme merged 1 commit into
developfrom
fix/reref-average-transpose-parity

Conversation

@arnodelorme

Copy link
Copy Markdown
Collaborator

🤖 Average re-referencing was computed as np.dot(refmatrix, data) (NumPy row-major), which differs from MATLAB reref.m's column-major refmatrix * data by up to ~0.06 µV in float32. For borderline subjects, ICA (Picard) convergence amplifies this tiny input difference into a different decomposition, cascading to ~2–3 µV differences at ICLabel and epoching.

Fix

Transpose the operands — (data.T @ refmatrix).T (refmatrix is symmetric, so algebraically identical) — to replicate MATLAB's column-major float32 accumulation order. Bit-exact to MATLAB on x86 BLAS. Single-line change; only the average-reference branch is touched (specific-channel/huber paths unchanged). pop_reref routes both the data and the ICA icawinv through reref(), so both are covered.

Validation (13 P300 subjects, Python 3.10 + MATLAB R2022b)

  • Step-2 re-reference: 0.03–0.16 µV → 0.000000 µV for all 13 subjects.
  • Steps 1–4 bit-exact; ICA AMARI ~0 / corr 1.0 for all subjects, including sub-005/008/013 (previously 0.029 / 0.042 / 0.026).
  • Step-5 ICLabel max across subjects: 3.34 µV → 0.0002 µV; the sub-005/008/013 ICLabel divergence (2.76 / 3.34 / 1.87 µV) is eliminated, and the same components are rejected on both sides.

reref() computed the average reference as np.dot(refmatrix, data)
(NumPy row-major), which differs from MATLAB's column-major
refmatrix*data by ~0.06 uV in float32. ICA convergence amplifies
this for borderline subjects (sub-005/008/013) into a different
decomposition and ~2-3 uV ICLabel/epoch differences. Transposing
the operands -- (data.T @ refmatrix).T, refmatrix being symmetric --
reproduces MATLAB's accumulation order bit-exactly on x86. reref now
matches MATLAB to 0 uV across all 13 P300 parity subjects.
@claude

claude Bot commented Jun 13, 2026

Copy link
Copy Markdown

Claude finished @arnodelorme's task in 1m 52s —— View job


PR Review — Match MATLAB column-major average reference (float32 parity)

  • Read changed file & diff vs origin/develop
  • Check MATLAB reref.m reference (not vendored in this checkout)
  • Assess correctness / algebraic equivalence
  • Review dtype flow, scope, and risks

Verdict: LGTM. The change is correct, minimal, well-scoped, and the comment honestly documents the reasoning and its platform constraint.


Correctness

The only behavioral line changes the average-reference matmul:

# before
work[chansin_array, :] = np.dot(refmatrix, work[chansin_array, :].astype(dt))
# after
block = np.ascontiguousarray(work[chansin_array, :].astype(dt).T)
work[chansin_array, :] = (block @ refmatrix).T
  • Algebraically equivalent. refmatrix = eye(n) - ones(n)/n is symmetric, so (data.T @ refmatrix).T = refmatrix.T @ data = refmatrix @ data. ✅
  • dtype flow is sound. work is float64 (line 50, result_type(float32, float64)); .astype(dt) casts back to the original float32 so the BLAS GEMM runs in float32 (the whole point), and the float32 result is upcast on assignment back into the float64 work buffer. Identical dtype semantics to the prior line. ✅
  • ascontiguousarray(...T) materializes the transposed block in C-order so the GEMM operand layout mirrors MATLAB's column-major refmatrix * data. This is the mechanism behind the bit-exact claim. Cost is one extra transpose-copy per call — negligible at EEG sizes.
  • Scope is tight. Only the average-reference else branch is touched; the specific-channel (mean_data = work[ref].mean(...)) and Huber paths are untouched, matching the diff. Since pop_reref routes both data and icawinv through reref(), both get the corrected accumulation. ✅
  • mean_data = None for average reference is unchanged pre-existing behavior.

Worth noting (non-blocking)

  • Platform dependence. The comment correctly scopes bit-exactness to "x86 BLAS". On Apple Silicon / ARM or a different BLAS build (OpenBLAS vs MKL), the result may not be bit-identical to MATLAB. Importantly this is never worse than the prior np.dot form — it's the same float32 GEMM, just with operand layout chosen to match MATLAB — so there's no regression risk for non-x86 users. The honest comment is the right call here; no code change needed.
  • No new test asserts the parity. The existing tests/test_pop_reref.py cases check mean across channels ≈ 0 (atol 1e-6) and icawinv subtraction, which this change preserves but which would not have caught the original 0.06 µV divergence (that only surfaces downstream through ICA). The real validation lives in the MATLAB-parity scripts (tests/matlab/pop_reref_compare.m) and the 13-subject P300 results in the PR body. That's reasonable given bit-exactness is platform/BLAS-specific and hard to assert portably — flagging only so the rationale is recorded.

Testing note

I was unable to execute the test suite or a numerical sanity check in this environment (Bash command approval is restricted here). Please confirm the parity scripts / tests/test_pop_reref.py pass in CI before merging — though given the algebraic equivalence and the thorough validation in the PR description, I expect them to.

@arnodelorme arnodelorme merged commit c73ca0d into develop Jun 13, 2026
10 checks passed
@arnodelorme arnodelorme deleted the fix/reref-average-transpose-parity branch June 13, 2026 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant