CPU ReduceSum: improve large float32 full-reduction accuracy and add regression coverage #28587
Draft
Copilot wants to merge 3 commits into
Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6d1108ff-0865-40b2-a59a-6d0af71e8b38 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix numerical discrepancy in CPUExecutionProvider ReduceSum" to "CPU ReduceSum: improve large float32 full-reduction accuracy and add regression coverage" on May 20, 2026
Contributor
@copilot is this efficient?
Contributor
Author
It’s O(n) and allocation-free, but it is slower than the previous vectorized float sum because of the compensated double accumulation. The tradeoff is intentional here to eliminate the large numeric error reported for big float32 reductions. In this branch, that behavior is in
Description
CPU `ReduceSum` on large `float32` tensors could produce materially incorrect results (orders of magnitude beyond expected reduction-order drift), while equivalent `float64` reduction paths remained stable. This PR tightens numerical behavior for the affected path and adds a focused regression case matching the reported shape/value pattern.

- Numerical stabilization in `ReduceSum<float>`
  - Updated `ReduceAggregatorSum<T>::aggall` to use Kahan compensated summation for `T=float` in the scalar aggregation path.
- Targeted regression test
  - Added `ReductionOpTest.ReduceSum_default_axes_do_not_keep_dims_large_float32_constant_input`.
  - Uses the reported input (shape `[5, 68, 64, 64]`, constant `0.1f`) and asserts the scalar output against a high-precision reference with tight tolerances.

Motivation and Context

The issue reports a large discrepancy in CPU EP `ReduceSum(float32)` for a simple large constant input, even with optimizations and threading constrained; the observed error is far larger than normal non-associativity noise. The change directly targets that high-error reduction path and adds regression protection for the same workload profile.