CPU ReduceSum: improve large float32 full-reduction accuracy and add regression coverage by Copilot · Pull Request #28587 · microsoft/onnxruntime

Copilot · 2026-05-20T15:37:55Z

Description

CPU ReduceSum on large float32 tensors could produce materially incorrect results (orders of magnitude beyond expected reduction-order drift), while equivalent float64 reduction paths remained stable. This PR tightens numerical behavior for the affected path and adds a focused regression case matching the reported shape/value pattern.

Numerical stabilization in ReduceSum<float>
- Updated ReduceAggregatorSum<T>::aggall to use Kahan compensated summation for T=float in the scalar aggregation path.
- Kept non-float behavior unchanged.
Targeted regression test
- Added ReductionOpTest.ReduceSum_default_axes_do_not_keep_dims_large_float32_constant_input.
- Reproduces a large full-tensor reduction ([5, 68, 64, 64], constant 0.1f) and asserts the scalar output against a high-precision reference with tight tolerances.

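if constexpr (std::is_same_v<T, float>) {
  // Kahan compensated summation, accumulated in double for T=float.
  double sum = 0.0, compensation = 0.0;
  for (int64_t i = 0; i < size; ++i) {
    // Subtract the rounding error carried over from the previous step.
    const double value = static_cast<double>(from_data[i]) - compensation;
    const double next_sum = sum + value;
    // Recover the low-order part lost when forming next_sum.
    compensation = (next_sum - sum) - value;
    sum = next_sum;
  }
  return static_cast<float>(sum);
}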
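To make the effect of the compensated loop easier to see outside ONNX Runtime, here is a minimal standalone sketch. It assumes nothing about the surrounding ORT code: `NaiveFloatSum`, `CompensatedSum`, `MakeConstantInput`, `ConstantInputReference`, `AbsError`, and `kIssueElementCount` are hypothetical names introduced only for illustration, and the naive loop is a simple baseline, not the Eigen-based path the PR replaces.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Illustrative baseline only: plain left-to-right float accumulation.
// This is NOT claimed to be the code path ONNX Runtime used before the PR.
float NaiveFloatSum(const float* data, int64_t size) {
  assert(size >= 0);
  float sum = 0.0f;
  for (int64_t i = 0; i < size; ++i) {
    sum += data[i];
  }
  return sum;
}

// Mirrors the loop quoted in the PR description: Kahan compensation carried
// in double, with a single rounding back to float at the end.
float CompensatedSum(const float* data, int64_t size) {
  assert(size >= 0);
  double sum = 0.0, compensation = 0.0;
  for (int64_t i = 0; i < size; ++i) {
    const double value = static_cast<double>(data[i]) - compensation;
    const double next_sum = sum + value;
    compensation = (next_sum - sum) - value;
    sum = next_sum;
  }
  return static_cast<float>(sum);
}

// Hypothetical constant: element count of the [5, 68, 64, 64] tensor
// described for the regression test (1,392,640 elements).
constexpr int64_t kIssueElementCount = int64_t{5} * 68 * 64 * 64;

// Hypothetical helper: builds a constant-valued input buffer.
std::vector<float> MakeConstantInput(int64_t size, float value) {
  return std::vector<float>(static_cast<std::size_t>(size), value);
}

// Hypothetical helper: double-precision reference for a constant input.
double ConstantInputReference(float value, int64_t size) {
  return static_cast<double>(value) * static_cast<double>(size);
}

// Absolute error of a float result against a double reference.
double AbsError(float result, double reference) {
  return std::fabs(static_cast<double>(result) - reference);
}
```

For `kIssueElementCount` copies of `0.1f`, the double-precision reference is about 139264.002, so `CompensatedSum` is expected to round to `139264.0f`, while the naive float loop should land visibly further away. Comparing both with `AbsError` against `ConstantInputReference(0.1f, kIssueElementCount)` is one way to see the gap.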

Motivation and Context

The issue reports a large discrepancy in CPU EP ReduceSum(float32) for a simple large constant input, even with optimizations and threading constrained; the observed error is far larger than normal non-associativity noise. The change directly targets that high-error reduction path and adds regression protection for the same workload profile.

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6d1108ff-0865-40b2-a59a-6d0af71e8b38 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

justinchuby · 2026-05-20T23:26:12Z

@copilot is this efficient?

Copilot · 2026-05-20T23:28:12Z

It’s O(n) and allocation-free, but it is slower than the previous vectorized float sum because of the compensated double accumulation. The tradeoff is intentional here to eliminate the large numeric error reported for big float32 reductions. In this branch, that behavior is in ReduceAggregatorSum<float>::aggall from commit 1d0f58f. If you want, I can follow up with a size-threshold fast path (Eigen for small reductions, compensated path for large) to reduce overhead.
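The size-threshold fast path offered here could look roughly like the sketch below. It is only an illustration under stated assumptions: `kCompensatedSumThreshold`, `FastFloatSum`, `CompensatedSum`, and `SumWithThreshold` are hypothetical names, the cutoff value is a placeholder that would need benchmarking, and `std::accumulate` stands in for the vectorized Eigen sum so the example has no external dependencies.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical cutoff; a real value would have to come from benchmarking.
constexpr int64_t kCompensatedSumThreshold = 4096;

// Stand-in for the vectorized fast path. The PR discussion mentions Eigen;
// std::accumulate is used here only to keep the sketch dependency-free.
inline float FastFloatSum(const float* data, int64_t size) {
  return std::accumulate(data, data + size, 0.0f);
}

// Kahan compensated summation in double, as in the PR's float branch.
inline float CompensatedSum(const float* data, int64_t size) {
  double sum = 0.0, compensation = 0.0;
  for (int64_t i = 0; i < size; ++i) {
    const double value = static_cast<double>(data[i]) - compensation;
    const double next_sum = sum + value;
    compensation = (next_sum - sum) - value;
    sum = next_sum;
  }
  return static_cast<float>(sum);
}

// Dispatch: cheap path for small reductions, compensated path for large ones.
inline float SumWithThreshold(const float* data, int64_t size,
                              int64_t threshold = kCompensatedSumThreshold) {
  assert(size >= 0);
  return size < threshold ? FastFloatSum(data, size)
                          : CompensatedSum(data, size);
}

// Convenience overload for callers that hold a std::vector.
inline float SumWithThreshold(const std::vector<float>& values) {
  return SumWithThreshold(values.data(), static_cast<int64_t>(values.size()));
}
```

The design question such a follow-up would need to settle is where the cutoff sits: below it, accumulated rounding error stays small enough that the faster path is acceptable; above it, the extra cost of the compensated loop buys the accuracy this PR is after.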

Initial plan

270f80e

Copilot AI assigned Copilot and justinchuby May 20, 2026

Copilot started work on behalf of justinchuby May 20, 2026 15:39

Copilot AI linked an issue May 20, 2026 that may be closed by this pull request

Large numerical discrepancy in CPUExecutionProvider ReduceSum for large float32 tensor #28450

Open

Copilot AI and others added 2 commits May 20, 2026 16:24

Fix float32 ReduceSum stability for large full reductions

deb2525

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6d1108ff-0865-40b2-a59a-6d0af71e8b38 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Add clarifying notes for compensated ReduceSum fix

1d0f58f

Agent-Logs-Url: https://github.com/microsoft/onnxruntime/sessions/6d1108ff-0865-40b2-a59a-6d0af71e8b38 Co-authored-by: justinchuby <11205048+justinchuby@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Fix numerical discrepancy in CPUExecutionProvider ReduceSum~~ CPU ReduceSum: improve large float32 full-reduction accuracy and add regression coverage May 20, 2026

Copilot AI requested a review from justinchuby May 20, 2026 16:32

Copilot finished work on behalf of justinchuby May 20, 2026 16:32

Copilot started work on behalf of justinchuby May 20, 2026 23:26

Copilot finished work on behalf of justinchuby May 20, 2026 23:28

Copilot wants to merge 3 commits into main from copilot/fix-cpu-executionprovider-reducesum-issue
