
Stabilize CI test runs with deterministic environment settings#314

Draft
harryswift01 wants to merge 16 commits into main from 313-bug-regression-tests-fail

harryswift01 commented Apr 8, 2026

Summary

This PR resolves nondeterministic regression test failures by enforcing deterministic iteration, eliminating shared mutable state, and standardizing the CI execution environment. Parallel execution with pytest-xdist is preserved.


Changes

Deterministic data handling

  • Enforce deterministic iteration over group mappings:
    • Replace groups.keys() with sorted(groups.keys())
    • Standardized ordering when building group_id_to_index
  • Ensure consistent floating-point accumulation:
    • Sort components before summation in results reporting
  • Remove shared mutable state (aliasing):
    • Replace direct references with copies (list(...), .copy()) for force/torque accumulators
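A minimal sketch of the two data-handling fixes, assuming a `groups` dict keyed by group ID and flat per-group force lists. The helper `build_group_id_to_index` and the names `forces`, `legacy_forces_alias` and `legacy_forces_copy` are hypothetical illustrations, not identifiers taken from the codebase.

```python
from typing import Dict, Hashable, List


def build_group_id_to_index(groups: Dict[Hashable, object]) -> Dict[Hashable, int]:
    """Assign row indices to group IDs in sorted order.

    Because the keys are sorted first, the resulting mapping depends only on
    which IDs exist, not on the order in which the dict was populated.
    """
    return {gid: idx for idx, gid in enumerate(sorted(groups.keys()))}


# Aliasing: a "backwards-compatible" name bound to the same list object.
forces: List[float] = [0.0, 0.0, 0.0]
legacy_forces_alias = forces        # same object: later writes leak through
legacy_forces_copy = list(forces)   # independent object: unaffected by later writes

forces[0] += 1.5
print(legacy_forces_alias[0], legacy_forces_copy[0])  # 1.5 0.0
```

`sorted()` needs group IDs of one mutually comparable type (e.g. all ints or all strings). `list(...)` is a shallow copy, which is enough for a flat list of floats; for NumPy arrays the equivalent is `.copy()`, and nested containers would need a per-element copy.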

Deterministic CI environment

  • Set thread-related environment variables to avoid nondeterministic BLAS/OpenMP behavior:
    • OMP_NUM_THREADS=1
    • MKL_NUM_THREADS=1
    • OPENBLAS_NUM_THREADS=1
    • NUMEXPR_NUM_THREADS=1
  • Set PYTHONHASHSEED=0 so that str/bytes hashing, and therefore set iteration order, is identical in every run (dicts already preserve insertion order)
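The sketch below collects the variables listed above into one environment and shows why the hash seed has to be pinned per process: string hashes are fixed when an interpreter starts, so the effect is only visible across fresh subprocesses. The thread-count variables are typically read when the BLAS/OpenMP runtime is loaded, so they must be in place before NumPy is imported, which is why setting them at the workflow level works. The helper names `deterministic_env` and `child_set_order` are hypothetical.

```python
import os
import subprocess
import sys

# Settings from the list above; environment values are always strings.
DETERMINISTIC_ENV = {
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "NUMEXPR_NUM_THREADS": "1",
    "PYTHONHASHSEED": "0",
}


def deterministic_env() -> dict:
    """Return a copy of the current environment with the deterministic settings applied."""
    env = os.environ.copy()
    env.update(DETERMINISTIC_ENV)
    return env


def child_set_order(env: dict) -> str:
    """Run a fresh interpreter and report the iteration order of a set of strings.

    String hashes (and therefore set iteration order) are chosen per process
    from PYTHONHASHSEED, so this must run in a subprocess to observe the effect.
    """
    code = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    env = deterministic_env()
    # With PYTHONHASHSEED pinned, two fresh interpreters agree on set order.
    print(child_set_order(env) == child_set_order(env))  # True
```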

Standardize test execution

  • Keep pytest-xdist enabled (-n auto) for performance
  • Use consistent distribution strategy (--dist=loadscope) across all jobs
  • Align test execution flags across unit, regression, and coverage workflows
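The test command implied by these flags is `pytest -n auto --dist=loadscope`. Per the pytest-xdist documentation, `loadscope` groups test functions by module and test methods by class, and sends each group to a single worker as a whole unit. The sketch below is a simplified illustration of that grouping rule on node IDs, not xdist's own scheduler code; `loadscope_key` and `group_by_scope` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List


def loadscope_key(nodeid: str) -> str:
    """Scope used to group a test: its class if it has one, otherwise its module.

    Dropping the last '::' component turns
      'tests/test_a.py::TestX::test_1' -> 'tests/test_a.py::TestX'
      'tests/test_a.py::test_2'        -> 'tests/test_a.py'
    """
    return nodeid.rsplit("::", 1)[0]


def group_by_scope(nodeids: List[str]) -> Dict[str, List[str]]:
    """Group test node IDs so that each scope would run on a single worker."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for nodeid in nodeids:
        groups[loadscope_key(nodeid)].append(nodeid)
    return dict(groups)
```

Keeping a module's or class's tests on one worker means module- and class-scoped fixtures are set up once per group and the tests within a group run in their collected order, which makes parallel runs easier to reproduce than per-test distribution.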

CI consistency improvements

  • Apply identical environment configuration across all workflows (PR, daily, weekly)
  • Ensure consistent behavior across operating systems and Python versions
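One way to keep the PR, daily and weekly workflows from drifting apart is a fail-fast check at test start-up (for example, called from a `conftest.py`). The PR does not say such a guard exists; `REQUIRED_CI_ENV`, `ci_env_mismatches` and `check_ci_env` are a hypothetical sketch.

```python
import os
from typing import List, Mapping

# Settings every workflow (PR, daily, weekly) is expected to export.
REQUIRED_CI_ENV = {
    "OMP_NUM_THREADS": "1",
    "MKL_NUM_THREADS": "1",
    "OPENBLAS_NUM_THREADS": "1",
    "NUMEXPR_NUM_THREADS": "1",
    "PYTHONHASHSEED": "0",
}


def ci_env_mismatches(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the sorted names of required variables that are missing or set differently."""
    return sorted(
        name for name, expected in REQUIRED_CI_ENV.items()
        if env.get(name) != expected
    )


def check_ci_env(env: Mapping[str, str] = os.environ) -> None:
    """Raise if a workflow forgot part of the deterministic configuration."""
    mismatches = ci_env_mismatches(env)
    if mismatches:
        raise RuntimeError(
            "Non-deterministic CI environment, check: " + ", ".join(mismatches)
        )
```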

Root cause

The regression failures were caused by a combination of:

  • Non-deterministic dictionary iteration affecting group-to-index mappings
  • Floating-point accumulation order differences
  • Shared mutable state between data structures leading to cross-test contamination
  • Threaded numerical libraries introducing small, inconsistent variations

These issues manifested as intermittent failures, typically in later tests or when running in parallel.
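The floating-point part of the root cause comes from addition not being associative: the same numbers accumulated in a different order can round differently. The sketch below shows this, then shows that sorting before a plain left-to-right accumulation makes the result depend only on the values. It sorts by value for illustration; the PR's reporting code may sort by a component key instead, which fixes the order just as well. The component values and helper names are hypothetical.

```python
import itertools

# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False


def sequential_sum(values):
    """Plain left-to-right accumulation, as a simple reduction loop would do."""
    total = 0.0
    for v in values:
        total += v
    return total


def ordered_sum(values):
    """Accumulate in ascending order so the result does not depend on input order."""
    return sequential_sum(sorted(values))


# Hypothetical per-component contributions with large cancelling terms.
components = [1e16, 1.0, -1e16, 3.0, 0.1]
naive = {sequential_sum(p) for p in itertools.permutations(components)}
ordered = {ordered_sum(p) for p in itertools.permutations(components)}
print(len(naive) > 1, len(ordered))  # True 1
```

The loop is spelled out because, from Python 3.12, the built-in `sum()` uses compensated summation for floats and hides much of this effect. Sorting buys reproducibility, not accuracy; `math.fsum` is an alternative that is both order-independent and correctly rounded.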


Impact

  • Eliminates flaky regression test failures
  • Makes results reproducible across runs, platforms, and Python versions
  • Maintains parallel execution performance
  • Improves reproducibility between local and CI environments

Validation

  • Repeated regression test runs with parallel execution show consistent results
  • Verified stability under varying PYTHONHASHSEED
  • Added unit tests to ensure deterministic group indexing and prevent aliasing
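The added tests themselves are not shown here. As a sketch of how "stability under varying PYTHONHASHSEED" can be checked inside pytest, the test below builds a group index in fresh interpreters with different seeds and requires identical output. The group names, `CHILD` program and `mapping_under_seed` helper are hypothetical; the group IDs start out in a set so that, without sorting, their order would depend on the seed.

```python
import os
import subprocess
import sys

# Child program: string IDs arrive via a set, whose iteration order depends on
# the hash seed; sorting before indexing removes that dependence.
CHILD = """
group_ids = {"water", "protein", "ligand", "ion", "membrane"}
group_id_to_index = {gid: i for i, gid in enumerate(sorted(group_ids))}
print(group_id_to_index)
"""


def mapping_under_seed(seed: int) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def test_group_index_stable_across_hash_seeds():
    outputs = {mapping_under_seed(seed) for seed in range(5)}
    assert len(outputs) == 1
```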

harryswift01 added this to the 2.1.2 milestone Apr 8, 2026
harryswift01 self-assigned this Apr 8, 2026
harryswift01 added the bug and CI Failure labels Apr 8, 2026
…le aliases

- sort group IDs when building `group_id_to_index` to guarantee deterministic ordering
- replace backwards-compatible aliases with copies to avoid shared mutable state
- update unit tests to reflect deterministic behaviour
- add determinism and aliasing tests to prevent regression
harryswift01 force-pushed the 313-bug-regression-tests-fail branch from 5f1aa03 to 313f038 on April 9, 2026 08:24

Development

Successfully merging this pull request may close these issues.

[Bug]: Regression tests fail intermittently in CI due to floating-point nondeterminism
