CuD-PDLP #1391

Draft
Bubullzz wants to merge 87 commits into NVIDIA:main from Bubullzz:cuD-PDLP

Bubullzz commented Jun 4, 2026

Contributor

Not review ready
Not merge ready

This is only up so the team can take a look; it definitely needs a big cleanup.
Closes #891

Bubullzz added 30 commits May 7, 2026 15:07
copy-pr-bot Bot commented Jun 4, 2026


This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Bubullzz added the "do not merge" label (Do not merge if this flag is set) Jun 4, 2026
coderabbitai Bot commented Jun 4, 2026


Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e69fca71-aa27-4195-bb47-de42ad9fd38c

📥 Commits

Reviewing files that changed from the base of the PR and between 91b1ae5 and f3b6343.

📒 Files selected for processing (10)
  • cpp/CMakeLists.txt
  • cpp/cmake/thirdparty/get_kaminpar.cmake
  • cpp/cuopt_cli.cpp
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/tests/linear_programming/pdlp_test.cu
💤 Files with no reviewable changes (5)
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/tests/linear_programming/pdlp_test.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/pdlp.cu
🚧 Files skipped from review as they are similar to previous changes (4)
  • cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
  • cpp/CMakeLists.txt
  • cpp/cmake/thirdparty/get_kaminpar.cmake
  • cpp/cuopt_cli.cpp

📝 Walkthrough

Adds end-to-end distributed multi-GPU PDLP: CMake/third-party wiring, partitioner contracts and METIS/KaMinPar backends, partition file I/O and rank data, per-GPU shard types, a multi-GPU engine (halo/exchange/allreduce/distributed SpMV/scaling), PDHG/PDLP solver multi-GPU wiring and constructors, distributed scaling/refactoring, convergence/restart adaptations, and tests.
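To make the "rank data" and "halo" terms in the walkthrough concrete, here is a minimal host-only sketch of how a per-rank index plan could be derived from a partition vector. It assumes owned entries are numbered first and peer-owned (halo) entries are appended after them. `local_index_plan` and `build_local_index_plan` are hypothetical names for illustration, not identifiers from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical per-rank index plan: owned entries come first, halo entries after.
struct local_index_plan {
  std::vector<int> local_to_global;
  std::unordered_map<int, int> global_to_local;
  std::size_t n_owned = 0;
};

// owner[g] : rank that owns global index g (the partitioner's output).
// touched  : global indices referenced by the rows stored on `rank`.
local_index_plan build_local_index_plan(const std::vector<int>& owner,
                                        const std::vector<int>& touched,
                                        int rank)
{
  local_index_plan plan;
  // Pass 1: number the entries this rank owns.
  for (std::size_t g = 0; g < owner.size(); ++g) {
    if (owner[g] == rank) {
      plan.global_to_local[static_cast<int>(g)] = static_cast<int>(plan.local_to_global.size());
      plan.local_to_global.push_back(static_cast<int>(g));
    }
  }
  plan.n_owned = plan.local_to_global.size();
  // Pass 2: append every referenced entry owned by a peer; these form the halo.
  for (int g : touched) {
    assert(g >= 0 && static_cast<std::size_t>(g) < owner.size());
    if (plan.global_to_local.find(g) == plan.global_to_local.end()) {
      plan.global_to_local[g] = static_cast<int>(plan.local_to_global.size());
      plan.local_to_global.push_back(g);
    }
  }
  return plan;
}
```

With this layout, everything at local index `n_owned` or above would have to be refreshed from a peer before a local SpMV can read it.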

Changes

Distributed Multi-GPU PDLP

| Layer | File(s) | Summary |
| --- | --- | --- |
| Build system & dependency wiring | `cpp/CMakeLists.txt`, `cpp/cmake/thirdparty/get_kaminpar.cmake`, `cpp/src/pdlp/CMakeLists.txt` | CMake locates NCCL/METIS, configures KaMinPar, and adds distributed PDLP sources and link targets. |
| Configuration & CLI routing | `cpp/include/cuopt/linear_programming/constants.h`, `cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`, `cpp/src/math_optimization/solver_settings.cu`, `cpp/cuopt_cli.cpp` | Adds distributed PDLP config keys, solver settings registration, and CLI branching + per-device RMM provisioning for provisioned GPU count. |
| Partitioner contracts & implementations | `cpp/src/pdlp/distributed_pdlp/partitioner.hpp`, `partitioner.cu`, `metis_partitioner.*`, `kaminpar_partitioner.*` | Defines partitioner interface, factory, Dummy/METIS/KaMinPar backends with bipartite CSR conversion and validation. |
| Partition I/O & rank-data | `cpp/src/pdlp/distributed_pdlp/partition_loader.*`, `rank_data.hpp` | Parse/export partition files and build per-rank rank_data with local CSR matrices, global↔local maps, and per-peer halo plans. |
| Shard type & construction | `cpp/src/pdlp/distributed_pdlp/shard.hpp`, `shard.cu` | Non-copyable per-GPU shard owning device problem state, NCCL comm, cuSPARSE plans, pre-staged halo buffers, and scaling initialization. |
| Multi-GPU engine | `cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp`, `multi_gpu_engine.cu` | Engine orchestrates halo exchange, NCCL all-reduce, distributed L2 norm, distributed scaling (Ruiz/Pock–Chambolle), distributed SpMV, power-iteration σ_max, gather to master, and graph-capture sync. |
| Initial scaling refactor | `cpp/src/pdlp/initial_scaling_strategy/*` | Splits Ruiz/Pock–Chambolle into compute/apply stages, exposes cumulative-scaling accessors/setters, and adds distributed rescaling application and skip flag. |
| PDHG multi-GPU wiring | `cpp/src/pdlp/pdhg.hpp`, `pdhg.cu` | Adds mgpu_engine wiring, dispatch to distributed_spmv when present, spmv_*_into helpers, reflected projection transforms, and CUDA graph fork/join across shards. |
| PDLP distributed constructor & solver loop | `cpp/src/pdlp/pdlp.cuh`, `pdlp.cu` | New constructor from MPS partitions problem, constructs engine/shards, performs distributed scaling and norm init, and adapts run loop and fixed-error/restart per-shard. |
| Entrypoints & graph disable | `cpp/src/pdlp/solve.cuh`, `solve.cu`, `cpp/cuopt_cli.cpp` | Adds solve_lp_distributed_from_mps, routing checks, graph-disable gating, and CLI changes to enable distributed path. |
| Convergence & termination | `cpp/src/pdlp/termination_strategy/*` | Adds per-shard objective partials, distributed residual norms, all-reduce aggregation, mutable getters, and gather-to-master return handling. |
| Adaptive step-size | `cpp/src/pdlp/step_size_strategy/*` | Exposes mutable norm buffers and owned-prefix parameters for per-shard movement computations. |
| cuSPARSE descriptor binding fix | `cpp/src/pdlp/cusparse_view.cu` | Bind CSR descriptor nnz to actual stored buffer lengths for shard-safety. |
| Tracing & graph gating | `cpp/src/pdlp/utilities/mgpu_trace.cuh`, `ping_pong_graph.cuh` | Adds MGPU_TRACE macros and atomic graph-disable flag for debugging. |
| Tests | `cpp/tests/linear_programming/pdlp_test.cu` | Adds METIS partition export/import round-trip test and distributed-vs-base parity tests for multiple MPS instances. |
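As a rough illustration of the "distributed L2 norm" entry above: each shard sums squares over the entries it owns, the partial sums are combined, and the square root is taken once at the end. The PR is described as using an NCCL all-reduce for the combine step; in this host-only sketch a plain loop stands in for it. `shard_view` and `distributed_l2_norm` are hypothetical names, and the "owned prefix followed by halo copies" layout is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one GPU's slice of a distributed vector.
// Only the first `owned` entries belong to this shard; the rest are halo
// copies of entries owned by peers and must not be counted twice.
struct shard_view {
  std::vector<double> values;
  std::size_t owned;
};

// Partial sum of squares over the owned prefix of one shard.
double local_sum_of_squares(const shard_view& s)
{
  assert(s.owned <= s.values.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < s.owned; ++i) { acc += s.values[i] * s.values[i]; }
  return acc;
}

// Host-side stand-in for an all-reduce(sum) followed by a single square root.
double distributed_l2_norm(const std::vector<shard_view>& shards)
{
  double total = 0.0;
  for (const auto& s : shards) { total += local_sum_of_squares(s); }
  return std::sqrt(total);
}
```

Reducing the squared partials rather than per-shard norms matters: summing per-shard square roots would overestimate the global norm.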

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels: non-breaking, improvement

Suggested reviewers:

  • hlinsen
  • akifcorduk

coderabbitai Bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
cpp/src/pdlp/solve.cu (1)

769-784: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject distributed problem_t calls before any early return.

The new guard sits below the zero-constraint return and the FP32 fallback. With use_distributed_pdlp=true plus SinglePrecision, this path returns run_pdlp_solver_in_fp32(...) instead of raising the intended validation error, so an unsupported distributed configuration silently runs the single-GPU solver.

Suggested fix
 static optimization_problem_solution_t<i_t, f_t> run_pdlp_solver(
   detail::problem_t<i_t, f_t>& problem,
   pdlp_solver_settings_t<i_t, f_t> const& settings,
   const timer_t& timer,
   bool is_batch_mode)
 {
+  cuopt_expects(!settings.hyper_params.use_distributed_pdlp,
+                error_type_t::ValidationError,
+                "Distributed PDLP must be entered via solve_lp(mps_data_model, ...) "
+                "so the master GPU never materializes the full problem. Call sites "
+                "with a problem_t cannot dispatch to distributed mode.");
+
   detail::pdlp_graph_disabled_flag().store(settings.hyper_params.pdlp_disable_graph,
                                            std::memory_order_relaxed);
 
   if (problem.n_constraints == 0) {
     ...
   }
 #if PDLP_INSTANTIATE_FLOAT || CUOPT_INSTANTIATE_FLOAT
   if constexpr (std::is_same_v<f_t, double>) {
     if (settings.pdlp_precision == pdlp_precision_t::SinglePrecision) {
       return run_pdlp_solver_in_fp32(problem, settings, timer, is_batch_mode);
     }
   }
 #endif
-  cuopt_expects(!settings.hyper_params.use_distributed_pdlp, ...);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/solve.cu` around lines 769 - 784, The distributed-mode
validation (cuopt_expects(!settings.hyper_params.use_distributed_pdlp, ...))
must be performed before any early returns so a distributed call cannot
accidentally take the FP32 fallback or zero-constraint path; move or duplicate
that check to occur before the SinglePrecision/FP32 branch and before the
zero-constraint return so that when settings.hyper_params.use_distributed_pdlp
is true (for problem_t inputs) the function immediately raises the
ValidationError rather than calling run_pdlp_solver_in_fp32 or returning early.
Ensure the check references the same validation message and
error_type_t::ValidationError used currently.
cpp/src/pdlp/pdlp.cu (1)

3063-3079: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The distributed average path is still unsafe in release builds.

When multi_gpu_engine is present and never_restart_to_average is false, Line 3071 uses plain assert(false). In release builds that disappears, and the subsequent raft::copy writes primal_size_h_/dual_size_h_ elements into unscaled_*_avg_solution_, which were never resized for the distributed ctor. That turns this TODO into an invalid device-copy / wrong-result path instead of a clean runtime rejection.
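The release-build concern can be reproduced in isolation. The sketch below contrasts a debug-only `assert` with a runtime guard; `std::runtime_error` is only a stand-in here (the project would presumably use its own `cuopt_expects` mechanism, as seen elsewhere in this review), and all function names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>

// Debug-only check: when the translation unit is compiled with NDEBUG
// (typical for release builds), assert expands to nothing and execution
// simply continues past this line.
void reject_with_assert(bool has_multi_gpu_engine)
{
  assert(!has_multi_gpu_engine && "distributed average path not implemented");
}

// Runtime check: behaves the same in debug and release builds.
void reject_with_exception(bool has_multi_gpu_engine)
{
  if (has_multi_gpu_engine) {
    throw std::runtime_error("distributed average path not implemented");
  }
}

// Small helper: reports whether the runtime guard fired.
bool runtime_guard_fires(bool has_multi_gpu_engine)
{
  try {
    reject_with_exception(has_multi_gpu_engine);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

With the runtime guard, the unsupported path is rejected before any copy is attempted, regardless of build type.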

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/pdlp.cu` around lines 3063 - 3079, The path that handles
multi-GPU (multi_gpu_engine) uses assert(false) which vanishes in release builds
and leads to invalid device copies into unscaled_primal_avg_solution_ /
unscaled_dual_avg_solution_; fix by replacing the assert with a deterministic
runtime guard: either resize/allocate unscaled_primal_avg_solution_ and
unscaled_dual_avg_solution_ to primal_size_h_ and dual_size_h_ (and
synchronize/validate device pointers) before calling raft::copy from
pdhg_solver_.get_primal_solution() / get_dual_solution(), or explicitly fail
early by logging and throwing a runtime_error when multi_gpu_engine is true so
the copy is never attempted; update the branch around
internal_solver_iterations_ <= 1 where multi_gpu_engine is checked to implement
one of these safe behaviors.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/cuopt_cli.cpp`:
- Around line 180-184: When lp_settings.hyper_params.use_distributed_pdlp is
true, guard the distributed PDLP call by checking that handle_ptr is non-null
before invoking cuopt::linear_programming::solve_lp(handle_ptr.get(), ...); if
handle_ptr is null, fail fast with a clear error (e.g., log and exit or throw)
rather than calling the distributed overload; update the branch that currently
chooses between solve_lp(handle_ptr.get(), mps_data_model, lp_settings) and
solve_lp(problem_interface.get(), lp_settings) to validate handle_ptr first and
only call the distributed overload when handle_ptr is valid.
- Around line 439-447: The code currently computes requested_gpus and then uses
std::min(...) to compute provisioned_gpus and
memory_resources.reserve(provisioned_gpus) without validating requested_gpus;
add explicit validation after computing requested_gpus (and after remapping -1
when use_distributed_pdlp is true) to ensure requested_gpus > 0 and that
raft::device_setter::get_device_count() > 0 before calling std::min or reserve.
If either value is non-positive, return/log an error or throw an exception
(consistent with surrounding error handling) referencing the parameters obtained
via settings.get_parameter<int>(CUOPT_NUM_GPUS) and
settings.get_parameter<int>(CUOPT_DISTRIBUTED_PDLP_NUM_GPUS) so the code never
calls memory_resources.reserve with a non-positive size.
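For the second `cuopt_cli.cpp` item above, the requested validation could look roughly like the following sketch. `resolve_provisioned_gpus` is a hypothetical helper, and the meaning of `-1` ("use every visible device") is an assumption made for illustration; the exception type is likewise a stand-in for the surrounding error handling.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: resolves how many GPUs to provision.
// `requested` is the user-supplied count, where -1 is assumed to mean
// "use every visible device"; `available` is the visible device count.
int resolve_provisioned_gpus(int requested, int available)
{
  if (available <= 0) { throw std::invalid_argument("no CUDA device is visible"); }
  if (requested == -1) { requested = available; }
  if (requested <= 0) {
    throw std::invalid_argument("GPU count must be positive or -1, got " +
                                std::to_string(requested));
  }
  const int provisioned = std::min(requested, available);
  assert(provisioned > 0);  // post-condition: safe to pass to reserve()
  return provisioned;
}
```

Validating both inputs before `std::min` means a non-positive value never reaches the container `reserve` call.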

In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 501-511: The mixed-precision branch still sizes and recreates FP32
matrices using op_problem_scaled.nnz which can differ per shard; update that
block to use the shard-local nnz values (e.g. static_cast<int64_t>(A_.size())
and static_cast<int64_t>(A_T_.size())) when allocating/sizing A_mixed_ and
A_T_mixed_ and when copying/transposing data for A_T.create / A.create so you
don't overrun A_T_ or leave stale nnz metadata; ensure any metadata fields set
during the FP32 recreate follow the shard-local sizes and that all
transforms/read ranges use those local sizes (A_, A_T_, A_mixed_, A_T_mixed_).

In `@cpp/src/pdlp/distributed_pdlp/partition_loader.cu`:
- Around line 77-87: Validate partition and CSR metadata before any
slicing/indexing: check that parts.size() >= nb_cstr + nb_vars before creating
cstr_parts/var_parts, ensure all entries in parts are within [0, nb_parts)
before using them to index rank_data_t<i_t,f_t>, and verify CSR arrays
(offsets/indices) have expected lengths (e.g., offsets.size() >= rows+1 and
indices.size() == nnz) before dereferencing in functions that build/iterate the
CSR (referencing variables parts, nb_cstr, nb_vars, rank_data_t, and the CSR
offset/index containers); use cuopt_expects (or the existing error path) to fail
early with clear messages when any check fails.
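The checks requested for `partition_loader.cu` can be sketched on the host as follows. This assumes the partition vector is laid out as constraint parts followed by variable parts, as the item above implies; `validate_partition` and `validate_csr` are hypothetical names, `std::invalid_argument` stands in for `cuopt_expects`, and the monotonicity check on the offsets is an extra assumption beyond what the review asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical validation of a partition vector laid out as
// [constraint parts | variable parts], run before any slicing.
void validate_partition(const std::vector<int>& parts,
                        std::size_t nb_cstr,
                        std::size_t nb_vars,
                        int nb_parts)
{
  if (nb_parts <= 0) { throw std::invalid_argument("nb_parts must be positive"); }
  if (parts.size() < nb_cstr + nb_vars) {
    throw std::invalid_argument("partition vector is shorter than nb_cstr + nb_vars");
  }
  for (int p : parts) {
    if (p < 0 || p >= nb_parts) { throw std::invalid_argument("partition id out of range"); }
  }
}

// Hypothetical CSR shape check: offsets has at least rows + 1 entries, starts
// at 0, never decreases, and ends at the number of stored column indices.
void validate_csr(const std::vector<int>& offsets,
                  const std::vector<int>& indices,
                  std::size_t rows)
{
  if (offsets.size() < rows + 1) { throw std::invalid_argument("CSR offsets too short"); }
  if (offsets[0] != 0) { throw std::invalid_argument("CSR offsets must start at 0"); }
  for (std::size_t r = 0; r < rows; ++r) {
    if (offsets[r + 1] < offsets[r]) { throw std::invalid_argument("CSR offsets decrease"); }
  }
  if (static_cast<std::size_t>(offsets[rows]) != indices.size()) {
    throw std::invalid_argument("CSR nnz does not match indices length");
  }
}
```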

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 821-825: The distributed gather of the current iterate is missing
on several return paths so master buffers can be stale; call the multi-GPU
gather before any return that serializes the current solution. Specifically,
ensure pdhg_solver_.get_mgpu_engine() and its method
gather_potential_next_solutions_to_master(pdhg_solver_,
current_termination_strategy_.get_convergence_information().get_reduced_cost())
is invoked centrally before any code that calls
fill_return_problem_solution(...), and add the same centralized gather call on
the other identified return sites (including the ConcurrentLimit and
PrimalFeasible/infeasibility exits referenced around lines ~859-863 and
~1541-1545) so the master full-size solution/reduced-cost buffers are populated
on every distributed return path.
- Around line 387-393: The distributed constructor pdlp_solver_t( problem_t<i_t,
f_t>& placeholder_problem, ... ) currently delegates to the regular ctor before
shard sizes exist, causing functions that use primal_size_h_/dual_size_h_ (e.g.,
set_initial_primal_solution, handling of initial_primal_solution and
initial_dual_solution and warm-start data) to operate on zero-length buffers;
update this constructor to either (a) validate and reject any initial-state
options (initial_primal_solution, initial_dual_solution, warm-start) up front
and return an error, or (b) defer all logic that applies initial iterates (calls
to set_initial_primal_solution / set_initial_dual_solution and warm-start
handling) until after shard construction when primal_size_h_ and dual_size_h_
are set, ensuring no modulo/divide-by-zero or zero-length copies occur.

In `@cpp/src/pdlp/solve.cu`:
- Around line 759-760: The global flag detail::pdlp_graph_disabled_flag() is
being mutated per-solve causing races; instead make the graph-disable decision
local to each solver instance and avoid writing the process-global flag from
solve entrypoints. Change callers that currently
store(settings.hyper_params.pdlp_disable_graph, ...) to pass the
pdlp_disable_graph boolean into the solver instance (or ctor) and have
ping_pong_graph_t::run() and related graph code read that instance-level flag
rather than detail::pdlp_graph_disabled_flag(); remove writes to the global flag
in solve functions so concurrent solves do not flip each other’s mode.
- Around line 2129-2134: The current overload erroneously hard-fails via
cuopt_expects when settings.hyper_params.use_distributed_pdlp is false and
always forwards to solve_lp_distributed_from_mps, removing the original
single-GPU/direct-MPS path; restore the prior behavior by replacing the
hard-fail with a branch: if settings.hyper_params.use_distributed_pdlp is true
call solve_lp_distributed_from_mps(handle_ptr, mps_data_model, settings,
problem_checking, use_pdlp_solver_mode) else call the non-distributed/MPS
entrypoint (the original direct-MPS function used previously—e.g.,
solve_lp_from_mps or the equivalent direct-MPS routine) so both paths are
supported, and keep or adjust cuopt_expects to validate only unsupported
parameter combinations if needed.
- Around line 2157-2205: solve_lp_distributed_from_mps builds
detail::pdlp_solver_t using settings_resolved but never applies settings.method
or calls set_pdlp_solver_mode, so requested PDLP modes/presets are ignored; fix
by checking settings_resolved.use_pdlp_solver_mode (and/or
settings_resolved.method) before constructing the solver and call
set_pdlp_solver_mode(settings_resolved) to map the preset/method into the solver
settings (or apply the mapping to settings_resolved) so the subsequent
detail::pdlp_solver_t(placeholder_problem, mps_data_model, settings_resolved) is
constructed with the intended PDLP mode.
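For the first `solve.cu` item above (the process-global graph-disable flag), the difference between the two designs can be shown with a small sketch. `global_graph_disabled` only imitates the pattern being criticized and `graph_runner` is a hypothetical per-instance alternative; neither is the PR's actual code.

```cpp
#include <atomic>
#include <cassert>

// Process-wide flag, similar in spirit to the pattern the review flags:
// every solve overwrites it, so concurrent solves can flip each other's mode.
inline std::atomic<bool>& global_graph_disabled()
{
  static std::atomic<bool> flag{false};
  return flag;
}

// Hypothetical per-instance alternative: the decision is fixed at construction
// and travels with the solver object, so no shared state is written.
class graph_runner {
 public:
  explicit graph_runner(bool disable_graph) : disable_graph_(disable_graph) {}
  bool uses_graph() const { return !disable_graph_; }

 private:
  bool disable_graph_;
};
```

Two `graph_runner` objects constructed with different settings keep their own modes, whereas two solves sharing the global flag see whichever value was stored last.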

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 188-191: The test currently sets distributed_pdlp_num_gpus = -1
which lets a single-GPU run bypass the multi-GPU/NCCL path; change the test to
first query the available GPU count and if fewer than 2 GPUs are present skip
the test, otherwise set pdlp_solver_settings_t::distributed_pdlp_num_gpus to at
least 2 (e.g., max(2, available_gpus)) before calling solve_lp(&handle, problem,
dist_settings) so the distributed PDLP path is actually exercised (use
pdlp_solver_settings_t, dist_settings, distributed_pdlp_num_gpus and solve_lp as
the loci to modify).
- Around line 248-252: The test pdlp_class::distributed_parity_square41 is
loading the wrong dataset; change the argument to
expect_distributed_matches_base in that test so it points to
"linear_programming/square41/square41.mps" instead of
"linear_programming/neos3/neos3.mps" so the regression covers the intended
square41 case (update the call site in the distributed_parity_square41 test that
invokes expect_distributed_matches_base).


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7df2a4b9-585b-4517-afcb-1aa089ecb1c1

📥 Commits

Reviewing files that changed from the base of the PR and between d6d6f9e and 91b1ae5.

📒 Files selected for processing (41)
  • cpp/CMakeLists.txt
  • cpp/cmake/thirdparty/get_kaminpar.cmake
  • cpp/cuopt_cli.cpp
  • cpp/include/cuopt/linear_programming/constants.h
  • cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
  • cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
  • cpp/src/math_optimization/solver_settings.cu
  • cpp/src/pdlp/CMakeLists.txt
  • cpp/src/pdlp/cusparse_view.cu
  • cpp/src/pdlp/distributed_pdlp/kaminpar_partitioner.cpp
  • cpp/src/pdlp/distributed_pdlp/kaminpar_partitioner.hpp
  • cpp/src/pdlp/distributed_pdlp/metis_partitioner.cu
  • cpp/src/pdlp/distributed_pdlp/metis_partitioner.hpp
  • cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.cu
  • cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp
  • cpp/src/pdlp/distributed_pdlp/partition_loader.cu
  • cpp/src/pdlp/distributed_pdlp/partition_loader.hpp
  • cpp/src/pdlp/distributed_pdlp/partitioner.cu
  • cpp/src/pdlp/distributed_pdlp/partitioner.hpp
  • cpp/src/pdlp/distributed_pdlp/rank_data.hpp
  • cpp/src/pdlp/distributed_pdlp/shard.cu
  • cpp/src/pdlp/distributed_pdlp/shard.hpp
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
  • cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh
  • cpp/src/pdlp/pdhg.cu
  • cpp/src/pdlp/pdhg.hpp
  • cpp/src/pdlp/pdlp.cu
  • cpp/src/pdlp/pdlp.cuh
  • cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
  • cpp/src/pdlp/saddle_point.cu
  • cpp/src/pdlp/solve.cu
  • cpp/src/pdlp/solve.cuh
  • cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
  • cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.hpp
  • cpp/src/pdlp/termination_strategy/convergence_information.cu
  • cpp/src/pdlp/termination_strategy/convergence_information.hpp
  • cpp/src/pdlp/termination_strategy/termination_strategy.cu
  • cpp/src/pdlp/termination_strategy/termination_strategy.hpp
  • cpp/src/pdlp/utilities/mgpu_trace.cuh
  • cpp/src/pdlp/utilities/ping_pong_graph.cuh
  • cpp/tests/linear_programming/pdlp_test.cu

Comment thread cpp/cuopt_cli.cpp
Comment thread cpp/cuopt_cli.cpp
Comment thread cpp/src/pdlp/cusparse_view.cu
Comment thread cpp/src/pdlp/distributed_pdlp/partition_loader.cu Outdated
Comment thread cpp/src/pdlp/pdlp.cu
Comment thread cpp/src/pdlp/solve.cu
Comment thread cpp/src/pdlp/solve.cu Outdated
Comment thread cpp/src/pdlp/solve.cu
Comment thread cpp/tests/linear_programming/pdlp_test.cu
Comment thread cpp/tests/linear_programming/pdlp_test.cu Outdated
rgsl888prabhu marked this pull request as draft June 8, 2026 15:27
@Bubullzz

Contributor Author

/ok to test 818ffcd

@Bubullzz

Contributor Author

@coderabbitai review

coderabbitai Bot commented Jun 12, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@Bubullzz

Contributor Author

/ok to test f3b6343

@Bubullzz

Contributor Author

/ok to test f3b6343

copy-pr-bot Bot commented Jun 12, 2026


/ok to test f3b6343

@Bubullzz, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/


Labels

do not merge Do not merge if this flag is set

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEA] Multi GPU PDLP

1 participant