CuD-PDLP by Bubullzz · Pull Request #1391 · NVIDIA/cuopt

Bubullzz · 2026-06-04T15:24:15Z

Not review ready
Not merge ready

Opening this so the team can have a look at it, but it definitely needs a big cleanup.
closes #891

copy-pr-bot · 2026-06-04T15:24:19Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

coderabbitai · 2026-06-04T15:50:48Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e69fca71-aa27-4195-bb47-de42ad9fd38c

📥 Commits

Reviewing files that changed from the base of the PR and between 91b1ae5 and f3b6343.

📒 Files selected for processing (10)

cpp/CMakeLists.txt
cpp/cmake/thirdparty/get_kaminpar.cmake
cpp/cuopt_cli.cpp
cpp/include/cuopt/linear_programming/constants.h
cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
cpp/src/math_optimization/solver_settings.cu
cpp/src/pdlp/pdlp.cu
cpp/src/pdlp/solve.cu
cpp/src/pdlp/termination_strategy/termination_strategy.cu
cpp/tests/linear_programming/pdlp_test.cu

💤 Files with no reviewable changes (5)

cpp/src/pdlp/termination_strategy/termination_strategy.cu
cpp/src/math_optimization/solver_settings.cu
cpp/tests/linear_programming/pdlp_test.cu
cpp/src/pdlp/solve.cu
cpp/src/pdlp/pdlp.cu

🚧 Files skipped from review as they are similar to previous changes (4)

cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
cpp/CMakeLists.txt
cpp/cmake/thirdparty/get_kaminpar.cmake
cpp/cuopt_cli.cpp

📝 Walkthrough

Adds end-to-end distributed multi-GPU PDLP: CMake/third-party wiring, partitioner contracts and METIS/KaMinPar backends, partition file I/O and rank data, per-GPU shard types, a multi-GPU engine (halo/exchange/allreduce/distributed SpMV/scaling), PDHG/PDLP solver multi-GPU wiring and constructors, distributed scaling/refactoring, convergence/restart adaptations, and tests.

Changes

Distributed Multi-GPU PDLP

| Layer | File(s) | Summary |
|---|---|---|
| Build system & dependency wiring | `cpp/CMakeLists.txt`, `cpp/cmake/thirdparty/get_kaminpar.cmake`, `cpp/src/pdlp/CMakeLists.txt` | CMake locates NCCL/METIS, configures KaMinPar, and adds distributed PDLP sources and link targets. |
| Configuration & CLI routing | `cpp/include/cuopt/linear_programming/constants.h`, `cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp`, `cpp/src/math_optimization/solver_settings.cu`, `cpp/cuopt_cli.cpp` | Adds distributed PDLP config keys, solver settings registration, and CLI branching + per-device RMM provisioning for provisioned GPU count. |
| Partitioner contracts & implementations | `cpp/src/pdlp/distributed_pdlp/partitioner.hpp`, `partitioner.cu`, `metis_partitioner.*`, `kaminpar_partitioner.*` | Defines partitioner interface, factory, Dummy/METIS/KaMinPar backends with bipartite CSR conversion and validation. |
| Partition I/O & rank-data | `cpp/src/pdlp/distributed_pdlp/partition_loader.*`, `rank_data.hpp` | Parse/export partition files and build per-rank rank_data with local CSR matrices, global↔local maps, and per-peer halo plans. |
| Shard type & construction | `cpp/src/pdlp/distributed_pdlp/shard.hpp`, `shard.cu` | Non-copyable per-GPU shard owning device problem state, NCCL comm, cuSPARSE plans, pre-staged halo buffers, and scaling initialization. |
| Multi-GPU engine | `cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp`, `multi_gpu_engine.cu` | Engine orchestrates halo exchange, NCCL all-reduce, distributed L2 norm, distributed scaling (Ruiz/Pock–Chambolle), distributed SpMV, power-iteration σ_max, gather to master, and graph-capture sync. |
| Initial scaling refactor | `cpp/src/pdlp/initial_scaling_strategy/*` | Splits Ruiz/Pock–Chambolle into compute/apply stages, exposes cumulative-scaling accessors/setters, and adds distributed rescaling application and skip flag. |
| PDHG multi-GPU wiring | `cpp/src/pdlp/pdhg.hpp`, `pdhg.cu` | Adds mgpu_engine wiring, dispatch to distributed_spmv when present, spmv_*_into helpers, reflected projection transforms, and CUDA graph fork/join across shards. |
| PDLP distributed constructor & solver loop | `cpp/src/pdlp/pdlp.cuh`, `pdlp.cu` | New constructor from MPS partitions problem, constructs engine/shards, performs distributed scaling and norm init, and adapts run loop and fixed-error/restart per-shard. |
| Entrypoints & graph disable | `cpp/src/pdlp/solve.cuh`, `solve.cu`, `cpp/cuopt_cli.cpp` | Adds solve_lp_distributed_from_mps, routing checks, graph-disable gating, and CLI changes to enable distributed path. |
| Convergence & termination | `cpp/src/pdlp/termination_strategy/*` | Adds per-shard objective partials, distributed residual norms, all-reduce aggregation, mutable getters, and gather-to-master return handling. |
| Adaptive step-size | `cpp/src/pdlp/step_size_strategy/*` | Exposes mutable norm buffers and owned-prefix parameters for per-shard movement computations. |
| cuSPARSE descriptor binding fix | `cpp/src/pdlp/cusparse_view.cu` | Bind CSR descriptor nnz to actual stored buffer lengths for shard-safety. |
| Tracing & graph gating | `cpp/src/pdlp/utilities/mgpu_trace.cuh`, `ping_pong_graph.cuh` | Adds MGPU_TRACE macros and atomic graph-disable flag for debugging. |
| Tests | `cpp/tests/linear_programming/pdlp_test.cu` | Adds METIS partition export/import round-trip test and distributed-vs-base parity tests for multiple MPS instances. |
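The "distributed L2 norm" and "owned-prefix" entries in the table above can be pictured with a CPU-only sketch. This is not the PR's implementation: `shard_vector`, `local_sum_of_squares`, `all_reduce_sum`, and `distributed_l2_norm` are hypothetical names, the sum all-reduce is simulated with a plain accumulation rather than NCCL, and the layout (owned entries first, halo copies after) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical stand-in for one GPU's slice of a distributed vector.
// Assumption: owned entries come first, halo copies of remote entries follow.
struct shard_vector {
  std::vector<double> values;  // owned entries, then halo entries
  std::size_t owned_count;     // length of the owned prefix
};

// Local partial: sum of squares over the owned prefix only, so an entry that
// also sits in another shard's halo is not counted twice.
double local_sum_of_squares(const shard_vector& s)
{
  assert(s.owned_count <= s.values.size());
  double acc = 0.0;
  for (std::size_t i = 0; i < s.owned_count; ++i) { acc += s.values[i] * s.values[i]; }
  return acc;
}

// CPU stand-in for a sum all-reduce: every rank would receive this same total.
double all_reduce_sum(const std::vector<double>& partials)
{
  return std::accumulate(partials.begin(), partials.end(), 0.0);
}

// Global norm = sqrt of the all-reduced per-shard partial sums.
double distributed_l2_norm(const std::vector<shard_vector>& shards)
{
  std::vector<double> partials;
  partials.reserve(shards.size());
  for (const auto& s : shards) { partials.push_back(local_sum_of_squares(s)); }
  return std::sqrt(all_reduce_sum(partials));
}
```

Reducing the squared partials and taking the square root once at the end keeps the result independent of how the vector is split across shards, provided each entry is owned by exactly one shard.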

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels: non-breaking, improvement

Suggested reviewers:

hlinsen
akifcorduk

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

cpp/src/pdlp/solve.cu (1)

769-784: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject distributed problem_t calls before any early return.

The new guard sits below the zero-constraint return and the FP32 fallback. With use_distributed_pdlp=true plus SinglePrecision, this path returns run_pdlp_solver_in_fp32(...) instead of raising the intended validation error, so an unsupported distributed configuration silently runs the single-GPU solver.

Suggested fix

 static optimization_problem_solution_t<i_t, f_t> run_pdlp_solver(
   detail::problem_t<i_t, f_t>& problem,
   pdlp_solver_settings_t<i_t, f_t> const& settings,
   const timer_t& timer,
   bool is_batch_mode)
 {
+  cuopt_expects(!settings.hyper_params.use_distributed_pdlp,
+                error_type_t::ValidationError,
+                "Distributed PDLP must be entered via solve_lp(mps_data_model, ...) "
+                "so the master GPU never materializes the full problem. Call sites "
+                "with a problem_t cannot dispatch to distributed mode.");
+
   detail::pdlp_graph_disabled_flag().store(settings.hyper_params.pdlp_disable_graph,
                                            std::memory_order_relaxed);
 
   if (problem.n_constraints == 0) {
     ...
   }
 #if PDLP_INSTANTIATE_FLOAT || CUOPT_INSTANTIATE_FLOAT
   if constexpr (std::is_same_v<f_t, double>) {
     if (settings.pdlp_precision == pdlp_precision_t::SinglePrecision) {
       return run_pdlp_solver_in_fp32(problem, settings, timer, is_batch_mode);
     }
   }
 #endif
-  cuopt_expects(!settings.hyper_params.use_distributed_pdlp, ...);

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/solve.cu` around lines 769 - 784, The distributed-mode
validation (cuopt_expects(!settings.hyper_params.use_distributed_pdlp, ...))
must be performed before any early returns so a distributed call cannot
accidentally take the FP32 fallback or zero-constraint path; move or duplicate
that check to occur before the SinglePrecision/FP32 branch and before the
zero-constraint return so that when settings.hyper_params.use_distributed_pdlp
is true (for problem_t inputs) the function immediately raises the
ValidationError rather than calling run_pdlp_solver_in_fp32 or returning early.
Ensure the check references the same validation message and
error_type_t::ValidationError used currently.

cpp/src/pdlp/pdlp.cu (1)

3063-3079: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

The distributed average path is still unsafe in release builds.

When multi_gpu_engine is present and never_restart_to_average is false, Line 3071 uses plain assert(false). In release builds that disappears, and the subsequent raft::copy writes primal_size_h_/dual_size_h_ elements into unscaled_*_avg_solution_, which were never resized for the distributed ctor. That turns this TODO into an invalid device-copy / wrong-result path instead of a clean runtime rejection.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cpp/src/pdlp/pdlp.cu` around lines 3063 - 3079, The path that handles
multi-GPU (multi_gpu_engine) uses assert(false) which vanishes in release builds
and leads to invalid device copies into unscaled_primal_avg_solution_ /
unscaled_dual_avg_solution_; fix by replacing the assert with a deterministic
runtime guard: either resize/allocate unscaled_primal_avg_solution_ and
unscaled_dual_avg_solution_ to primal_size_h_ and dual_size_h_ (and
synchronize/validate device pointers) before calling raft::copy from
pdhg_solver_.get_primal_solution() / get_dual_solution(), or explicitly fail
early by logging and throwing a runtime_error when multi_gpu_engine is true so
the copy is never attempted; update the branch around
internal_solver_iterations_ <= 1 where multi_gpu_engine is checked to implement
one of these safe behaviors.
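The "fail early" option described above can be sketched in isolation. `solver_state`, `reject_distributed_average_restart`, and `is_rejected` are hypothetical names for illustration and do not come from `pdlp.cu`; the real fix would use whatever error path the surrounding code already uses.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical flags standing in for the solver state named in the review.
struct solver_state {
  bool has_multi_gpu_engine;
  bool never_restart_to_average;
};

// Runtime guard: unlike assert(false), the throw is still present when the
// code is compiled with NDEBUG, so the unsupported path cannot fall through.
void reject_distributed_average_restart(const solver_state& s)
{
  if (s.has_multi_gpu_engine && !s.never_restart_to_average) {
    throw std::runtime_error("restart-to-average is not supported with distributed PDLP");
  }
  // Debug-only restatement of the invariant; compiled out under NDEBUG.
  assert(!(s.has_multi_gpu_engine && !s.never_restart_to_average));
}

// Small helper so callers (and tests) can observe the rejection as a bool.
bool is_rejected(const solver_state& s)
{
  try {
    reject_distributed_average_restart(s);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}
```

Calling the guard before any copy into the average-solution buffers means a release build rejects the configuration instead of reaching the copy.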

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cpp/cuopt_cli.cpp`:
- Around line 180-184: When lp_settings.hyper_params.use_distributed_pdlp is
true, guard the distributed PDLP call by checking that handle_ptr is non-null
before invoking cuopt::linear_programming::solve_lp(handle_ptr.get(), ...); if
handle_ptr is null, fail fast with a clear error (e.g., log and exit or throw)
rather than calling the distributed overload; update the branch that currently
chooses between solve_lp(handle_ptr.get(), mps_data_model, lp_settings) and
solve_lp(problem_interface.get(), lp_settings) to validate handle_ptr first and
only call the distributed overload when handle_ptr is valid.
- Around line 439-447: The code currently computes requested_gpus and then uses
std::min(...) to compute provisioned_gpus and
memory_resources.reserve(provisioned_gpus) without validating requested_gpus;
add explicit validation after computing requested_gpus (and after remapping -1
when use_distributed_pdlp is true) to ensure requested_gpus > 0 and that
raft::device_setter::get_device_count() > 0 before calling std::min or reserve.
If either value is non-positive, return/log an error or throw an exception
(consistent with surrounding error handling) referencing the parameters obtained
via settings.get_parameter<int>(CUOPT_NUM_GPUS) and
settings.get_parameter<int>(CUOPT_DISTRIBUTED_PDLP_NUM_GPUS) so the code never
calls memory_resources.reserve with a non-positive size.
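A minimal sketch of the GPU-count validation requested above, under stated assumptions: `resolve_provisioned_gpus` is a hypothetical helper, the device count is passed in as a plain integer rather than queried, and `-1` is taken to mean "use every visible GPU".

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: turns a requested GPU count into a provisioned count.
// Assumption for this sketch: requested == -1 means "use every visible GPU".
int resolve_provisioned_gpus(int requested, int available)
{
  if (available <= 0) { throw std::invalid_argument("no GPU is visible"); }
  if (requested == -1) { requested = available; }
  if (requested <= 0) {
    throw std::invalid_argument("requested GPU count must be positive or -1, got " +
                                std::to_string(requested));
  }
  const int provisioned = std::min(requested, available);
  assert(provisioned >= 1);  // safe to pass to reserve() from here on
  return provisioned;
}
```

With both inputs checked first, the value handed to a later `reserve` call is always at least 1.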

In `@cpp/src/pdlp/cusparse_view.cu`:
- Around line 501-511: The mixed-precision branch still sizes and recreates FP32
matrices using op_problem_scaled.nnz which can differ per shard; update that
block to use the shard-local nnz values (e.g. static_cast<int64_t>(A_.size())
and static_cast<int64_t>(A_T_.size())) when allocating/sizing A_mixed_ and
A_T_mixed_ and when copying/transposing data for A_T.create / A.create so you
don't overrun A_T_ or leave stale nnz metadata; ensure any metadata fields set
during the FP32 recreate follow the shard-local sizes and that all
transforms/read ranges use those local sizes (A_, A_T_, A_mixed_, A_T_mixed_).

In `@cpp/src/pdlp/distributed_pdlp/partition_loader.cu`:
- Around line 77-87: Validate partition and CSR metadata before any
slicing/indexing: check that parts.size() >= nb_cstr + nb_vars before creating
cstr_parts/var_parts, ensure all entries in parts are within [0, nb_parts)
before using them to index rank_data_t<i_t,f_t>, and verify CSR arrays
(offsets/indices) have expected lengths (e.g., offsets.size() >= rows+1 and
indices.size() == nnz) before dereferencing in functions that build/iterate the
CSR (referencing variables parts, nb_cstr, nb_vars, rank_data_t, and the CSR
offset/index containers); use cuopt_expects (or the existing error path) to fail
early with clear messages when any check fails.
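The checks listed above might look like the following on host-side containers. This is a sketch only: `validate_partition`, `validate_csr`, and `throws` are hypothetical helpers, standard exceptions stand in for `cuopt_expects`, and the partition vector is assumed to hold constraint parts followed by variable parts.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: checks a partition vector laid out as
// [constraint parts..., variable parts...] before anything slices or indexes it.
void validate_partition(const std::vector<int>& parts,
                        std::size_t nb_cstr,
                        std::size_t nb_vars,
                        int nb_parts)
{
  if (nb_parts <= 0) { throw std::invalid_argument("nb_parts must be positive"); }
  if (parts.size() < nb_cstr + nb_vars) {
    throw std::invalid_argument("partition vector is shorter than nb_cstr + nb_vars");
  }
  for (int p : parts) {
    if (p < 0 || p >= nb_parts) { throw std::out_of_range("partition id outside [0, nb_parts)"); }
  }
}

// Hypothetical helper: checks CSR metadata before any row is dereferenced.
void validate_csr(const std::vector<int>& offsets,
                  const std::vector<int>& indices,
                  std::size_t n_rows,
                  std::size_t n_cols)
{
  if (offsets.size() < n_rows + 1) {
    throw std::invalid_argument("CSR offsets need at least n_rows + 1 entries");
  }
  if (offsets[0] != 0) { throw std::invalid_argument("CSR offsets must start at 0"); }
  for (std::size_t r = 0; r < n_rows; ++r) {
    if (offsets[r] > offsets[r + 1]) {
      throw std::invalid_argument("CSR offsets must be non-decreasing");
    }
  }
  // offsets[n_rows] is the number of stored non-zeros (non-negative by the checks above).
  if (static_cast<std::size_t>(offsets[n_rows]) != indices.size()) {
    throw std::invalid_argument("CSR indices length does not match nnz");
  }
  for (int c : indices) {
    if (c < 0 || static_cast<std::size_t>(c) >= n_cols) {
      throw std::out_of_range("CSR column index out of range");
    }
  }
}

// Returns true when the callable throws any std::exception.
template <typename F>
bool throws(F&& f)
{
  try {
    f();
  } catch (const std::exception&) {
    return true;
  }
  return false;
}
```

Running both validators before building per-rank data turns malformed partition files into a clear error rather than an out-of-bounds access.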

In `@cpp/src/pdlp/pdlp.cu`:
- Around line 821-825: The distributed gather of the current iterate is missing
on several return paths so master buffers can be stale; call the multi-GPU
gather before any return that serializes the current solution. Specifically,
ensure pdhg_solver_.get_mgpu_engine() and its method
gather_potential_next_solutions_to_master(pdhg_solver_,
current_termination_strategy_.get_convergence_information().get_reduced_cost())
is invoked centrally before any code that calls
fill_return_problem_solution(...), and add the same centralized gather call on
the other identified return sites (including the ConcurrentLimit and
PrimalFeasible/infeasibility exits referenced around lines ~859-863 and
~1541-1545) so the master full-size solution/reduced-cost buffers are populated
on every distributed return path.
- Around line 387-393: The distributed constructor pdlp_solver_t( problem_t<i_t,
f_t>& placeholder_problem, ... ) currently delegates to the regular ctor before
shard sizes exist, causing functions that use primal_size_h_/dual_size_h_ (e.g.,
set_initial_primal_solution, handling of initial_primal_solution and
initial_dual_solution and warm-start data) to operate on zero-length buffers;
update this constructor to either (a) validate and reject any initial-state
options (initial_primal_solution, initial_dual_solution, warm-start) up front
and return an error, or (b) defer all logic that applies initial iterates (calls
to set_initial_primal_solution / set_initial_dual_solution and warm-start
handling) until after shard construction when primal_size_h_ and dual_size_h_
are set, ensuring no modulo/divide-by-zero or zero-length copies occur.

In `@cpp/src/pdlp/solve.cu`:
- Around line 759-760: The global flag detail::pdlp_graph_disabled_flag() is
being mutated per-solve causing races; instead make the graph-disable decision
local to each solver instance and avoid writing the process-global flag from
solve entrypoints. Change callers that currently
store(settings.hyper_params.pdlp_disable_graph, ...) to pass the
pdlp_disable_graph boolean into the solver instance (or ctor) and have
ping_pong_graph_t::run() and related graph code read that instance-level flag
rather than detail::pdlp_graph_disabled_flag(); remove writes to the global flag
in solve functions so concurrent solves do not flip each other’s mode.
- Around line 2129-2134: The current overload erroneously hard-fails via
cuopt_expects when settings.hyper_params.use_distributed_pdlp is false and
always forwards to solve_lp_distributed_from_mps, removing the original
single-GPU/direct-MPS path; restore the prior behavior by replacing the
hard-fail with a branch: if settings.hyper_params.use_distributed_pdlp is true
call solve_lp_distributed_from_mps(handle_ptr, mps_data_model, settings,
problem_checking, use_pdlp_solver_mode) else call the non-distributed/MPS
entrypoint (the original direct-MPS function used previously—e.g.,
solve_lp_from_mps or the equivalent direct-MPS routine) so both paths are
supported, and keep or adjust cuopt_expects to validate only unsupported
parameter combinations if needed.
- Around line 2157-2205: solve_lp_distributed_from_mps builds
detail::pdlp_solver_t using settings_resolved but never applies settings.method
or calls set_pdlp_solver_mode, so requested PDLP modes/presets are ignored; fix
by checking settings_resolved.use_pdlp_solver_mode (and/or
settings_resolved.method) before constructing the solver and call
set_pdlp_solver_mode(settings_resolved) to map the preset/method into the solver
settings (or apply the mapping to settings_resolved) so the subsequent
detail::pdlp_solver_t(placeholder_problem, mps_data_model, settings_resolved) is
constructed with the intended PDLP mode.
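The first item above (moving the graph-disable decision off a process-global flag) can be sketched as follows. `global_graph_disabled_flag` and `graph_runner` are hypothetical names that only mirror the shape of the problem; they are not the PR's `detail::pdlp_graph_disabled_flag()` or `ping_pong_graph_t`.

```cpp
#include <atomic>
#include <cassert>

// The pattern the review flags: one process-wide flag that every solve overwrites,
// so the last writer decides the mode for all solves in flight.
inline std::atomic<bool>& global_graph_disabled_flag()
{
  static std::atomic<bool> flag{false};
  return flag;
}

// Sketch of the alternative: each runner captures the choice at construction
// and reads only its own copy afterwards.
class graph_runner {
 public:
  explicit graph_runner(bool graph_disabled) : graph_disabled_(graph_disabled) {}
  bool uses_graph() const { return !graph_disabled_; }

 private:
  bool graph_disabled_;
};

// Two solves with different settings keep their own mode.
inline void demo_independent_runners()
{
  graph_runner debug_solve(/*graph_disabled=*/true);
  graph_runner fast_solve(/*graph_disabled=*/false);
  assert(!debug_solve.uses_graph());
  assert(fast_solve.uses_graph());
}
```

The trade-off is that the flag has to be threaded through constructors, but concurrent solves can no longer flip each other's mode.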

In `@cpp/tests/linear_programming/pdlp_test.cu`:
- Around line 188-191: The test currently sets distributed_pdlp_num_gpus = -1
which lets a single-GPU run bypass the multi-GPU/NCCL path; change the test to
first query the available GPU count and if fewer than 2 GPUs are present skip
the test, otherwise set pdlp_solver_settings_t::distributed_pdlp_num_gpus to at
least 2 (e.g., max(2, available_gpus)) before calling solve_lp(&handle, problem,
dist_settings) so the distributed PDLP path is actually exercised (use
pdlp_solver_settings_t, dist_settings, distributed_pdlp_num_gpus and solve_lp as
the loci to modify).
- Around line 248-252: The test pdlp_class::distributed_parity_square41 is
loading the wrong dataset; change the argument to
expect_distributed_matches_base in that test so it points to
"linear_programming/square41/square41.mps" instead of
"linear_programming/neos3/neos3.mps" so the regression covers the intended
square41 case (update the call site in the distributed_parity_square41 test that
invokes expect_distributed_matches_base).

---

Outside diff comments:
In `@cpp/src/pdlp/pdlp.cu`:
- Around line 3063-3079: The path that handles multi-GPU (multi_gpu_engine) uses
assert(false) which vanishes in release builds and leads to invalid device
copies into unscaled_primal_avg_solution_ / unscaled_dual_avg_solution_; fix by
replacing the assert with a deterministic runtime guard: either resize/allocate
unscaled_primal_avg_solution_ and unscaled_dual_avg_solution_ to primal_size_h_
and dual_size_h_ (and synchronize/validate device pointers) before calling
raft::copy from pdhg_solver_.get_primal_solution() / get_dual_solution(), or
explicitly fail early by logging and throwing a runtime_error when
multi_gpu_engine is true so the copy is never attempted; update the branch
around internal_solver_iterations_ <= 1 where multi_gpu_engine is checked to
implement one of these safe behaviors.

In `@cpp/src/pdlp/solve.cu`:
- Around line 769-784: The distributed-mode validation
(cuopt_expects(!settings.hyper_params.use_distributed_pdlp, ...)) must be
performed before any early returns so a distributed call cannot accidentally
take the FP32 fallback or zero-constraint path; move or duplicate that check to
occur before the SinglePrecision/FP32 branch and before the zero-constraint
return so that when settings.hyper_params.use_distributed_pdlp is true (for
problem_t inputs) the function immediately raises the ValidationError rather
than calling run_pdlp_solver_in_fp32 or returning early. Ensure the check
references the same validation message and error_type_t::ValidationError used
currently.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7df2a4b9-585b-4517-afcb-1aa089ecb1c1

📥 Commits

Reviewing files that changed from the base of the PR and between d6d6f9e and 91b1ae5.

📒 Files selected for processing (41)

cpp/CMakeLists.txt
cpp/cmake/thirdparty/get_kaminpar.cmake
cpp/cuopt_cli.cpp
cpp/include/cuopt/linear_programming/constants.h
cpp/include/cuopt/linear_programming/pdlp/pdlp_hyper_params.cuh
cpp/include/cuopt/linear_programming/pdlp/solver_settings.hpp
cpp/src/math_optimization/solver_settings.cu
cpp/src/pdlp/CMakeLists.txt
cpp/src/pdlp/cusparse_view.cu
cpp/src/pdlp/distributed_pdlp/kaminpar_partitioner.cpp
cpp/src/pdlp/distributed_pdlp/kaminpar_partitioner.hpp
cpp/src/pdlp/distributed_pdlp/metis_partitioner.cu
cpp/src/pdlp/distributed_pdlp/metis_partitioner.hpp
cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.cu
cpp/src/pdlp/distributed_pdlp/multi_gpu_engine.hpp
cpp/src/pdlp/distributed_pdlp/partition_loader.cu
cpp/src/pdlp/distributed_pdlp/partition_loader.hpp
cpp/src/pdlp/distributed_pdlp/partitioner.cu
cpp/src/pdlp/distributed_pdlp/partitioner.hpp
cpp/src/pdlp/distributed_pdlp/rank_data.hpp
cpp/src/pdlp/distributed_pdlp/shard.cu
cpp/src/pdlp/distributed_pdlp/shard.hpp
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cu
cpp/src/pdlp/initial_scaling_strategy/initial_scaling.cuh
cpp/src/pdlp/pdhg.cu
cpp/src/pdlp/pdhg.hpp
cpp/src/pdlp/pdlp.cu
cpp/src/pdlp/pdlp.cuh
cpp/src/pdlp/restart_strategy/pdlp_restart_strategy.cu
cpp/src/pdlp/saddle_point.cu
cpp/src/pdlp/solve.cu
cpp/src/pdlp/solve.cuh
cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.cu
cpp/src/pdlp/step_size_strategy/adaptive_step_size_strategy.hpp
cpp/src/pdlp/termination_strategy/convergence_information.cu
cpp/src/pdlp/termination_strategy/convergence_information.hpp
cpp/src/pdlp/termination_strategy/termination_strategy.cu
cpp/src/pdlp/termination_strategy/termination_strategy.hpp
cpp/src/pdlp/utilities/mgpu_trace.cuh
cpp/src/pdlp/utilities/ping_pong_graph.cuh
cpp/tests/linear_programming/pdlp_test.cu

Bubullzz · 2026-06-11T16:03:04Z

/ok to test 818ffcd

Bubullzz · 2026-06-12T13:48:03Z

@coderabbitai review

coderabbitai · 2026-06-12T13:48:12Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Bubullzz · 2026-06-12T14:02:42Z

/ok to test f3b6343

Bubullzz · 2026-06-12T15:00:45Z

/ok to test f3b6343

copy-pr-bot · 2026-06-12T15:00:48Z

/ok to test f3b6343

@Bubullzz, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

Bubullzz added 30 commits May 7, 2026 15:07

first commit !! added multi_gpu_partition file to solver settings

1e0bd53

slowly skeletonning

978d17b

better shard.cuh

dd0c0ef

wip

2037eca

added a bit of skeleton. Forward declared pdlp_solver in shard.hpp, the cycle seems to be fixed, cuopt compiles

0f62eff

still wip but going well

d89c85a

cursor broke everything grrr

5534ff0

partition loader now partition loads

dd935c5

big advancements ayo ! We can soon start working on imlementing the solver !!!

09eb20b

added pre loop setup need to manage boxing

b5ebfd2

+ style too

added distributed transform

0965a60

added semicolon and existing runtime error enum

d4d1cab

added } and fixed cuot_expects in partition loader

6659dd9

small bug fixes

b2ed271

a version that compiles #heheha 😎😎😎😎

50d16ce

removed use of engine:transaform

359d9f4

added multi-gpu SpMV #heheha

910a49a

transformed a transform. it compiles hehe

76c0b3f

updated take step for distributed. compiles but doesnt run. will check on main

5ec7138

Merge branch 'main' into cuD-PDLP

1f02afd

support spmvop on multi-gpu

de19f38

compile ready

0030a6c

can run now

172ebc2

passing all tests, good merge

23d0798

fixed the errors hihi, finished distributed part for compte_fixed_error

30881ce

style

c33faf2

now manage halpern update in multi-gpu pdlp

98e0ce6

small fix to calls of multi_gpu_engine_ and scale/unscale solutions.

84128bf

compiles and runs

comments

abe4dd2

added is multi gpu to pdhg

5c41497

Bubullzz added the "do not merge" label (Do not merge if this flag is set) Jun 4, 2026

coderabbitai Bot reviewed Jun 4, 2026

rgsl888prabhu marked this pull request as draft June 8, 2026 15:27

Bubullzz added 8 commits June 10, 2026 22:52

Merge branch 'main' into cuD-PDLP

caea509

moved an expect for edge case from code rabbit

c6c5940

updated test

3488874

update comment for code rabbit

bc1f87e

added check to ensure distributed pdlp is only activated with method_t::PDLP

b28f07d

reverted back solve_lp from mps for better handling

9ae23a0

small clearer comment for pdlp_disable_graph

818ffcd

updated gather of final solution positionning in the code

d3dad66

Bubullzz added 5 commits June 12, 2026 12:48

added include for compile

74c2d8f

kaminpar compile

cfdacd4

expect no initial or warm start

0de9609

clean error if handle is null

1d43e8a

better exept for never_restart_to_average

f3b6343

Bubullzz added 6 commits June 12, 2026 16:03

style

c658769

removed scaled problem before shard building, and optimized code to get a 6x on construction time !! #devtechouquoilateam

38fffaf

added prints to now rank data timings

eb08f11

also print shard time

0816778

kaminpar quiet

9aca029

read mps for the compile !!

873d167
