Merged
5 changes: 4 additions & 1 deletion .gitignore
@@ -51,4 +51,7 @@ examples/**/.CondaPkg/*
*.err
*.tsv
*.pdf
plan.md
plan/
*_cuts.json
settings.json
*.sh
2 changes: 2 additions & 0 deletions Project.toml
@@ -16,6 +16,7 @@ Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
MathOptInterface = "b8f27783-ece8-5eb3-8dc8-9495eed66fee"
ParametricOptInterface = "0ce4ce61-57bf-432b-a095-efac525d185e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"

[compat]
@@ -35,6 +36,7 @@ MadNLP = "0.8, 0.9, 0.10"
MadNLPGPU = "0.7, 0.8, 0.9, 0.10"
MathOptInterface = "1.48.0"
ParametricOptInterface = "0.14.1, 0.15, 0.16"
Statistics = "1.10, 1.11"
Zygote = "0.6.77, 0.7"
julia = "1.10, 1.11, 1.12"

14 changes: 13 additions & 1 deletion README.md
@@ -27,7 +27,7 @@ DecisionRules.jl implements this workflow in three flavors:

```julia
using Pkg
Pkg.add(url="https://github.com/LearningToOptimize/DecisionRules.jl.git")
Pkg.add("DecisionRules")
```

## What you need to provide
@@ -202,6 +202,18 @@ Each evaluation reports (a) the rollout objective **excluding** the target-slack

Per-sample debugging hooks can be attached with `SampleLog(on_sample=(s, models, log) -> ...)`; the training loop calls the hook after each sample's solve with the live JuMP model(s). The previous `record_loss=(iter, model, loss, tag) -> ...` keyword keeps working as a deprecated adapter.

## GPU acceleration with DecisionRulesExa.jl

For large-scale problems where the inner NLP solve is the bottleneck (e.g., AC-OPF with hundreds of buses), [DecisionRulesExa.jl](https://github.com/LearningToOptimize/DecisionRulesExa.jl) provides a GPU-accelerated backend that replaces JuMP with [ExaModels.jl](https://github.com/exanauts/ExaModels.jl) and solves with [MadNLP.jl](https://github.com/MadNLP/MadNLP.jl) + CUDSS on GPU.

DecisionRulesExa.jl implements the same TS-DDR algorithm (deterministic-equivalent mode) with the same envelope-theorem gradient computation but formulates the NLP in ExaModels' SIMD-compatible modeling layer. This enables:

- **GPU-native interior-point solves** via MadNLP + CUDSS
- **Parallel GPU solves** for multiple training samples per gradient step
- **Runtime parameter updates** via `ExaModels.set_parameter!` (no model reconstruction)

See the [GPU Acceleration](https://LearningToOptimize.github.io/DecisionRules.jl/dev/gpu_acceleration/) page in the documentation for a tutorial on getting started with DecisionRulesExa.jl.

## Examples and tests

Examples live in `examples/`. Run tests with:
3 changes: 3 additions & 0 deletions docs/Project.toml
@@ -3,9 +3,12 @@ DecisionRules = "47937410-f832-486f-8300-12c95b225dfc"
DiffOpt = "930fe3bc-9c6b-11ea-2d94-6184641e85e7"
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
Flux = "587475ba-b771-5e3f-ad9e-33799f191a9c"
Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
HiGHS = "87dc4568-4c63-4d18-b0c0-bb2238e4078b"
Ipopt = "b6b21f68-93f8-5de0-b562-5493be1d77c9"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
Literate = "98b081ad-f1c9-55d3-8b20-4c87d4299306"
MathOptInterface = "b8f27783-ece8-5eb3-8dc8-9495eed66fee"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"

3 changes: 3 additions & 0 deletions docs/make.jl
@@ -24,6 +24,9 @@ makedocs(;
pages=[
"Home" => "index.md",
"Algorithm" => "algorithm.md",
"Gradient Fallback" => "gradient_fallback.md",
"Uncertainty Sampling" => "sampling.md",
"GPU Acceleration" => "gpu_acceleration.md",
"Examples" => [
"Hydropower Scheduling" => "examples/hydro.md",
"Rocket Control" => "examples/rocket.md",
80 changes: 80 additions & 0 deletions docs/src/algorithm.md
@@ -108,6 +108,86 @@ for k = 1, ..., ⌈T/W⌉:
**Pros**: balances coupling (within windows) with tractability; parallelizable windows.
**Cons**: continuity gaps between windows require penalty tuning.

## Mixed gradient: score-function (REINFORCE) correction

For problems with integer variables or non-smooth subproblems, the dual
gradient can be biased — it is local to a fixed integer assignment and cannot
see the effect of discrete switches (e.g., opening a setup variable).

DecisionRules provides a **score-function (REINFORCE)** correction that mixes
the dual gradient with a model-free policy gradient estimated from stage-wise
rollouts under perturbed targets.

### How the score-function estimator works

1. **Perturb**: add Gaussian noise to the policy targets:
``\tilde{x}_t = \hat{x}_t(\theta) + \delta_t``, where
``\delta_t \sim \mathcal{N}(0, \sigma^2 I)``.

2. **Rollout**: solve the stage-wise subproblems with the perturbed targets to
obtain realized costs ``R_m`` for ``m = 1, \ldots, M`` rollouts. These
rollouts solve the models exactly as built (MIPs stay MIPs), so the costs
reflect true integer-feasible decisions.

3. **Advantage**: center the costs ``A_m = R_m - \bar{R}`` (mean baseline
reduces variance without changing the expected gradient).

4. **Surrogate loss**: the differentiable scalar whose gradient recovers the
REINFORCE estimate:

```math
L_{\text{sf}}(\theta)
\;=\;
\frac{1}{M} \sum_{m=1}^{M}
A_m
\sum_{t=1}^{T}
\left\langle
\frac{\delta_{m,t}}{\sigma^2},\;
\hat{x}_{t}(\theta)
\right\rangle.
```

This is the standard score-function estimator for Gaussian perturbations.
The key identity is
``\nabla_{\hat{x}_t} \log p(\tilde{x}_t \mid \theta) = \delta_t / \sigma^2``
for a Gaussian centered at ``\hat{x}_t(\theta)``; the chain rule through
``\hat{x}_t(\theta)`` then gives the gradient with respect to ``\theta``.
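
As a rough illustration of the four steps above (a toy sketch, not the DecisionRules implementation), the estimator can be written with the Julia standard library alone. The names `rollout_cost`, `policy_targets`, and `score_function_gradient` are hypothetical, the stage index is folded into one stacked target vector, and the policy is assumed linear in ``\theta`` so that the chain rule reduces to a matrix transpose:

```julia
using Random, Statistics

# Hypothetical stand-in for solving the stage-wise subproblems as built:
# a quadratic tracking cost plus a fixed charge per positive (rounded) order.
rollout_cost(x) = sum(abs2, x .- 2.0) + 5.0 * count(>(0), round.(Int, x))

# Hypothetical policy: targets are linear in θ, so the Jacobian dxhat/dθ is B.
policy_targets(B, θ) = B * θ

function score_function_gradient(rng, B, θ; σ=0.5, M=64)
    xhat = policy_targets(B, θ)
    # 1. Perturb: one Gaussian noise vector per rollout.
    δ = [σ .* randn(rng, length(xhat)) for _ in 1:M]
    # 2. Rollout: realized cost under each perturbed target.
    R = [rollout_cost(xhat .+ d) for d in δ]
    # 3. Advantage: subtract the mean baseline.
    A = R .- mean(R)
    # 4. Gradient of the surrogate loss w.r.t. the targets, then chain rule.
    g_x = sum(A[m] .* δ[m] for m in 1:M) ./ (M * σ^2)
    return B' * g_x
end

rng = MersenneTwister(0)
B = [1.0 0.0; 0.5 1.0; 0.0 2.0]   # three targets, two policy parameters
θ = [1.0, 0.5]
g = score_function_gradient(rng, B, θ)
```

In practice the Jacobian product would come from automatic differentiation of the surrogate loss rather than from an explicit matrix; the explicit `B'` here only keeps the sketch dependency-free.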

### Mixed gradient

The final training gradient combines both signals:

```math
\nabla L
\;=\;
\alpha\, \nabla L_{\text{dual}}
+ (1 - \alpha)\, \nabla L_{\text{sf}},
```

where ``\alpha \in [0, 1]`` is the `dual_weight`.
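
The combination itself is a convex blend of two gradient vectors. A minimal sketch, assuming both gradients are already available as plain arrays (`mix_gradients` is a hypothetical helper, not a DecisionRules function):

```julia
# Blend a dual gradient and a score-function gradient with weight α = dual_weight.
function mix_gradients(g_dual, g_sf; dual_weight)
    0 <= dual_weight <= 1 || throw(ArgumentError("dual_weight must lie in [0, 1]"))
    return dual_weight .* g_dual .+ (1 - dual_weight) .* g_sf
end

g_mixed = mix_gradients([1.0, -2.0], [3.0, 2.0]; dual_weight=0.75)
# g_mixed == [1.5, -1.0]
```

Setting `dual_weight=1` recovers the pure dual gradient, and `dual_weight=0` the pure score-function gradient.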

There are two separate solve paths in the mixed-gradient training loop:

- **Dual path**: controlled by `integer_strategy`, which determines how local
dual information is read from the deterministic equivalent
(e.g., [`FixedDiscreteIntegerStrategy`](@ref) solves the MIP, fixes integers,
re-solves the LP, and reads LP duals).
- **Score-function path**: controlled by [`ScoreFunctionConfig`](@ref), which
owns separate rollout subproblems. These are solved exactly as built, and
their realized costs define the Monte Carlo score-function term.

### Scheduled ramp-in

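A [`ScoreFunctionSchedule`](@ref) can ramp ``\alpha`` from 1 (pure dual) to
its final value over a warmup period. Let ``k`` be the current iteration,
``k_0`` the iteration at which the ramp starts, and ``r`` the ramp length in
iterations, and define
``\rho_k = \operatorname{clip}((k - k_0) / r,\, 0,\, 1)``. The effective
score-function weight is ``\rho_k (1 - \alpha)``: it is 0 up to iteration
``k_0`` and reaches ``1 - \alpha`` at iteration ``k_0 + r``.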

This lets the DE dual gradient establish a good initial policy before
introducing the higher-variance REINFORCE signal.
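
The ramp formula can be checked with a few lines of Julia. This is a sketch of the arithmetic only: `ramp` and `sf_weight` are hypothetical names, and the keyword names `k0` and `r` mirror the symbols ``k_0`` and ``r`` rather than any confirmed field of `ScoreFunctionSchedule`:

```julia
# ρ_k = clip((k - k0) / r, 0, 1)
ramp(k; k0, r) = clamp((k - k0) / r, 0.0, 1.0)

# Effective score-function weight ρ_k * (1 - α).
sf_weight(k; α, k0, r) = ramp(k; k0=k0, r=r) * (1 - α)

# With α = 0.5, a ramp starting at iteration 10 and lasting 20 iterations:
weights = [sf_weight(k; α=0.5, k0=10, r=20) for k in (5, 20, 40)]
# weights == [0.0, 0.25, 0.5]
```

Before iteration 10 the update is purely dual; halfway through the ramp the score-function term carries half of its final weight.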

See the [Stochastic Lot-Sizing with Fixed Ordering Costs](@ref) example for a
complete worked example with integer variables and mixed gradients.

## Penalty annealing

The target penalty ``\lambda`` is critical: too small and the optimizer ignores
1 change: 1 addition & 0 deletions docs/src/api.md
@@ -16,4 +16,5 @@ Private = false
```@autodocs
Modules = [DecisionRules]
Public = false
Filter = t -> t != DecisionRules
```
Binary file added docs/src/assets/hydro_generation_comparison.png
Binary file added docs/src/assets/hydro_volume_comparison.png
Binary file modified docs/src/assets/inventory_integer_results.png
Binary file modified docs/src/assets/inventory_relaxed_results.png