DecisionRulesExa.jl

GPU-accelerated training of parametric decision rules through multi-stage optimization, using ExaModels.jl and solvers like MadNLP.jl.

This package replicates the core functionality of DecisionRules.jl but replaces JuMP with ExaModels as the optimization backend. JuMP relies on MathOptInterface operations that cannot be differentiated on the GPU and cannot exploit GPU-native sparse linear solvers. ExaModels provides a fully SIMD-compatible algebraic modeling layer whose parameters can be updated at runtime via ExaModels.set_parameter!, which enables efficient repeated solves on the GPU with MadNLPGPU (a CUDSS-backed interior-point solver).

Motivation

In the Two-Stage Deep Decision Rule (TS-DDR) framework, a neural-network policy predicts target state trajectories that are projected onto the feasible set by solving a parametric NLP (the deterministic equivalent). Training uses the envelope theorem: the dual multipliers on the target constraints give the gradient of the optimal objective with respect to the targets, and backpropagating them through the network yields the policy gradient without differentiating through the solver.
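To see why the multipliers are enough, here is a minimal sketch on a one-dimensional toy projection problem that is not part of the package: a quadratic objective with a slack-penalized target constraint x - s = xhat, solved in closed form. It checks that the target-constraint multiplier λ matches a finite-difference derivative of the optimal value, then chains λ through a hypothetical linear "policy" xhat = θ * w. Multiplier sign conventions differ between solvers, so the sign here is a property of this toy Lagrangian, not of MadNLP's output.

```julia
# Toy projection problem (hypothetical, for illustration only):
#   V(xhat) = min_{x, s}  0.5 * (x - r)^2 + 0.5 * rho * s^2   s.t.  x - s = xhat
# Lagrangian: L = 0.5*(x - r)^2 + 0.5*rho*s^2 + λ * (xhat - x + s), so ∂L/∂xhat = λ.
function solve_projection(xhat; r = 1.0, rho = 10.0)
    λ = rho * (xhat - r) / (1 + rho)   # multiplier of the target constraint
    x = r + λ                           # stationarity in x: (x - r) - λ = 0
    s = -λ / rho                        # stationarity in s: rho * s + λ = 0
    V = 0.5 * (x - r)^2 + 0.5 * rho * s^2
    return (V = V, x = x, s = s, λ = λ)
end

# Hypothetical linear "policy" xhat = θ * w standing in for the neural network
θ, w = 0.3, 2.0
sol = solve_projection(θ * w)

# Envelope theorem: dV/dxhat = λ, hence dV/dθ = λ * dxhat/dθ = λ * w
grad_envelope = sol.λ * w

# Central finite difference on the optimal value, for comparison only
h = 1e-6
grad_fd = (solve_projection((θ + h) * w).V - solve_projection((θ - h) * w).V) / (2h)

println("envelope: ", grad_envelope, "   finite difference: ", grad_fd)
```

Because V is quadratic in xhat, the central difference agrees with λ * w up to rounding; at no point was the "solver" itself differentiated.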

The inner NLP solve is the bottleneck. By formulating it in ExaModels and solving with MadNLP + CUDSS on GPU, DecisionRulesExa.jl achieves significant speedups over the CPU JuMP-based workflow in DecisionRules.jl, especially for large-scale problems (AC-OPF, multi-stage hydro scheduling).

Installation

using Pkg
Pkg.add(url="https://github.com/LearningToOptimize/DecisionRulesExa.jl.git")

Quick start

using DecisionRulesExa
using ExaModels, MadNLP, Flux, Random

Random.seed!(1)

T  = 8   # horizon
nx = 1   # state dimension

# Build a simple linear-tracking NLP on CPU
prob = build_linear_tracking_problem(
    horizon       = T,
    nx            = nx,
    backend       = nothing,       # CPU (use CUDABackend() for GPU)
    slack_penalty = 10.0,
    u_bounds      = (-2.0, 2.0),
)

# LSTM policy: input = [w_t ; x_{t-1}], called once per stage
policy = StateConditionedPolicy(nx, nx, nx, [64, 64])

# Uncertainty sampler: returns a flat vector of length T*nw (here nw = nx)
sampler() = Float32.(0.1 .* randn(T * nx))

# Train with TS-DDR policy gradient
train_tsddr(
    policy,
    Float32[1.0],           # initial state
    prob,
    prob.p_x0,
    prob.p_target,
    prob.p_w,
    sampler;
    num_batches         = 100,
    num_train_per_batch = 4,
    optimizer           = Flux.Adam(1f-3),
    madnlp_kwargs       = (print_level = MadNLP.ERROR, tol = 1e-6),
)

For GPU, load CUDA and MadNLPGPU (using CUDA, MadNLPGPU), replace backend = nothing with backend = CUDABackend(), and add linear_solver = CUDSSSolver to madnlp_kwargs.

What you need to provide

DecisionRulesExa.jl is model-first: you describe your multi-stage NLP in ExaModels, then the package handles simulation and training.

For a custom problem you need:

An ExaModels deterministic-equivalent NLP with parametric initial state, uncertainty trajectory, and target trajectory. Target constraints must be added last so their multipliers form a contiguous slice of result.multipliers.
An uncertainty sampler () -> w_flat returning a flat Float32/Float64 vector of length T * nw.
A Flux policy (LSTM or MLP) mapping (w_t, x_{t-1}) to target x_t at each stage.
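The sketch below shows, with plain Julia arrays only, the two data conventions listed above. The sizes are arbitrary, the stage-major layout of w_flat (entries (t-1)*nw+1 through t*nw belong to stage t) is an assumption you should match to your own model, and the multiplier vector is a random stand-in for result.multipliers.

```julia
using Random

# Hypothetical sizes for illustration
T, nw, nx = 8, 2, 1
n_other_cons = 5   # constraints added before the target constraints (dynamics, bounds, ...)

# Sampler in the required shape: () -> flat vector of length T * nw.
# The noise model (i.i.d. Gaussian, scale 0.1) is only an example.
sampler() = Float32.(0.1 .* randn(T * nw))
w_flat = sampler()

# Assumed stage-major layout; reshape is column-major, so column t is w_t.
W = reshape(w_flat, nw, T)

# Stand-in for result.multipliers. Because the T * nx target constraints were
# added last, their multipliers form the trailing block of the vector.
multipliers = randn(n_other_cons + T * nx)
λ_target = @view multipliers[end - T * nx + 1:end]
Λ = reshape(λ_target, nx, T)   # column t: multipliers of the stage-t targets (assumed order)
```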

The package provides build_deterministic_equivalent for generic problems and build_linear_tracking_problem as a ready-made demo. For domain-specific models (power systems, robotics), build the ExaModels NLP directly — see examples/HydroPowerModels/ for a complete AC-OPF example.

Parallel GPU solves

When training samples are independent, multiple NLP instances can be solved concurrently on the same GPU. Pass a problem_pool of independent ExaModels problem copies to train_tsddr:

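# One (problem, p_x0, p_target, p_w) tuple per worker; every entry must be an independent copy
pool = [(prob, prob.p_x0, prob.p_target, prob.p_w)]
for _ in 2:num_workers
    p = build_my_problem(...)   # your own constructor, called once per extra worker
    push!(pool, (p, p.p_x0, p.p_target, p.p_w))
end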

train_tsddr(policy, x0, prob, ..., sampler;
    problem_pool = pool,
    num_train_per_batch = num_workers,
)

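Each pool entry gets its own MadNLP solver on a dedicated thread, with its CUDA handles bound to that thread. Start Julia with multiple threads (for example, julia --threads=4) so that the solves can actually run concurrently.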
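A minimal sketch of the concurrency pattern, with a toy struct standing in for an ExaModels problem and its MadNLP solver (ToyProblem and toy_solve! are hypothetical names): each task touches only its own pool entry, so no locking is needed. It needs no GPU, and the tasks simply run one after another if Julia was started with a single thread.

```julia
# Toy stand-in for one pool entry: an independent problem with mutable state
# (in the package, an ExaModels problem with its own MadNLP solver).
mutable struct ToyProblem
    target::Float64
end

# Hypothetical "solve": set the target parameter, then return (objective, multiplier)
# for V(xhat) = 0.5 * (xhat - 1)^2, whose derivative is xhat - 1.
function toy_solve!(p::ToyProblem, xhat)
    p.target = xhat          # mutates only this entry, never a shared object
    return (0.5 * (p.target - 1)^2, p.target - 1)
end

pool    = [ToyProblem(0.0) for _ in 1:4]
targets = [0.5, 1.0, 1.5, 2.0]           # one training sample per pool entry

# One task per pool entry; tasks may land on different threads without locks.
tasks   = [Threads.@spawn(toy_solve!(pool[i], targets[i])) for i in eachindex(pool)]
results = fetch.(tasks)                  # results come back in pool order
```

The design choice mirrors the pool above: concurrency comes from giving each worker its own copy of the problem rather than from sharing one model across threads.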

Optional Critic Control Variates

train_tsddr can optionally train a scalar critic C(w, xhat). The critic does not replace the deterministic-equivalent solve in the default :control_variate mode: solved target-constraint multipliers remain the actor's primary local sensitivity signal. Instead, the critic supplies a learned rollout-value guide and optional control variate.

For critic fitting, the preferred target is the stage-wise rollout objective via RolloutCriticTarget, with policy_state = :target by default. This matches the differentiable target recurrence used by the actor while evaluating the true stage-by-stage objective. Set policy_state = :realized to train the critic on closed-loop realized-state rollout labels. For ablations, use DeterministicEquivalentCriticTarget() or critic_training_target = :deterministic_equivalent to fit the deterministic-equivalent objective instead.

The default control_variate = NoCriticControlVariate() recovers the original dual-only behavior. A scalar critic can be attached with:

input_dim = length(x0) + 2 * T * nx
critic = Chain(
    Dense(input_dim => 128, tanh),
    Dense(128 => 128, tanh),
    Dense(128 => 1),
)

cv = ScalarCriticControlVariate(
    critic;
    featurizer = default_critic_featurizer,
    value_loss_weight = 1.0,
    gradient_loss_weight = 0.0,
)

critic_target = RolloutCriticTarget(
    stage_problem;
    horizon = T,
    n_uncertainty = nx,
    set_stage_parameters! = set_stage_parameters!,
    realized_state = realized_state,
    objective_no_target_penalty = objective_no_target_penalty,
    policy_state = :target,
    objective_value = :objective_no_target_penalty,
)

train_tsddr(
    policy, x0, prob, prob.p_x0, prob.p_target, prob.p_w, sampler;
    control_variate = cv,
    critic_training_target = critic_target,
    critic_rollout_samples_per_batch = 1,
    actor_gradient_mode = :control_variate,
    critic_cv_weight = 1.0,
    num_cheap_critic_samples_per_batch = 32,
    critic_updates_per_batch = 1,
    critic_optimizer = Flux.Adam(1f-3),
)
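One reading of the input_dim formula above is that the critic presumably sees the initial state, the T * nw uncertainties and the T * nx targets concatenated into one vector, which gives length(x0) + 2 * T * nx when nw == nx as in the quick start. The function below is a hypothetical custom featurizer in that spirit; its (x0, w, xhat) argument order and the scale constants are assumptions, not the package's default_critic_featurizer. Scaling the inputs to similar magnitudes is the kind of normalization the Hydro example applies.

```julia
# Hypothetical custom featurizer: scaled concatenation of initial state,
# uncertainty trajectory and target trajectory into one critic input vector.
# The argument order and default scales are assumptions for illustration.
function scaled_features(x0, w, xhat; x_scale = 1.0, w_scale = 1.0)
    return vcat(x0 ./ x_scale, w ./ w_scale, xhat ./ x_scale)
end

T, nx, nw = 8, 1, 1
x0   = [1.0]
w    = 0.1 .* randn(T * nw)
xhat = randn(T * nx)

features = scaled_features(x0, w, xhat; x_scale = 2.0, w_scale = 0.1)
length(features) == length(x0) + 2 * T * nx   # → true (because nw == nx)
```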

The critic loss combines value matching against the selected target objective and optional gradient matching against target multipliers:

value_loss_weight * mse(C(w, xhat), objective) +
gradient_loss_weight * mse(gradient(xhat -> C(w, xhat), xhat), lambda)

Set either weight to zero for objective-only or gradient-only critic training. For rollout targets, objective-only critic training is usually the safer default, because deterministic-equivalent (DE) target multipliers are not exact gradients of the realized rollout objective. If objectives and multipliers have very different magnitudes, prefer a custom featurizer and tuned loss weights; the Hydro example normalizes volumes and inflows before critic evaluation.
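To make the combined loss concrete, here is a dependency-free sketch with a hypothetical quadratic critic whose xhat-gradient is known in closed form, so no automatic differentiation is needed. It evaluates the two weighted terms of the formula above on a one-sample batch; in practice C would be the Flux network and its gradient would come from automatic differentiation.

```julia
using Statistics

# Hypothetical critic with a closed-form xhat-gradient:
#   C(a; w, xhat) = 0.5 * a * sum(xhat .^ 2) + sum(w),  ∇_xhat C = a .* xhat
critic(a, w, xhat)      = 0.5 * a * sum(abs2, xhat) + sum(w)
critic_grad(a, w, xhat) = a .* xhat

mse(ŷ, y) = mean(abs2, ŷ .- y)

# value_loss_weight * mse(C(w, xhat), objective)
#   + gradient_loss_weight * mse(∇_xhat C(w, xhat), λ)
function critic_loss(a, batch; value_loss_weight = 1.0, gradient_loss_weight = 0.0)
    values     = [critic(a, b.w, b.xhat) for b in batch]
    objectives = [b.objective for b in batch]
    value_term = mse(values, objectives)
    # with equal-length samples, averaging per-sample MSEs equals one MSE over all entries
    grad_term  = mean(mse(critic_grad(a, b.w, b.xhat), b.λ) for b in batch)
    return value_loss_weight * value_term + gradient_loss_weight * grad_term
end

# One labelled sample, generated by the "true" critic parameter a = 2
batch = [(w = [0.0], xhat = [1.0, -1.0], objective = 2.0, λ = [2.0, -2.0])]

critic_loss(2.0, batch; gradient_loss_weight = 1.0)   # → 0.0
critic_loss(1.0, batch)                               # → 1.0 (objective-only)
critic_loss(1.0, batch; gradient_loss_weight = 1.0)   # → 2.0
```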

Actor modes:

:control_variate: subtracts critic_cv_weight * gradient_xhat(C) from the solved dual signal and adds the critic actor gradient back on solved or cheap rollout samples. This is the recommended mode when dual multipliers are reliable. critic_cv_weight = 0.0 recovers dual-only updates.
:surrogate: uses a practical hybrid of solved dual gradients and critic actor gradients, controlled by dual_actor_weight and critic_actor_weight. This is useful when raw dual/subgradient signals are empirically noisy or unstable, but it is no longer a pure unbiased control-variate estimator.
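The :control_variate idea can be illustrated with a generic scalar control-variate estimator. The sketch below uses hypothetical functions: dual(w) stands for the expensive solved dual signal and critic_grad(w) for the cheap critic gradient. On each solved sample the critic term is subtracted, and its expectation is added back using independent cheap samples (the role that num_cheap_critic_samples_per_batch plays). β plays the role of critic_cv_weight, and β = 0 recovers the plain estimator. This is a textbook illustration, not the package's gradient code.

```julia
using Random, Statistics

# Hypothetical toy signals, with uncertainty w ~ N(0, 1):
#   dual(w)        stands in for the expensive solved dual signal, E[dual(w)] = 1
#   critic_grad(w) stands in for the cheap critic gradient,        E[critic_grad(w)] = 0
dual(w)        = 2w + w^2
critic_grad(w) = 2w

# Returns per-sample plain estimates and control-variate estimates.
function cv_estimates(rng, n_solved, n_cheap; β = 1.0)
    plain = Float64[]
    cv    = Float64[]
    for _ in 1:n_solved
        w = randn(rng)                                           # one "solved" sample
        cheap = mean(critic_grad(randn(rng)) for _ in 1:n_cheap) # critic only, no solve
        push!(plain, dual(w))
        # subtract the critic on the solved sample, add back its cheap-sample average
        push!(cv, dual(w) - β * critic_grad(w) + β * cheap)
    end
    return plain, cv
end

plain, cv = cv_estimates(MersenneTwister(0), 10_000, 32)
println("mean: ", mean(plain), " vs ", mean(cv))
println("var:  ", var(plain), " vs ", var(cv))
```

For this toy both estimators have mean 1, while the variance drops from Var(2w + w^2) = 6 to 2 + 4/32 ≈ 2.1, because the critic captures the 2w part of the dual signal.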

num_cheap_critic_samples_per_batch draws additional uncertainty samples, rolls out the current policy, and evaluates the critic actor term without any extra MadNLP or ExaModels solve.

Rollout evaluation

RolloutEvaluation evaluates the policy under deployment semantics (stage-by-stage sequential solves) on held-out scenarios:

evaluator = RolloutEvaluation(
    stage_problem, x0, eval_scenarios;
    horizon              = T,
    n_uncertainty        = nw,
    set_stage_parameters! = my_stage_setter!,
    realized_state       = my_realized_state,
    stride               = 25,
    policy_state         = :realized,
)

RolloutEvaluation also supports parallel evaluation across a pool of stage problems via stage_problem_pool, and a dynamic number of scenarios via active_scenarios.
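The difference between policy_state = :realized and :target can be seen on a toy rollout with hypothetical scalar dynamics x_t = x_{t-1} + u_t + w_t and a bounded control, where each stage "solve" is a closed-form clamp toward the policy's target. With :realized the policy is fed the state that the stage solve actually reached; with :target it is fed its own previous target, which can drift away from what the bounds allow. This sketches the semantics only, not RolloutEvaluation itself.

```julia
# Toy rollout with hypothetical scalar dynamics x_t = x_{t-1} + u_t + w_t, |u_t| <= umax.
# Each stage "solve" is closed form: the control moves the state as close as
# possible to the policy's target, and the squared miss is the stage cost.
function toy_rollout(policy, x0, ws; umax = 2.0, policy_state = :realized)
    x_real = x0          # state actually reached by the stage solves
    x_prev_target = x0   # previous target, used when policy_state == :target
    cost = 0.0
    for w in ws
        x_in = policy_state == :realized ? x_real : x_prev_target
        target = policy(w, x_in)
        u = clamp(target - x_real - w, -umax, umax)   # best control within the bounds
        x_real = x_real + u + w
        cost += (x_real - target)^2
        x_prev_target = target
    end
    return cost
end

# Hypothetical policy that always aims three units above its input state
toy_policy(w, x) = x + 3.0

toy_rollout(toy_policy, 0.0, zeros(2))                          # → 2.0
toy_rollout(toy_policy, 0.0, zeros(2); policy_state = :target)  # → 5.0
```

The control bound forces x_1 = 2 in both cases; the :realized rollout then re-plans from 2, while the :target rollout plans from its earlier target 3 and misses by more.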

Relationship to DecisionRules.jl

DecisionRules.jl implements the same TS-DDR algorithm using JuMP + DiffOpt for CPU-based training. It supports stage-wise decomposition, multiple shooting, and integer strategies that are not yet ported here.

DecisionRulesExa.jl focuses on the deterministic-equivalent training mode with GPU acceleration. Choose this package when:

Your NLP is large enough that GPU acceleration matters (e.g., AC-OPF with hundreds of buses)
You need to run many training samples per gradient step
You want to leverage CUDA-native sparse solvers (CUDSS)

Choose DecisionRules.jl when:

You need stage-wise or multiple-shooting decomposition
Your problem is naturally expressed in JuMP
You need DiffOpt-based sensitivity computation
You want CPU-only deployment

Examples

examples/end_to_end_cpu.jl — minimal CPU demo with a linear tracking problem
examples/end_to_end_gpu.jl — same demo on GPU with CUDSS
examples/HydroPowerModels/ — full multi-stage hydrothermal scheduling with DC and AC OPF

Citation

If you use this package in academic work, please cite:

@article{rosemberg2024efficiently,
  title={Efficiently Training Deep-Learning Parametric Policies using Lagrangian Duality},
  author={Rosemberg, Andrew and Street, Alexandre and Vallad{\~a}o, Davi M and Van Hentenryck, Pascal},
  journal={arXiv preprint arXiv:2405.14973},
  year={2024}
}

License

MIT. See LICENSE.
