feat(pomdps): POMDPs.jl integration via package extension + pendulum demo #59

Open
jamgochiana wants to merge 5 commits into arec/parametric-types-and-staticarrays from arec/pomdps-interface

@jamgochiana jamgochiana commented May 26, 2026

Collaborator

Closes #38

Adds a POMDPs.jl package extension (weak dependency, requires Julia 1.9+) that wires GaussianFilters' AbstractFilter into the POMDPs belief-updater interface.

Extension surface (ext/GaussianFiltersPOMDPsExt.jl):

  • POMDPs.update(filter, b, a, o) dispatches directly on AbstractFilter — convenient for simple integrations.
  • POMDPs.initialize_belief accepts either a GaussianBelief (identity) or any AbstractMvNormal (extracts mean and cov).
  • GaussianFilterUpdater <: POMDPs.Updater wrapper for simulators that dispatch on the abstract type (HistoryRecorder, RolloutSimulator, etc.).
  • pomdps_updater(filter) builder, exported as a stub from the main package and given a real implementation in the extension.

Users who do not have POMDPs.jl loaded pay zero overhead — the extension only activates when both packages are present.
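
The stub-plus-extension pattern described above can be sketched without any packages by letting two plain modules stand in for the main package and its extension. The module names (FiltersSketch, FiltersSketchExt), the placeholder Updater type, and DummyFilter are all hypothetical; in the real layout the second module would live under ext/ and be loaded automatically only once POMDPs.jl is present.

```julia
# Stand-in for the main package: it owns the function name but gives it no methods.
module FiltersSketch
export AbstractFilter, pomdps_updater
abstract type AbstractFilter end
function pomdps_updater end  # zero-method stub, safe to export unconditionally
end

# Stand-in for the extension module (hypothetical; a real one is loaded by Julia
# only when the weak dependency is also loaded).
module FiltersSketchExt
using ..FiltersSketch
abstract type Updater end  # placeholder for POMDPs.Updater
struct FilterUpdater{F<:AbstractFilter} <: Updater
    filter::F
end
# Attach the real method to the stub owned by the main module.
FiltersSketch.pomdps_updater(f::AbstractFilter) = FilterUpdater(f)
end

using .FiltersSketch
struct DummyFilter <: AbstractFilter end
updater = pomdps_updater(DummyFilter())
println(typeof(updater))
```

Because the stub has no methods until the second module is evaluated, exporting it from the main package costs nothing for users who never load the weak dependency.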

Two example scripts:

  • examples/pomdps_integration.jl: minimal demonstration of using a KalmanFilter through POMDPs.update.
  • examples/pendulum_ekf_ilqr.jl: closed-loop stabilization of a noisy inverted pendulum observed only through its angular velocity (gyroscope-style). It defines a PendulumPOMDP with SVector state/action/observation, exercising the StaticArrays support from #58 ("refactor: parametric filter and model types + StaticArrays test suite"), and an ILQRPolicy <: POMDPs.Policy that runs iterative LQR on the belief mean (certainty-equivalent control). It then wraps the EKF with pomdps_updater and drives the whole closed loop through POMDPs.simulate via HistoryRecorder.

The iLQR implementation (about 80 lines) includes a backward pass with Levenberg-Marquardt regularization, a backtracking line search, and actuator saturation. The pendulum parameters are taken from POMDPModels.InvertedPendulum, but the dynamics are re-expressed generically so that ForwardDiff dual numbers can pass through them (the original euler signature pins the state to Tuple{Float64,Float64}).
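
To illustrate why a signature pinned to Tuple{Float64,Float64} blocks automatic differentiation, here is a minimal sketch with a hand-rolled dual number. The Dual type is a hypothetical stand-in for ForwardDiff's dual numbers, and the toy dynamics, DT, and G_OVER_L are assumptions, not the dynamics or parameters used in the example script.

```julia
# Minimal forward-mode dual number: a hypothetical stand-in for ForwardDiff.Dual.
struct Dual
    val::Float64  # primal value
    der::Float64  # derivative with respect to the seeded input
end
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:+(a::Dual, c::Float64) = Dual(a.val + c, a.der)
Base.:*(c::Float64, a::Dual) = Dual(c * a.val, c * a.der)
Base.sin(a::Dual) = Dual(sin(a.val), cos(a.val) * a.der)

const DT = 0.05        # assumed time step
const G_OVER_L = 9.81  # assumed g / l

# Pinned signature: only accepts Float64 states, so dual numbers are rejected.
step_pinned(s::Tuple{Float64,Float64}, u::Float64) =
    (s[1] + DT * s[2], s[2] + DT * (G_OVER_L * sin(s[1]) + u))

# Generic signature: any element type supporting +, * and sin passes through.
step_generic(s, u) =
    (s[1] + DT * s[2], s[2] + DT * (G_OVER_L * sin(s[1]) + u))

s = (Dual(0.3, 1.0), Dual(0.0, 0.0))  # seed the derivative with respect to θ
next = step_generic(s, 0.0)
dω_dθ = next[2].der                   # equals DT * G_OVER_L * cos(0.3)

# The pinned method has no matching signature for dual-valued states.
pinned_fails = try
    step_pinned(s, 0.0); false
catch err
    err isa MethodError
end
```

The two methods have identical bodies; only the argument annotations decide whether a derivative-carrying number type can flow through.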

The example produces an animation of the closed-loop trajectory with belief mean ± 2σ ribbons on both θ and ω.
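
As a small sketch of how the ± 2σ ribbon bounds follow from a Gaussian belief, the marginal standard deviations are the square roots of the covariance diagonal. The mean and covariance values below are made up for illustration.

```julia
using LinearAlgebra

μ = [0.1, -0.2]             # hypothetical belief mean [θ, ω]
Σ = [0.25 0.01; 0.01 0.04]  # hypothetical belief covariance
σ = sqrt.(diag(Σ))          # marginal standard deviations of θ and ω
lower = μ .- 2 .* σ         # lower edge of the ribbon
upper = μ .+ 2 .* σ         # upper edge of the ribbon
```

The off-diagonal terms do not enter the ribbon, since each panel shows a single marginal.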

Bumps the julia compat entry to 1.9 to enable [weakdeps] and [extensions]. Adds POMDPs to the test dependencies and adds test/test_pomdps.jl, a regression test covering both the direct-dispatch and GaussianFilterUpdater-wrapper paths.

@jamgochiana

Collaborator Author

GIF generated by running examples/pendulum_ekf_ilqr.jl, which combines an iLQR policy with an EKF wrapped as a GaussianFilterUpdater.

[Animation: pendulum_ekf_ilqr]

Comment on lines +245 to +258
dmodel = NonlinearDynamicsModel(pendulum_step, W)
omodel = NonlinearObservationModel(observe_omega, V)
ekf = ExtendedKalmanFilter(dmodel, omodel)
updater = pomdps_updater(ekf) # wraps the EKF as a POMDPs.Updater
pomdp = PendulumPOMDP()
policy = ILQRPolicy()

# ---------------------------------------------------------------------
# Run it via POMDPs.simulate
# ---------------------------------------------------------------------

rng = MersenneTwister(0)
hr = HistoryRecorder(rng = rng, max_steps = 60)
hist = simulate(hr, pomdp, policy, updater)

Collaborator Author

Example usage with POMDPs.simulate

@zsunberg

Member

Awesome 😎

Add a weak dependency on POMDPs.jl and a package extension
(GaussianFiltersPOMDPsExt) that wires our AbstractFilter into the
POMDPs.jl belief-updater interface:

- POMDPs.update(filter, b, a, o) dispatches to GaussianFilters.update
- POMDPs.initialize_belief(filter, b::GaussianBelief) is identity

Users who do not have POMDPs.jl loaded pay zero overhead — the
extension only activates when both packages are present (requires
Julia 1.9+).

Bumps julia compat to 1.9 to enable [weakdeps] and [extensions].

Includes:
- examples/pomdps_integration.jl demonstrating the extension end-to-end
- test/test_pomdps.jl regression test that POMDPs.update produces the
  same belief as GaussianFilters.update

Closes #38

… demo

Extension additions:
- initialize_belief(::AbstractFilter, ::AbstractMvNormal) extracts mean
  and cov and returns a GaussianBelief. Lets callers initialize from any
  multivariate normal distribution (POMDPs problems often expose initial
  state as a Distribution).

End-to-end example (examples/pendulum_ekf_ilqr.jl):
- Closed-loop stabilization of POMDPModels.InvertedPendulum from a small
  perturbation (theta0 = 0.3 rad).
- Partial observation: angle only; angular velocity must be inferred.
- An ExtendedKalmanFilter doubles as the POMDPs.Updater via the
  extension installed in d03083c.
- A minimal certainty-equivalent iLQR (~80 LoC, LQR backward pass over
  the dynamics linearized at the nominal trajectory) plans torques from
  belief.μ. iLQR ignores belief.Σ per the standard certainty-
  equivalent control split.
- Empirically reaches theta_f ~ 0 from theta0 = 0.3 within 6 seconds.
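
The certainty-equivalent LQR backward pass mentioned in the list above can be sketched as a finite-horizon Riccati recursion. This is a simplified illustration, not the code in the example script: it linearizes a toy pendulum about upright rather than along a nominal trajectory, and the time step, g / l, and cost weights are assumptions.

```julia
using LinearAlgebra

dt, g_over_l = 0.05, 9.81      # assumed time step and g / l
A = [1.0 dt; dt*g_over_l 1.0]  # toy dynamics linearized about upright
B = reshape([0.0, dt], 2, 1)   # torque enters the ω equation
Q = Diagonal([10.0, 1.0])      # assumed state cost
R = fill(0.1, 1, 1)            # assumed control cost

# Finite-horizon Riccati recursion, run backward from the terminal cost Q.
function lqr_backward(A, B, Q, R, N)
    P = Matrix(Q)
    K = zeros(size(B, 2), size(A, 1))
    for _ in 1:N
        K = (R + B' * P * B) \ (B' * P * A)  # feedback gain at this stage
        P = Q + A' * P * (A - B * K)         # cost-to-go one stage earlier
    end
    return K
end

K = lqr_backward(A, B, Q, R, 200)

# Certainty-equivalent control: act on the belief mean, ignore the covariance.
μ = [0.3, 0.0]
u = -(K * μ)
```

The covariance never appears in the control law, which is the control split the commit message refers to.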

The pendulum dynamics are re-expressed generically rather than calling
POMDPModels.euler directly, because the latter pins state to
Tuple{Float64,Float64} which blocks ForwardDiff dual numbers used both
inside the EKF and inside the iLQR linearization.

Extension surface (ext/GaussianFiltersPOMDPsExt.jl):
- New GaussianFilterUpdater <: POMDPs.Updater wrapping any AbstractFilter
  so it can be passed to simulators that dispatch on the Updater type
  (HistoryRecorder, RolloutSimulator, etc.).
- pomdps_updater(filter) construction helper, exported as a stub from
  the main package and given a real implementation in the extension.
- Both the wrapper and the direct AbstractFilter dispatch are kept; the
  direct form is convenient for simple calls, the wrapper is required
  for POMDPs.simulate machinery.

Pendulum example refactored to a proper POMDPs.jl program:
- New PendulumPOMDP <: POMDP{SVector{2,Float64}, SVector{1,Float64},
  SVector{1,Float64}} — continuous-state partial-observation POMDP that
  also exercises the StaticArrays support added in the previous layer.
- Custom ILQRPolicy <: POMDPs.Policy holding only iLQR hyperparameters
  and warm-start state (no belief, no EKF — those live elsewhere).
- POMDPs.action(p::ILQRPolicy, b::GaussianBelief) is a one-liner that
  runs ilqr on belief.μ and shifts the warm start.
- iLQR upgraded from a single backward pass to multiple iterations with
  backtracking line search and Levenberg-Marquardt regularization;
  actuator saturation included. Empirically holds the pendulum within
  ~0.07 rad of upright at steady state from a 0.6 rad initial tilt.
- POMDPs.simulate (via HistoryRecorder) drives the closed loop instead
  of a hand-rolled control loop.

Animation:
- Writes examples/outputs/pendulum_ekf_ilqr.gif (gitignored).
- Disable with GENERATE_GIF=false in the environment.

Tests: add pomdps_updater wrapper coverage to test_pomdps.jl (238/238).

Change the pendulum observation model from angle (θ) to angular
velocity (ω) — gyroscope-style. This makes the demo a genuine hidden-
state inference problem: the angle is never directly observed, the
controller plans on belief.μ[1] which is reconstructed by the EKF
from successive ω measurements.

Empirically σ_θ shrinks from ~0.22 to ~0.05 over the first ~20 steps
while σ_ω stays small throughout (it's the observed dimension). The
new 3-panel animation makes this visible:
- left: pendulum rod (true vs belief μ)
- top right: θ trajectory with belief μ ± 2σ ribbon
- bottom right: ω trajectory with belief μ ± 2σ ribbon
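
A rough way to see why σ_θ can shrink when only ω is measured is to run the Kalman covariance recursion on a linearized model. The sketch below is an illustration under stated assumptions, not the EKF from the example: it uses a toy pendulum linearized about upright, and the time step, g / l, and noise covariances are assumed values.

```julia
using LinearAlgebra

dt, g_over_l = 0.05, 9.81      # assumed time step and g / l
A = [1.0 dt; dt*g_over_l 1.0]  # toy dynamics linearized about upright
H = [0.0 1.0]                  # observe ω only
W = Diagonal([1e-4, 1e-4])     # assumed process noise covariance
V = fill(0.05^2, 1, 1)         # assumed measurement noise covariance

# Track the standard deviation of θ through repeated predict/update steps.
function sigma_theta_history(A, H, W, V, P0, steps)
    P = Matrix(P0)
    σθ = [sqrt(P[1, 1])]
    for _ in 1:steps
        P = A * P * A' + W               # predict
        K = (P * H') / (H * P * H' + V)  # Kalman gain
        P = (I - K * H) * P              # measurement update
        push!(σθ, sqrt(P[1, 1]))
    end
    return σθ
end

σθ = sigma_theta_history(A, H, W, V, Diagonal([0.5^2, 0.1^2]), 20)
```

The angle becomes observable because θ drives the change in ω from one step to the next, so successive ω measurements carry information about θ through the off-diagonal covariance.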

Layout tweaks: right-aligned ylabels and explicit bottom margin so
the x-axis label is not clipped.

…certain)

Change the prior from N([0.6, 0.0], diag(0.05, 0.1)) to
N([0.0, 0.0], diag(0.5², 0.1²)). The new prior is mean-upright with
σ_θ = 0.5 rad. Effects:

- The true initial state is drawn from the prior, so the angle varies
  meaningfully across seeds.
- The initial belief mean is 0 (upright), but the truth typically is
  not — the belief is wrong by ~0.3–0.7 rad initially. The first few
  control actions are based on an incorrect angle estimate; the
  filter then converges on the true angle via ω observations and the
  controller catches up.
- Empirically σ_θ drops 10× over the run (0.50 → 0.05). The animation
  ribbon visibly narrows over time, making the filter convergence the
  central visual story.
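
Drawing the true initial state from the diagonal prior described above can be sketched as follows. The seed and variable names are arbitrary choices for illustration; the prior mean and standard deviations are the ones stated in the commit message.

```julia
using Random

rng = MersenneTwister(0)            # arbitrary seed
μ0 = [0.0, 0.0]                     # prior mean: upright, at rest
σ0 = [0.5, 0.1]                     # prior standard deviations of θ and ω
x0 = μ0 .+ σ0 .* randn(rng, 2)      # true initial state drawn from the prior
initial_error = abs(x0[1] - μ0[1])  # how far off the belief mean is in θ
```

Because the prior covariance is diagonal, each component can be sampled independently by scaling a standard normal draw.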
@jamgochiana jamgochiana force-pushed the arec/parametric-types-and-staticarrays branch from b3bf81b to fbac0ec Compare May 27, 2026 20:24
@jamgochiana jamgochiana force-pushed the arec/pomdps-interface branch from 3d7473a to d42dff7 Compare May 27, 2026 20:24