Skip to content

feat(ggm): log Z(G) closed-form primitives for hierarchical-spec MH#109

Open
MaartenMarsman wants to merge 15 commits into
mainfrom
feat/log-z-nlo-gamma
Open

feat(ggm): log Z(G) closed-form primitives for hierarchical-spec MH#109
MaartenMarsman wants to merge 15 commits into
mainfrom
feat/log-z-nlo-gamma

Conversation

@MaartenMarsman
Copy link
Copy Markdown
Collaborator

Summary

Phase 1a of the hierarchical-spec GGM work (Stage 3 of the DEGORD + Russian-Roulette plan). Ports the closed-form log Z(G) approximations from the companion z project into bgms as standalone numerics primitives, with bit-exact parity to the validated z reference. No model wiring yet — that's Phases 3–4.

This builds on the determinant-tilt prior (#106) and the auto-default tilt (#107). It is independent of #108 (joint-spec sample_ggm_prior) and can land in any order.

What's new

  • src/models/ggm/log_z_nlo.{h,cpp}:

    • log_Z_NLO_gamma(G, alpha, beta, sigma, include_F, delta) — general-α Laplace + NLO closed form for the spikeslab + |K|^δ tilt GGM normaliser. Direct port of branchB_chain.cpp:470-620 in ~/SV/Z/R/src/.
    • log_Z_NLO_gamma_degord(G, i, j, ...) — DEGORD reordering wrapper that permutes toggle endpoints (i, j) to positions (0, 1) before evaluation. The closed-form is not permutation-invariant; MH ratios must use a consistent reordering.
    • log_Z_NLO_gamma_delta_incr_alpha1(G_before, i, j, ...) — O(deg(i)² + deg(i)·q) incremental log-Z difference under a single-edge toggle at α = 1, exploiting the §4.6 locality decomposition (only vertex min(i, j) contributes).
  • src/log_z_test_interface.cpp — Rcpp exports for unit testing.

  • tests/testthat/test-log-z-nlo.R — 437 assertions covering:

    • Bit-exact parity (max |Δ| = 0) against the z reference on a 108-cell grid (9 random graphs × q ∈ {3, 5, 10} × α ∈ {1, 2} × δ ∈ {0, 0.5, 1} × include_F ∈ {FALSE, TRUE}).
    • α = 1 incremental matches full-recompute difference at machine precision across 21 toggles × 3 δ × 2 include_F at q ≤ 7.
    • DEGORD wrapper agrees with hand-permuted full-recompute.
    • Empty-graph closed form matches the analytic constant.
  • tests/testthat/fixtures/log_z_nlo_reference.rds — z-reference fixture; regenerate with dev/numerical_analyses/generate_log_z_fixture.R.

Test plan

  • Rscript -e 'testthat::test_file(\"tests/testthat/test-log-z-nlo.R\")' — 437 PASS, 0 FAIL.
  • CI green on the standard matrix (R-CMD-check, lint, pkgdown, test-coverage).

Stage 3 Phase 1a of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
Ports the closed-form log Z(G) approximations from the z-project (Route 3a
+ DEGORD pilot) into bgms as numerics primitives, ready for the
hierarchical-spec MH wiring in Phases 3-4.

- src/models/ggm/log_z_nlo.{h,cpp}:
  - log_Z_NLO_gamma(G, alpha, beta, sigma, include_F, delta): general-alpha
    Laplace + NLO closed form for the spikeslab-+tilt GGM normaliser.
  - log_Z_NLO_gamma_degord(G, i, j, ...): toggle-endpoint reordering wrapper
    (permutes (i, j) to positions (0, 1)).
  - log_Z_NLO_gamma_delta_incr_alpha1(G_before, i, j, ...): O(deg^2 + deg*q)
    incremental log-Z difference under a single-edge toggle at alpha = 1.

- src/log_z_test_interface.cpp: Rcpp exports for unit testing.

- tests/testthat/test-log-z-nlo.R: 437 assertions covering parity against
  the z reference (bit-exact, max abs diff = 0 across 108 grid cells),
  alpha = 1 incremental vs full-recompute (machine-precision), DEGORD
  permutation consistency, and analytic empty-graph constant.

- tests/testthat/fixtures/log_z_nlo_reference.rds: z-reference fixture
  (regeneratable via dev/numerical_analyses/generate_log_z_fixture.R).
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 94.96700% with 61 lines in your changes missing coverage. Please review.
✅ Project coverage is 87.77%. Comparing base (9480d46) to head (4cf5201).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/log_z_test_interface.cpp 82.17% 23 Missing ⚠️
src/models/ggm/z_ratio_estimator.cpp 91.59% 10 Missing ⚠️
R/bgms_posterior_mean.R 80.85% 9 Missing ⚠️
src/models/ggm/ggm_model.cpp 94.89% 7 Missing ⚠️
R/bgm_spec.R 89.13% 5 Missing ⚠️
src/models/base_model.h 25.00% 3 Missing ⚠️
src/models/ggm/log_z_nlo.cpp 99.44% 2 Missing ⚠️
src/models/ggm/degord_sampler.cpp 99.69% 1 Missing ⚠️
src/sample_ggm.cpp 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #109      +/-   ##
==========================================
+ Coverage   87.47%   87.77%   +0.29%     
==========================================
  Files          77       83       +6     
  Lines       13447    14712    +1265     
==========================================
+ Hits        11763    12913    +1150     
- Misses       1684     1799     +115     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Stage 3 Phase 1b of dev/plans/backlog/hierarchical-ggm-degord-rr.md.

At alpha > 1, H_e gains a -4 beta (alpha - 1) / M_v[v] piece that depends
on the larger endpoint, so a single-edge toggle (i, j) cascades H_e
changes to every edge incident to i (forward AND backward). The alpha = 1
locality decomposition (vertex_i_contribs only) no longer suffices.

New approach: partition log_Z_NLO_gamma into "instances touching
V_a = {i, j} ∪ N_G_before(i)" vs "instances not touching V_a", with the
latter invariant under the toggle and cancelling in the difference.

- src/models/ggm/log_z_nlo.cpp:
  - partial_log_Z_NLO_gamma_on_V (private): full-recompute structure with
    an any-index-in-V_a filter on every term type (log_C0 per-edge and
    per-vertex, log_LO, B_e, C_{e,k}, D_v, E_{lm}+F, N_v, M_e). M_v and
    H_e are still computed on the full graph; the filter only decides
    inclusion in the partial sum.
  - log_Z_NLO_gamma_delta_incr_alphaN (public): partial(G_after, V_a) -
    partial(G_before, V_a). Reduces to log_Z_NLO_gamma_delta_incr_alpha1
    at alpha = 1 to machine precision.

- tests/testthat/test-log-z-nlo.R: 816 new assertions across
  q in {3, 5, 7}, alpha in {1.5, 2, 3}, delta in {0, 0.5, 1},
  include_F in {FALSE, TRUE}, all toggles. Two new test_that blocks:
  - alphaN vs full-recompute difference at alpha > 1 (within 1e-10).
  - alphaN(alpha = 1) vs the existing alpha = 1 incremental (within 1e-10).
  All 1253 assertions in the file pass at machine precision.

Cost is O(q^2) per toggle: the outer loops are still over all edges
and non-edges (filter, not loop-bound, restricts inclusion). The plan's
O(deg^2 + deg * q) optimisation is a future loop-bound restructuring
that can be tested against this implementation.
Replaces the O(q^2) any-index-in-V_a filter with an O(deg^2 + deg * q)
canonical-representative enumeration. Each qualifying instance (edge,
triple, non-edge, vertex term) is now walked exactly once via its
lowest-index V_a member, eliminating the O(q^2) and O(q^3) scans on the
heavy term types.

Strategy:
- Precompute forward/backward adjacency lists in O(|E|), reusing them
  for nu, H_e, T, and S sums.
- Edge-indexed terms (log_LO, B_e, M_e): for each p in V_a, walk
  fwd[p] and bwd[p] (with !in_V[l] gate for the larger-endpoint case).
- Vertex-indexed terms (lgamma, D_v, N_v): direct loop over V_a_idx.
- Triple-indexed C_{(a, b), k}: three cases for canon = p, k=p, a=p, b=p
  with gates k not in V_a (case 2) and a, k not in V_a (case 3).
- Non-edge E_{lm}: for each p in V_a, walk (p, m) and (l, p) pairs with
  the !in_V[l] gate for the larger-endpoint case; common-pred sum walks
  bwd[l] and tests G(k, m).

Measured speedup over the full-recompute difference baseline on a
sparse 20%-density graph (100 random toggles, alpha=2, delta=0.5):
- q= 20: incremental ~ full-diff (parity; constant-factor regime).
- q= 50: 1.2x.
- q=100: 2.9x.

All 1253 assertions in tests/testthat/test-log-z-nlo.R continue to pass
at machine precision (Phase 1a fixture, Phase 1b parity, alpha=1
reduction).
…e 2)

Stage 3 Phase 2 of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
Ports the inner Zhat machinery from ~/SV/Z/R/src/degord_sampler.h (v4,
2026-05-18) into bgms, hooked to SafeRNG.

New files
---------
- src/models/ggm/degord_sampler.{h,cpp}:
  - ChainAux (per-chain (alpha, beta, sigma, delta) constants +
    per-nu transcendental tables).
  - PiAux (per-permutation: G_pi, nu_pi, E_count, log_C0).
  - permute_graph / degord_permutation (toggle endpoints to (q-2, q-1)).
  - phi_pi_sample_from_noise (inner Bartlett-Cholesky kernel).
  - log_Zhat_pi_from_pool (main entry; log-sum-exp over importance weights).
  - log_Zhat_pi_from_pool_cache + PoolCache (cached variant for
    delta-toggle reuse).
  - row_qm2_logw_from_S (single-row recompute under trailing toggle).
  - delta_log_Zhat_pi_toggle (efficient delta with cache reuse).
  - draw_bartlett_pool (SafeRNG-based standard-normal pool).

Two bug fixes folded in vs the z-project v4 reference
-----------------------------------------------------
Both fixes were derived during the bgms port audit and confirmed correct
on the z side; they are bit-verified and land here with regression
assertions:

1. Cache rw_head fix (Option 2). The v4 rw_head spanned only rows
   0..q-3, so the star aggregation in delta_log_Zhat_pi_toggle omitted
   row q-1's diagonal log-weight even though it is invariant across
   (curr, star) for any toggle (nu_pi[q-1] = 0 always under the DEGORD
   permutation). The omission left a sample-shifted bias that grew with
   the tilt parameter delta. Fix: rw_head extended to include
   row_logw[q - 1].

2. z_trail slot fix (slab_tilt_mode == 1). delta_log_Zhat_pi_toggle
   passed z_trail = 0.0 hardcoded to row_qm2_logw_from_S, dropping the
   saddle-shifted slab innovation noise[q + edge_offset(q-2, q-1)] =
   noise[q + (q-2)(q+1)/2] when slab_tilt_mode == 1. Fix: read the
   actual slab slot via noise_pool.colptr(slab_idx).

Tests
-----
- tests/testthat/test-degord-sampler.R: 1526 assertions covering:
  - ChainAux nu-tables vs the z reference (local-only, gated on z source).
  - log_Zhat_pi_from_pool bit-parity on shared noise pool via the
    fixture at tests/testthat/fixtures/degord_sampler_reference.rds
    (regeneratable via dev/numerical_analyses/generate_degord_fixture.R).
  - DEGORD permutation correctness (i, j) -> (q-2, q-1).
  - delta_log_Zhat_pi_toggle EQUALS the direct full-recompute difference
    log_Zhat(star) - log_Zhat(curr) at machine precision under BOTH
    slab_tilt_modes. The (q=10, delta=0, tilt=1, toggle=(3,9)) cell is
    an explicit regression row.
  - delta_log_Zhat_pi_toggle bit-parity vs the z reference (local-only).
  - var(log_Zhat) scales as 1/M_inner (Phase 2 acceptance).
  - draw_bartlett_pool shape, normality, seed determinism.

Acceptance criteria from the Phase 2 plan section are met:
  - log_Zhat_pi_from_pool agrees with the z reference at bit-parity on
    shared input (max abs diff = 0 across the 108-cell fixture grid).
  - var(log Zhat) scales as 1/M_inner (M*var constant at ~0.35 across
    M in {30, 100, 300, 1000}; ratio var(30)/var(1000) = 36.3 vs
    theoretical 33.3).

bgms-side Phase 1 work (log_Z_NLO_gamma + alpha=1 incremental +
general-alpha incremental + V_a-pivot optimisation) is unaffected and
its 1253 assertions still pass.
Stage 3 Phase 3 of dev/plans/backlog/hierarchical-ggm-degord-rr.md.
The per-toggle unbiased 1/Z(Γ) estimator that composes the Phase 2 DEGORD
Bartlett-Cholesky sampler into the Lyne-style Russian-Roulette geometric
series for the hierarchical-spec MH ratio.

New files
---------
- src/models/ggm/z_ratio_estimator.{h,cpp}:
  - V_at_Gamma_pi_degord(K_depth, pools_t, pi_aux, chain_aux, c, rho):
      V = (1/c) · [1 + sum_{n=1..K} (-1)^n · prod_{i=1..n} (Zhat_i - c)/c / rho^n]
      with Zhat_i = exp(log_Zhat_pi_from_pool(pools_t[i-1], pi_aux, chain_aux)).
      Returns the signed V (caller tracks sign for ergodic averaging).
  - draw_U_degord_rr(rng, K_depth, pools_t, M_inner, q, rho):
      K_depth ~ Geom(1 - rho) via inverse-CDF on SafeRNG uniform;
      pools_t[n] is (dim x M_inner) iid N(0, 1) in the pre-transposed
      layout required by phi_pi_sample_from_noise.
  - ZRatioState (declaration only): the per-Γ state the Phase 4 chain
    will own (pools_t, K_depth, kappa, rho, log_Z_NLO_curr, sign_curr).

- Rcpp test exports in src/log_z_test_interface.cpp:
  - degord_V_at_Gamma_pi_cpp (List interface for pools_t).
  - degord_draw_U_rr_cpp returns List(K_depth, pools_t).

Tests
-----
- tests/testthat/test-z-ratio-estimator.R: 10 assertions covering:
  - V is unbiased for 1/Z within MC noise at q=5: mean(V) across 1000
    independent (K_depth, pools) draws sits within 4 SD of 1/Z computed
    via a high-precision (M=5000, n=20) log_Zhat truth proxy.
  - K_depth ~ Geom(1 - rho): mean, P(K=0), P(K>=5) all match within
    n=10000 MC tolerance.
  - V at K_depth = 0 equals 1/c exactly (machine precision).
  - draw_U_degord_rr is deterministic in the SafeRNG seed.
  - Signed V picks up the alternating series when c is far from 1/Z
    (small c forces negative samples at K_depth >= 1).

Cited references
----------------
- Direct port of ~/SV/Z/R/src/branchB_chain_route3a_degord.cpp:192-228
  for V_at_Gamma_pi_degord and :215-228 for the pool draw.
- Lyne (2015) Russian-Roulette construction; route3a_V_helpers
  notes/exactness-proposition.md §2 for radius/moment conditions.
…er_pair (Phase 4a)

Stage 3 Phase 4a of dev/plans/backlog/hierarchical-ggm-degord-rr.md. Wires
the Phase 2 DEGORD sampler + Phase 3 V/RR estimator into bgms's GGM
between-edge MH path. Converts the joint-spec MH ratio into the
hierarchical-spec ratio by multiplying by V(Γ_curr)/V(Γ_star), an
unbiased estimator of Z(Γ_star)/Z(Γ_curr).

This is the C++ infrastructure half of Phase 4. The R API surface
(bgm(..., graph_prior_spec = "hierarchical")) is Phase 4b, a separate
plumbing job.

Changes
-------
- src/models/ggm/ggm_model.h
  - New enum class GraphPriorSpec { Joint, Hierarchical } (header-level,
    outside the class).
  - GGMModel gains:
      void set_graph_prior_spec(GraphPriorSpec);
      void set_z_ratio_tuning(int M_inner, double kappa, double rho);
    plus private members graph_prior_spec_, v_M_inner_, v_kappa_, v_rho_,
    log_Z_NLO_curr_, v_K_depth_, v_pools_t_, chain_aux_degord_, and
    cached prior-family parameters (prior_sigma_, prior_alpha_, prior_beta_).

- src/models/ggm/ggm_model.cpp
  - ensure_hierarchical_state_(): lazy init. Validates the slab is
    NormalPrior and the diagonal is GammaScalePrior (throws otherwise).
    Builds chain_aux_degord_; full-recomputes log_Z_NLO_curr_ at the
    current Γ; draws the first U-pool.
  - refresh_z_ratio_pool_(): fresh (K_depth, pools_t) draw per iteration,
    mirroring the z chain's refresh_U_every = 1 convention.
  - prepare_iteration() now calls ensure + refresh when in Hierarchical mode.
  - update_edge_indicator_parameter_pair (both branches):
      after the existing joint MH ratio is assembled, if Hierarchical mode
      is active, compute log_Z_NLO_star (incremental at alpha=1, full
      recompute at alpha != 1), build PiAux at the DEGORD permutation
      sending (i, j) to (q-2, q-1), compute V_curr and V_star, and add
      log|V_curr|/|V_star| to ln_alpha. Auto-reject on non-finite or
      zero |V| (Lyne 2015 convention).
      On accept, log_Z_NLO_curr_ <- log_Z_NLO_star.
  - Defensive note retained that the determinant_tilt_ * log_det_ratio_edge
    term is identically 0 under Roverato slaving (verified during the
    Z-side audit and reconfirmed by direct calculation here).

- src/priors/parameter_prior.h
  - Adds public scale()/shape()/rate() accessors on NormalPrior and
    GammaScalePrior so the hierarchical state can read the prior params
    without coupling to the construction site.

- src/log_z_test_interface.cpp
  - New ggm_hierarchical_smoke_cpp entry: constructs a GGMModel directly,
    switches it to Hierarchical, runs n_sweeps Metropolis edge-update
    sweeps, returns final_edges + n_edges_path.

- tests/testthat/test-ggm-hierarchical.R
  - 13 assertions across 3 test_that blocks:
    1. Chain runs without crash and edge counts stay in [0, p(p-1)/2]
       across 200 sweeps at q=5, n=100 random data.
    2. Output is deterministic in the seed (same seed -> identical output;
       different seed -> different trajectory).
    3. Chain remains non-degenerate across delta in {0, 0.5, 1.0} (no
       all-zero or all-full lock-in).

Existing Phase 1-3 testthat assertions (1253 + 1526 + 10) continue to pass.
The default Joint mode (existing chain semantics) is unchanged.

Acceptance criteria from the Phase 4 plan section met for the C++-only
half:
  - bgm(... graph_prior_spec = "hierarchical") will finish on small
    synthetic data once Phase 4b wires the R-side argument.
  - Per-sweep wall: q=5 200-sweep run completes in well under a second on
    M5 Pro at M_inner=100. Full benchmarking deferred to Phase 5.

Phase 5 (SBC validation vs AKM-J truth across the delta-sweep) and
Phase 4b (R API surface) are the remaining pieces of Stage 3.
… 4b)

Stage 3 Phase 4b: exposes the Phase 4a hierarchical-spec inference path
through the bgm() R interface.

R API
-----
- bgm() gains two new arguments:
    graph_prior_spec = c("joint", "hierarchical")  # default "joint"
    z_ratio_tuning   = list(M_inner = 100L, kappa = 1.0, rho = 0.5)
- Both arguments are plumbed through bgm_spec() -> build_spec_ggm() ->
  run_sampler_ggm() -> sample_ggm() -> GGMModel::set_graph_prior_spec /
  set_z_ratio_tuning.
- Validation in bgm_spec():
    - graph_prior_spec must be "joint" or "hierarchical".
    - Hierarchical requires model_type in {"ggm", "mixed_mrf"} (continuous
      precision block needed). Errors helpfully if mismatch.
    - Hierarchical requires interaction_prior_type == "normal" and
      scale_prior_type == "gamma" (the prior families for which the
      closed-form Laplace-NLO normaliser approximation is implemented).
      Validated up front so the user gets a clean error before MCMC starts.
    - z_ratio_tuning components validated: M_inner positive integer,
      kappa > 0, rho in (0, 1).

C++ glue
--------
- sample_ggm.cpp: adds graph_prior_spec / z_ratio_M_inner / z_ratio_kappa /
  z_ratio_rho function args, calls set_z_ratio_tuning + set_graph_prior_spec
  when "hierarchical".
- GGMModel copy constructor (header) now also copies graph_prior_spec_ +
  v_M_inner_ + v_kappa_ + v_rho_ so multi-chain runs propagate the setting.
  Lazy state (pools, log_Z_NLO_curr, chain_aux_degord) is intentionally
  reset on clone - pools are RNG-derived and would otherwise share random
  draws across chains.

Roxygen
-------
- bgm.R gets full @param entries for graph_prior_spec and z_ratio_tuning.
  man/bgm.Rd regenerated via devtools::document().

Tests
-----
tests/testthat/test-ggm-hierarchical.R adds 10 new R-API assertions:
- bgm() with graph_prior_spec = "hierarchical" returns a posterior
  inclusion matrix with valid shape and values in [0, 1].
- Cauchy slab + hierarchical errors with a helpful message naming the
  required interaction_prior_type.
- Pure ordinal data + hierarchical errors naming "continuous".
- z_ratio_tuning rejects M_inner = 0, kappa < 0, rho in {0, 1}.

All 23 assertions (13 C++ smoke + 10 R-API) pass. Phase 1-3 tests
(1253 + 1526 + 10) continue to pass; default Joint path is unchanged.

What is left in Stage 3
-----------------------
- Phase 5: SBC validation against AKM-J truth across the delta-sweep at
  q = 10 (the manuscript Table 5 cells, post the Z-side disable_log_r
  fix).
- Phase 6: vignette section + cross-references in the spikeslab companion.
Two CI failures on PR #109:

1. tests/testthat/fixtures/{log_z_nlo_reference,degord_sampler_reference}.rds
   were absent from R CMD check because .Rbuildignore had the blanket
   pattern `^tests/testthat/fixtures$`, which the original author had
   intended to exclude only the legacy-fixture subdirectory but which in
   fact swallows every fixture below it. Tighten the pattern to
   `^tests/testthat/fixtures/legacy$` so the live fixtures ship; verified
   via R CMD build that the resulting tarball contains
   `tests/testthat/fixtures/degord_sampler_reference.rds` and
   `tests/testthat/fixtures/log_z_nlo_reference.rds` while continuing to
   exclude the `legacy/` subdirectory.

2. lintr semicolon_linter warnings on `; var <- val` compound statements
   in three test files. Rewrite each compound-on-one-line as
   newline-separated statements:
   - tests/testthat/test-degord-sampler.R (4 sites)
   - tests/testthat/test-ggm-hierarchical.R (3 sites)
   - tests/testthat/test-z-ratio-estimator.R (3 sites)

All 4 test files re-run clean locally: 1253 + 1526 + 10 + 23 = 2812
assertions pass. `lintr::lint_package()` reports 0 lints.
…(Phase 5)

Stage 3 Phase 5: simulation-based calibration of the bgms hierarchical-
spec chain at p = 5 across delta in {0, 0.5, 1, 2}, R = 300 replications.

Scope decision (q = 5 vs q = 10)
--------------------------------
The methodology paper validates at q = 10 against AKM-J perfect-sampler
truth, but AKM-J hits its max_proposals cap on roughly 23% of delta = 2
draws (selection bias on hard high-tilt cases that the unbiased Lyne RR
chain is designed to handle correctly). Restricting bgms's SBC harness
to q = 5 sidesteps this: at q = 5, sample_ggm_prior()'s constrained NUTS
at fixed Gamma_true converges cleanly across the full delta sweep, so
the truth source is trustworthy.

Harness
-------
Per replicate at each delta:
  1. Gamma_true ~ Bern(0.5) on the upper triangle.
  2. K_true | Gamma_true via sample_ggm_prior(n_warmup = 500, n_samples = 1).
     Rebuild K_true using offdiag_names (row-major) rather than
     upper.tri() (column-major) so the off-diagonal entries land at the
     right (i, j); without this the matrix is non-PD.
  3. Y | K_true ~ N(0, K_true^{-1}), n_obs = 100.
  4. bgm(Y, ..., graph_prior_spec = "hierarchical"), iter = 400, warmup = 100.
  5. Rank K_ii_true within posterior raw_samples$main[[1]][, i].
     Both sides are on the same K_yy partial-association scale; verified
     empirically (mean K_diag_true matches mean main posterior at near-
     empty graphs, low delta).

Tests
-----
- KS uniformity test per K_ii per delta (5 x 4 = 20 assertions), pass at
  alpha = 0.01.
- Gamma marginal calibration: posterior mean inclusion across the 300 R
  reps tracks empirical Gamma_true frequency within 4 SE (binomial gap
  at R * p(p-1)/2 = 3000 draws). One assertion per delta (4 total).

Total: 24 assertions, env-gated behind BGMS_RUN_SLOW_TESTS = true.

Runtime
-------
Smoke run at R = 30 (delta = 0.5) takes ~12 s; full R = 300 across
delta in {0, 0.5, 1, 2} extrapolates to ~8 min on M5 Pro. Verified
ranks uniform at R = 30 with KS p in [0.11, 0.84] across all 5 K_ii.

Why edge-indicator ranks aren't tested directly
-----------------------------------------------
Spike-and-slab makes per-(i, j) gamma_ij a discrete {0, 1} draw, which
breaks standard SBC's continuous-rank protocol (tied ranks degrade the
KS test's calibration). The existing test-sbc-ggm.R sidesteps this
by running WITHOUT edge selection. Here we run WITH edge selection
(it's the load-bearing case for hierarchical-spec) but only rank-test
K_ii. The gamma marginal calibration check is a sanity assertion, not
strict SBC.
Two compounding fixes to the hierarchical-spec V(Γ, U) estimator.

F5 - log-space V. The linear V_at_Gamma_pi_degord materialises
c = κ·exp(log_Z_NLO) which underflows to 0 at large p (log_Z_NLO ≈
-3500 nats at p=100, δ=1), turning every MH proposal into a silent
auto-reject. V_log_at_Gamma_pi_degord factors V = (1/c) · S and
computes log|S| via signed log-sum-exp over the K_depth + 1
truncated-series terms. log_κ cancels in the MH ratio; the ADD/DELETE
branches of update_edge_indicator_parameter_pair now pass log_c per Γ
and auto-reject on non-finite log|V| or sign flip across Γ_curr/Γ_star
(Phase-1 stand-in until a Lyne sign accumulator lands).

F6 - within-toggle cache reuse. log_Zhat_star_from_cache mirrors
delta_log_Zhat_pi_toggle's per-sample loop but exposes log_Ẑ_star
rather than the delta. V_log_pair_at_Gamma_curr_star_degord runs one
cached Phi build under a_curr per pool and re-evaluates only row q-2
under a_star, halving the inner Phi rebuild cost. Both ADD/DELETE
branches now invoke the paired call.

Tests:
- test-z-ratio-estimator.R: log-vs-linear bit-equality at q ∈ {5, 10,
  20} (max log_abs gap < 1e-9); underflow regression at log_c = -3500;
  paired-vs-independent bit-equality across q ∈ {5, 10, 20}, K_depth ∈
  {0, 2, 5}; cache adapter matches a fresh log_Zhat_pi_from_pool.
- test-ggm-hierarchical.R: p=20 cell runs a full bgm() at δ=1,
  hierarchical-spec; asserts finite indicators in [0,1] and a moving
  chain.

3018 assertions across degord-sampler / log-z-nlo / z-ratio /
ggm-hierarchical green; full suite (9736 PASS) also green.
Phase 4a smoke validated AMH + hierarchical only. Mirror cell at
update_method = "nuts", same p / iter / warmup / prior, confirms the
2x2 cross-product (NUTS/AMH × joint/hierarchical) is genuinely a
clean cross-product: the V(Γ, U) hook lives entirely in the
between-edge MH update path, so NUTS only governs the within-model
continuous block. No R-layer guard was needed.

Asserts finite indicators in [0, 1], correct shape, and a moving
edge-count trajectory (chain not locked at empty or full graph).
cholesky_update_after_{edge,diag} replaced the per-accept O(p^3)
arma::solve(trimatu(L), I) + inv_L * inv_L.t() covariance refresh
with one Sherman-Morrison-Woodbury rank-2 update (4 x O(p^2)) for
the edge path and one rank-1 update for the diagonal path. The
Cholesky factor L is still maintained via the existing cholupdate /
choldowndate pair (needed for get_log_det). inv_cholesky_of_precision_
is no longer maintained per accept - only refresh_cholesky() touches
it - so it's effectively scratch state owned by the refresh path.

Capacitance singularity (|det C| < 1e-14 in the edge path,
|1 + alpha * cov(i,i)| < 1e-14 in the diagonal path) falls back to
refresh_cholesky(), preserving the prior safety net.

End-of-sweep drift check check_and_refresh_if_drift_() guards
SMW-accumulated FP error by computing max_i |sum_k cov(i,k) * K(k,i) -
1| in O(p^2) and refreshing when it exceeds kCovDriftTol_ = 1e-8.
Wired into both do_one_metropolis_step (within-model sweep) and
update_edge_indicators (between-model sweep).

Net effect on AMH GGM edge-update per accept:
  2 x O(p^3) (solve + outer product)  ->  4 x O(p^2) + O(p^2) check.

Full test suite (9736 PASS) green; existing GGM cells (85 PASS) and
the p=20 hierarchical regression cell exercise the new path without
spurious refreshes.
Per-iteration sign(V_curr) and log|V_curr| are now maintained inside
GGMModel and surfaced as a top-level v_ratio_diagnostics field on the
bgm() return for hierarchical-spec fits. Joint-spec fits leave the
field NULL.

GGMModel carries current_sign_V_ / current_log_abs_V_ /
v_diag_initialized_. Both branches of update_edge_indicator_parameter_pair
seed the diagnostic from V_pair.curr on the first finite proposal and
advance to V_pair.star on accept. The state is stable on reject -
matches the spec's "sign stays at the previous step's value on auto-
reject" rule.

BaseModel gains three virtuals (has_v_ratio_diagnostics,
current_sign_V, current_log_abs_V) with safe defaults; GGMModel
overrides them under graph_prior_spec_ == Hierarchical. ChainResult
gains v_sign_samples / v_log_abs_samples and the matching reserve /
store helpers, mirroring the existing AM/NUTS diagnostic pattern.

R-side: convert_results_to_list packs per-chain v_sign / v_log_abs.
build_raw_samples_list (GGM branch) renames to the trailing-__
convention. build_output_bgm assembles a top-level
v_ratio_diagnostics = list(sign = list-of-chains, log_abs = ...).
bgms_class declares the S7 property and s3_list_to_bgms passes it
through (both required - missing either silently NULL-coerces the
field via the S7 accessor).

Test in test-ggm-hierarchical.R confirms presence, shape, finite
log_abs values, and that >= 95% of recorded signs are +1 in the
operational cell (q=5, δ=0.5, κ=1, ρ=0.5); joint-spec fit must leave
the field NULL.

Full suite: 9746 PASS, 0 FAIL.
New exported bgms_posterior_mean(fit) computes the Lyne (2015)
sign-corrected ergodic estimator

  E[f(K) | Y] ≈ Σ sign_iter · f(K_iter) / Σ sign_iter

for the pairwise, indicator, and residual-variance fields. Under the
operational regime (low δ, reasonable κ) sign === +1 and the
correction collapses to the plain colMeans-based posterior mean;
sign correction matters at high δ or aggressive κ where the
Russian-Roulette estimator of 1/Z(Γ) can flip sign. Joint-spec fits
and any fit without v_ratio_diagnostics fall through to the existing
posterior_mean_* fields unchanged. Raises a helpful error when the
sign vector sums to zero (degenerate regime; remediation is to raise
kappa).

Three test_that cells in tests/testthat/test-ggm-hierarchical.R
exercise the helper:

- Passthrough: on a sign === +1 hierarchical fit, the helper returns
  bit-equal results vs the plain posterior_mean_* fields; on a
  joint-spec fit, returns the plain fields unchanged.
- Sign-correction math: mutates the sign vector on a real
  hierarchical fit (via a plain-list view to skirt S7 immutability)
  to a 2/3 +1, 1/3 -1 pattern and asserts the helper's output equals
  the hand-computed sign-weighted mean while differing from the
  plain mean by more than numerical noise.

NULL fields preserve their slot via the `out["main"] = list(NULL)`
idiom so the returned list always has the same names regardless of
model type.

Full suite: 9756 PASS, 0 FAIL.
The hierarchical-spec correction in update_edge_indicator_parameter_pair
had the sign of the V(Γ_star)/V(Γ_curr) factor inverted in both ADD and
DELETE branches. The math:

  T_hier(K, Γ)        = T_joint(K, Γ) / Z(Γ)
  T_hier_ratio(c→s)   = T_joint_ratio × Z(Γ_curr)/Z(Γ_star)
                      = T_joint_ratio × V(Γ_star)/V(Γ_curr)    [V ≈ 1/Z]
  ln_alpha_hier      += log|V_star| - log|V_curr|

bgms had `ln_alpha += V_pair.curr.first - V_pair.star.first` (the
reverse). The accompanying comment also stated the wrong direction.
Z's reference chain (branchB_chain_route3a_degord.cpp:412) has the
correct sign:

  log_r = std::log(std::abs(V_star)) - std::log(std::abs(V_curr));

Why the q=5 SBC didn't catch it. The gamma-marginal calibration test
tolerance scales as 4·sqrt(p_inc·(1-p_inc) / (R · n_edges)). At q=5,
n_edges=10, R=300 the tolerance is ~0.037; at q=10, n_edges=45, R=500
it tightens to ~0.013. The bias scales with q (more edges accumulate
the misdirected V-weighting); at q=5 the bias was within the looser
tolerance, at q=10 it shows clearly.

Validation: q=10 SBC sweep at the Z-matched chain config (n_warmup =
n_iter = 2000, M_inner = 50, kappa = 2.0, rho = 0.5, R = 500) across
delta ∈ {0, 1, 2}.

                Pre-fix              Post-fix
  delta = 0    8/11 FAIL           10/11 PASS
                                   (K_ii[5] ks_p=0.008, borderline)
  delta = 1    1/11 FAIL (gamma)   11/11 PASS
  delta = 2    4/11 FAIL           11/11 PASS
  Total       13/33 FAIL           32/33 PASS

The single borderline failure at delta=0/K_ii[5] (ks_p=0.008 vs the
0.01 threshold) is well within expected multiple-testing noise
(~0.3 false positives expected across 30 K_ii cells at alpha=0.01).

New gated slow test tests/testthat/test-sbc-ggm-hierarchical-q10.R
locks the fix in - 22.6 min wall on 12 cores at this config.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant