Convergence diagnostics assess whether MCMC samplers have reached their stationary distribution and provide reliable samples from the posterior. The Numerics library provides essential diagnostic tools, including the Gelman-Rubin statistic ($\hat{R}$) and the Effective Sample Size (ESS).
MCMC samplers:
- Start from arbitrary initial values
- Explore parameter space stochastically
- Eventually converge to target distribution
- Must discard "burn-in" samples
Key questions:
- Have chains converged to stationary distribution?
- How many independent samples do we have?
- Is the warmup period sufficient?
The theoretical justification for MCMC rests on the ergodic theorem: under regularity conditions (irreducibility, aperiodicity, positive recurrence), the time average of a function along the Markov chain converges to its expectation under the stationary distribution:

$$\frac{1}{N}\sum_{t=1}^{N} f\!\left(\theta^{(t)}\right) \xrightarrow{\;N \to \infty\;} \mathbb{E}_{\pi}\left[f(\theta)\right]$$

where $\theta^{(t)}$ is the state of the chain at iteration $t$ and $\pi$ is the stationary distribution -- here, the posterior.
Burn-in (warmup) refers to the initial transient phase where the chain has not yet reached the stationary distribution. Samples from this phase are drawn from a distribution that depends on the arbitrary starting point, not from $\pi$, and should be discarded.
Mixing describes how quickly the chain "forgets" its current state and explores the full support of $\pi$. A well-mixed chain has low autocorrelation; a poorly mixed chain produces long runs of nearly identical draws.
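A quick way to see both effects is to discard the warmup draws and check the lag-1 autocorrelation of what remains. A minimal plain-C# sketch, where `chain` is a hypothetical double[] of raw draws for one parameter:
// Minimal sketch: discard burn-in, then use lag-1 autocorrelation as a
// crude mixing indicator. `chain` is a hypothetical double[] of raw draws.
int warmupDraws = 2000;
double[] kept = chain.Skip(warmupDraws).ToArray();
double mean = kept.Average();
double c0 = kept.Sum(x => (x - mean) * (x - mean));
double c1 = 0.0;
for (int t = 0; t < kept.Length - 1; t++)
    c1 += (kept[t] - mean) * (kept[t + 1] - mean);
double rho1 = c1 / c0; // near 0: fast mixing; near 1: slow mixing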
The Gelman-Rubin diagnostic compares within-chain and between-chain variance [1]. Values near 1.0 indicate convergence.
using Numerics.Sampling.MCMC;
using Numerics.Mathematics.Optimization;
// Run sampler with multiple chains
var sampler = new ARWMH(priors, logLikelihood);
sampler.NumberOfChains = 4;
sampler.WarmupIterations = 2000;
sampler.Iterations = 5000;
sampler.Sample();
// Get chains from sampler
var chains = sampler.MarkovChains.ToList();
// Compute Gelman-Rubin for each parameter
int warmup = sampler.WarmupIterations;
double[] rHat = MCMCDiagnostics.GelmanRubin(chains, warmup);
Console.WriteLine("Gelman-Rubin Diagnostics (R̂):");
for (int i = 0; i < rHat.Length; i++)
{
Console.WriteLine($" Parameter {i}: R̂ = {rHat[i]:F4}");
if (rHat[i] < 1.1)
Console.WriteLine(" ✓ Converged");
else if (rHat[i] < 1.2)
Console.WriteLine(" ⚠ Marginal - run longer");
else
Console.WriteLine(" ✗ Not converged - investigate");
}
| R̂ Value | Interpretation | Action |
|---|---|---|
| R̂ < 1.01 | Excellent convergence | Proceed |
| R̂ < 1.1 | Good convergence | Safe to use |
| 1.1 ≤ R̂ < 1.2 | Marginal | Run longer |
| R̂ ≥ 1.2 | Poor convergence | Investigate |
Formula:

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}$$

where $W$ is the within-chain variance and $\hat{V}$ is a pooled estimate of the posterior variance, both defined below. Consider $m$ chains, each of length $n$, with draws $\theta_{ij}$ for iteration $i = 1, \ldots, n$ and chain $j = 1, \ldots, m$.

Step 1: Chain means and overall mean. Compute the mean of each chain and the grand mean across all chains:

$$\bar{\theta}_{\cdot j} = \frac{1}{n}\sum_{i=1}^{n} \theta_{ij}, \qquad \bar{\theta}_{\cdot\cdot} = \frac{1}{m}\sum_{j=1}^{m} \bar{\theta}_{\cdot j}$$

Step 2: Between-chain variance

$$B = \frac{n}{m-1}\sum_{j=1}^{m} \left(\bar{\theta}_{\cdot j} - \bar{\theta}_{\cdot\cdot}\right)^2$$

The factor $n$ scales the variance of the chain means up to the scale of individual draws.

Step 3: Within-chain variance

$$W = \frac{1}{m}\sum_{j=1}^{m} s_j^2, \qquad s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(\theta_{ij} - \bar{\theta}_{\cdot j}\right)^2$$

Step 4: Pooled variance estimate. The pooled estimate combines within-chain and between-chain information:

$$\hat{V} = \frac{n-1}{n} W + \frac{1}{n} B$$

This is a weighted average that has a key property: when the chains start from overdispersed initial values, $\hat{V}$ overestimates the posterior variance while $W$ underestimates it, because no single chain has yet explored the full distribution.

Step 5: The diagnostic ratio. The potential scale reduction factor is:

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}$$

Since $\hat{V}$ approaches the true posterior variance from above and $W$ from below, $\hat{R} \geq 1$ in expectation, and $\hat{R} \to 1$ as $n \to \infty$ once all chains are sampling from the same stationary distribution.
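The computation is easy to verify by hand. Below is a minimal plain-C# sketch of the five steps for a single parameter, where `chainDraws` is a hypothetical double[][] holding one array of post-warmup draws per chain:
// Minimal sketch of the Gelman-Rubin computation for one parameter.
// chainDraws[j] is a hypothetical double[] of post-warmup draws from chain j.
int m = chainDraws.Length;     // number of chains
int n = chainDraws[0].Length;  // draws per chain
double[] chainMeans = chainDraws.Select(c => c.Average()).ToArray();
double grandMean = chainMeans.Average();
// Step 2: between-chain variance B
double B = n / (m - 1.0) * chainMeans.Sum(mu => (mu - grandMean) * (mu - grandMean));
// Step 3: within-chain variance W (average of per-chain sample variances)
double W = chainDraws.Select((c, j) =>
    c.Sum(x => (x - chainMeans[j]) * (x - chainMeans[j])) / (n - 1.0)).Average();
// Steps 4-5: pooled variance estimate and diagnostic ratio
double V = (n - 1.0) / n * W + B / n;
double rHatManual = Math.Sqrt(V / W); // should approach 1.0 at convergence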
Split-$\hat{R}$. Modern practice [4] recommends splitting each chain in half and treating the halves as separate chains before computing GelmanRubin(); this detects trends within individual chains (e.g., a chain still drifting toward the mode) that the unsplit statistic can miss.
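Split-$\hat{R}$ follows from the same computation: cut each chain in half and feed the $2m$ half-chains through the steps above. A sketch, reusing the hypothetical `chainDraws` array and a hypothetical helper `ComputeRHat` that wraps the five steps:
// Minimal sketch of split-Rhat: treat each half-chain as its own chain,
// so within-chain trends show up as between-chain disagreement.
double[][] halves = chainDraws
    .SelectMany(c => new[]
    {
        c.Take(c.Length / 2).ToArray(),
        c.Skip(c.Length / 2).ToArray()
    })
    .ToArray();
double splitRHat = ComputeRHat(halves); // hypothetical helper; 2m chains of length n/2
If $\hat{R}$ indicates problems, common causes include: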
- Insufficient warmup - Chains haven't reached stationarity
- Poor mixing - Chains explore slowly
- Multimodal posterior - Chains stuck in different modes
- Bad initialization - Starting values too extreme
- Wrong sampler - Algorithm not suited to problem
ESS quantifies the effective number of independent samples in autocorrelated MCMC output [2].
// Extract samples for parameter 0 from the first chain
double[] samples = sampler.MarkovChains[0]
.Select(ps => ps.Values[0])
.ToArray();
double ess = MCMCDiagnostics.EffectiveSampleSize(samples);
Console.WriteLine($"Effective Sample Size: {ess:F0}");
Console.WriteLine($"Actual samples: {samples.Length}");
Console.WriteLine($"Efficiency: {ess / samples.Length:P1}");
// Rule of thumb: ESS > 100 per parameter
if (ess < 100)
Console.WriteLine("⚠ Warning: Low ESS - run longer or thin more");// Compute ESS for all parameters across chains
double[] essValues = MCMCDiagnostics.EffectiveSampleSize(chains, out double[][,] avgACF);
Console.WriteLine("Effective Sample Size by Parameter:");
for (int i = 0; i < essValues.Length; i++)
{
Console.WriteLine($" θ{i}: ESS = {essValues[i]:F0}");
}
// Check minimum ESS
double minESS = essValues.Min();
Console.WriteLine($"\nMinimum ESS: {minESS:F0}");
if (minESS > 400)
Console.WriteLine("✓ Excellent: ESS > 400");
else if (minESS > 100)
Console.WriteLine("✓ Good: ESS > 100");
else
Console.WriteLine("✗ Poor: ESS < 100 - need more samples");Guidelines:
- ESS > 400: Excellent - precise estimates
- ESS > 100: Good - adequate for most purposes
- ESS < 100: Poor - increase iterations or improve mixing
Autocorrelation impact:
- High autocorrelation → Low ESS → Need more iterations
- Good mixing → High ESS → Efficient sampling
$$\mathrm{ESS} = \frac{N}{1 + 2\sum_{k=1}^{\infty} \rho_k}$$

where $N$ is the total number of retained samples and $\rho_k$ is the autocorrelation at lag $k$.

The ESS formula arises from analyzing the variance of the sample mean of a correlated sequence. For a stationary process $\{\theta^{(t)}\}$ with variance $\sigma^2$ and autocorrelations $\rho_k$, the variance of the sample mean $\bar{\theta}$ is:

$$\operatorname{Var}(\bar{\theta}) = \frac{\sigma^2}{N}\left[1 + 2\sum_{k=1}^{N-1}\left(1 - \frac{k}{N}\right)\rho_k\right]$$

For large $N$, the factor $(1 - k/N)$ is close to 1 at every lag that contributes meaningfully, so:

$$\operatorname{Var}(\bar{\theta}) \approx \frac{\sigma^2}{N}\left(1 + 2\sum_{k=1}^{\infty}\rho_k\right)$$

If the samples were independent, we would have $\operatorname{Var}(\bar{\theta}) = \sigma^2/N$. Defining ESS as the number of independent samples that would yield the same precision, set $\sigma^2/\mathrm{ESS}$ equal to the correlated-sample variance. Solving for ESS:

$$\mathrm{ESS} = \frac{N}{1 + 2\sum_{k=1}^{\infty}\rho_k}$$

The quantity $\tau = 1 + 2\sum_{k=1}^{\infty}\rho_k$ is the integrated autocorrelation time: on average, the chain needs $\tau$ iterations to generate one effectively independent sample.
Truncation strategy. In practice, the infinite sum must be truncated. The Numerics implementation uses a simple truncation rule: the sum is cut off at the first lag at which the estimated autocorrelation becomes non-positive, since the noisy sample ACF at higher lags would otherwise corrupt the estimate.
Geyer (1992) [3] proposed a more robust alternative called the initial positive sequence estimator, which sums consecutive pairs of autocorrelations, $\Gamma_k = \rho_{2k} + \rho_{2k+1}$, and truncates at the first pair whose sum is negative; for a reversible chain these pairwise sums are provably positive, making the truncation point well defined.
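A minimal plain-C# sketch of the single-chain ESS computation, using the simple first-non-positive-lag truncation described above (the extraction of `draws` follows the earlier example):
// Minimal sketch: ESS for parameter 0 from the first chain.
// Truncates the autocorrelation sum at the first non-positive lag.
double[] draws = sampler.MarkovChains[0].Select(ps => ps.Values[0]).ToArray();
int N = draws.Length;
double mu = draws.Average();
double c0 = draws.Sum(x => (x - mu) * (x - mu)) / N; // lag-0 autocovariance
double tau = 1.0;                                    // integrated autocorrelation time
for (int k = 1; k < N; k++)
{
    double ck = 0.0;
    for (int t = 0; t < N - k; t++)
        ck += (draws[t] - mu) * (draws[t + k] - mu);
    double rho = (ck / N) / c0;                      // lag-k autocorrelation estimate
    if (rho <= 0) break;                             // simple truncation rule
    tau += 2.0 * rho;
}
double essManual = N / tau; // effective sample size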
Multi-chain ESS. When $m$ chains are available, the autocorrelation function is estimated within each chain and averaged across chains, and the ESS is computed from the total number of samples:

$$\mathrm{ESS} = \frac{mn}{1 + 2\sum_{k} \bar{\rho}_k}$$

Here $n$ is the length of each chain and $\bar{\rho}_k$ is the lag-$k$ autocorrelation averaged over the $m$ chains. This is capped at $mn$, the total sample count, since noisy (slightly negative) autocorrelation estimates could otherwise yield an ESS exceeding the number of samples actually drawn.
The minimum ESS needed depends on what posterior summary you are estimating:
| Inference Goal | Minimum ESS | Rationale |
|---|---|---|
| Posterior mean | 100 | MCSE is approximately 10% of posterior SD |
| Posterior standard deviation | 200 | Variance estimation requires more samples than mean estimation |
| 95% credible interval | 400 | Quantile estimation demands greater precision in the tails |
| Tail probabilities (e.g., $P(\theta > c)$) | 1000+ | Extreme quantiles are estimated from sparse tail samples |
These thresholds are guidelines, not strict rules. For life-safety applications, err on the side of larger ESS.
When comparing MCMC samplers, ESS alone is not sufficient -- the computational cost per iteration matters. The ESS per second metric accounts for this:
$$\mathrm{ESS/sec} = \frac{\mathrm{ESS}}{T}$$

where $T$ is the total wall-clock sampling time in seconds. A sampler with a lower ESS per iteration can still dominate if its iterations are sufficiently cheap.
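A minimal sketch of measuring this with System.Diagnostics.Stopwatch, timing the Sample() call (other names as in the examples above):
using System.Diagnostics;
// Time the sampling run and report the worst-case ESS per second
var timer = Stopwatch.StartNew();
sampler.Sample();
timer.Stop();
double[] essVals = MCMCDiagnostics.EffectiveSampleSize(sampler.MarkovChains.ToList(), out _);
double essPerSecond = essVals.Min() / timer.Elapsed.TotalSeconds;
Console.WriteLine($"Minimum ESS/sec: {essPerSecond:F1}");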
The Monte Carlo Standard Error quantifies the precision of posterior estimates due to finite sampling [5]. While the posterior standard deviation describes uncertainty about the parameter, the MCSE describes uncertainty about the estimate itself.
For a posterior mean estimate $\hat{\mu} = \frac{1}{N}\sum_{t=1}^{N}\theta^{(t)}$, the MCSE is:

$$\mathrm{MCSE} = \frac{\hat{\sigma}}{\sqrt{\mathrm{ESS}}}$$

where $\hat{\sigma}$ is the estimated posterior standard deviation. This is the standard error of the Monte Carlo estimate of the posterior mean: it shrinks as the effective sample size grows, whereas the posterior SD itself does not.
A useful rule of thumb is that the MCSE should be less than 5% of the posterior standard deviation:

$$\frac{\mathrm{MCSE}}{\hat{\sigma}} = \frac{1}{\sqrt{\mathrm{ESS}}} < 0.05 \quad\Longleftrightarrow\quad \mathrm{ESS} > 400$$
This means your Monte Carlo error is small relative to the inherent posterior uncertainty. For life-safety applications, a more stringent threshold (e.g., MCSE/SD < 0.02, requiring ESS > 2500) may be appropriate.
// Compute MCSE from ESS and the posterior standard deviation
double[] values = sampler.MarkovChains[0].Select(ps => ps.Values[0]).ToArray();
double sd = Statistics.StandardDeviation(values);
double mcse = sd / Math.Sqrt(essValues[0]);
Console.WriteLine($"Posterior SD: {sd:F6}");
Console.WriteLine($"ESS: {essValues[0]:F0}");
Console.WriteLine($"MCSE: {mcse:F6}");
Console.WriteLine($"MCSE/SD ratio: {mcse / sd:P2}");
if (mcse / sd > 0.05)
Console.WriteLine("⚠ MCSE is large relative to posterior SD - increase sampling");
else
Console.WriteLine("✓ MCSE is acceptably small");// Compute autocorrelation function
// (avgACF from ESS calculation above)
Console.WriteLine("Autocorrelation at selected lags:");
Console.WriteLine("Lag | ACF");
Console.WriteLine("-----|------");
for (int lag = 0; lag <= 20; lag += 5)
{
// Access from avgACF[parameter][lag, column] where column 1 has ACF values
double acf = avgACF[0][lag, 1]; // Parameter 0, average ACF at this lag
Console.WriteLine($"{lag,4} | {acf,5:F3}");
}
// Ideal: ACF drops quickly to zero
// Problem: ACF remains high (slow decorrelation)
| Lag-1 ACF | Interpretation | ESS Impact |
|---|---|---|
| < 0.1 | Excellent mixing | ESS ≈ N |
| 0.1-0.3 | Good mixing | ESS ≈ 0.5N |
| 0.3-0.6 | Moderate | ESS ≈ 0.2N |
| > 0.6 | Poor mixing | ESS < 0.1N |
The Raftery-Lewis method provides a theoretical lower bound on the number of MCMC samples needed to estimate a particular posterior quantile with specified precision. Given a quantile of interest $q$, a tolerance $r$, and a confidence level $s$, the minimum number of independent samples is:

$$N_{\min} = \left[\Phi^{-1}\!\left(\frac{s+1}{2}\right)\right]^2 \frac{q(1-q)}{r^2}$$

where $\Phi^{-1}$ is the standard normal quantile function. For example, to estimate the 99th percentile ($q = 0.99$) to within $r = 0.01$ with $s = 0.95$ confidence: $N_{\min} = 1.96^2 \times 0.99 \times 0.01 / 0.01^2 \approx 381$.
Note that this is a lower bound assuming independent samples. With autocorrelated MCMC output, the actual number of iterations needed is larger by roughly the factor $N/\mathrm{ESS}$, the integrated autocorrelation time.
Determine required sample size for desired precision:
// For quantile estimation
double quantile = 0.99; // 100-year event
double tolerance = 0.01; // ±1% of true quantile
double probability = 0.95; // 95% confidence
int minN = MCMCDiagnostics.MinimumSampleSize(quantile, tolerance, probability);
Console.WriteLine($"Minimum sample size needed:");
Console.WriteLine($" For {quantile:P1} quantile");
Console.WriteLine($" With ±{tolerance:P1} tolerance");
Console.WriteLine($" At {probability:P0} confidence");
Console.WriteLine($" Need N ≥ {minN}");
// Adjust MCMC iterations accordingly
using Numerics.Sampling.MCMC;
using Numerics.Data.Statistics;
// Step 1: Run sampler
var sampler = new ARWMH(priors, logLikelihood);
sampler.NumberOfChains = 4;
sampler.WarmupIterations = 2000;
sampler.Iterations = 5000;
sampler.ThinningInterval = 10;
sampler.Sample();
Console.WriteLine("MCMC Convergence Diagnostics");
Console.WriteLine("=" + new string('=', 60));
// Step 2: Extract samples (flatten all chains)
var samples = sampler.Output.SelectMany(chain => chain).ToList();
int nParams = samples[0].Values.Length;
Console.WriteLine($"\nSampling Summary:");
Console.WriteLine($" Chains: {sampler.NumberOfChains}");
Console.WriteLine($" Warmup: {sampler.WarmupIterations}");
Console.WriteLine($" Iterations: {sampler.Iterations}");
Console.WriteLine($" Thinning: {sampler.ThinningInterval}");
Console.WriteLine($" Total samples: {samples.Count}");
// Step 3: Check Gelman-Rubin
Console.WriteLine($"\nGelman-Rubin Statistics:");
var chains = sampler.MarkovChains.ToList();
double[] rhat = MCMCDiagnostics.GelmanRubin(chains, sampler.WarmupIterations);
bool converged = true;
for (int i = 0; i < nParams; i++)
{
string status = rhat[i] < 1.1 ? "✓" : "✗";
Console.WriteLine($" {status} θ{i}: R̂ = {rhat[i]:F4}");
if (rhat[i] >= 1.1) converged = false;
}
// Step 4: Check ESS
Console.WriteLine($"\nEffective Sample Size:");
double[] ess = MCMCDiagnostics.EffectiveSampleSize(chains, out double[][,] acf);
int minESS = (int)ess.Min();
// Use MarkovChains total sample count for efficiency (ESS is computed from MarkovChains, not Output)
int chainSampleCount = chains.Sum(c => c.Count);
for (int i = 0; i < nParams; i++)
{
double efficiency = ess[i] / chainSampleCount;
Console.WriteLine($" θ{i}: ESS = {ess[i]:F0} ({efficiency:P1} efficiency)");
}
// Step 5: Overall assessment
Console.WriteLine($"\nOverall Assessment:");
if (converged && minESS > 100)
Console.WriteLine(" ✓ PASS: Chains converged, sufficient samples");
else if (!converged)
Console.WriteLine(" ✗ FAIL: Chains not converged - run longer");
else
Console.WriteLine(" ⚠ MARGINAL: Converged but low ESS - consider more iterations");
// Step 6: Recommendations
if (minESS < 100)
{
int recommended = (int)(sampler.Iterations * 100.0 / minESS);
Console.WriteLine($"\nRecommendation: Increase iterations to ~{recommended}");
}
While the library doesn't provide plotting, these are essential checks:
Plot parameter values vs. iteration:
Console.WriteLine("Export trace data for plotting:");
Console.WriteLine("Iteration | Parameter Values");
for (int i = 0; i < Math.Min(samples.Count, 100); i++)
{
Console.Write($"{i,9} | ");
foreach (var val in samples[i].Values)
{
Console.Write($"{val,8:F3} ");
}
Console.WriteLine();
}
Console.WriteLine("\nGood traces: Stationary, well-mixed 'hairy caterpillar'");
Console.WriteLine("Bad traces: Trending, stuck, or oscillating patterns");// Export posterior samples for histogram
for (int param = 0; param < nParams; param++)
{
var values = samples.Select(s => s.Values[param]).ToArray();
Console.WriteLine($"\nParameter {param} summary:");
Console.WriteLine($" Mean: {values.Average():F4}");
Console.WriteLine($" Median: {Statistics.Percentile(values, 0.50):F4}");
Console.WriteLine($" SD: {Statistics.StandardDeviation(values):F4}");
Console.WriteLine($" 95% CI: [{Statistics.Percentile(values, 0.025):F4}, " +
$"{Statistics.Percentile(values, 0.975):F4}]");
}
Diagnosis:
if (rhat.Max() > 1.1)
{
Console.WriteLine("Convergence issue detected");
Console.WriteLine("Possible causes:");
Console.WriteLine(" 1. Insufficient warmup");
Console.WriteLine(" 2. Poor mixing");
Console.WriteLine(" 3. Multimodal posterior");
Console.WriteLine(" 4. Bad initialization");
}
Solutions:
- Increase warmup:
sampler.WarmupIterations = 5000; // Double it
- Try a different sampler:
// Switch from RWMH to DEMCz for better mixing
var betterSampler = new DEMCz(priors, logLikelihood);
- Check initialization:
// Ensure initial values are reasonable
// Not too far from expected values
Diagnosis:
if (ess.Min() < 100)
{
Console.WriteLine($"Low ESS detected: {ess.Min():F0}");
Console.WriteLine("Chains are highly autocorrelated");
}
Solutions:
- Increase iterations:
int factor = (int)Math.Ceiling(100.0 / ess.Min());
sampler.Iterations *= factor;
Console.WriteLine($"Suggested iterations: {sampler.Iterations}");
- Increase thinning:
sampler.ThinningInterval = 20; // Keep every 20th sample
- Improve the sampler:
// Use ARWMH instead of RWMH
// Use DEMCz for high dimensions
Diagnosis:
// Check if chains explore different modes
foreach (int param in new[] { 0, 1, 2 })
{
var chainMeans = chains.Select(chain =>
chain.Select(ps => ps.Values[param]).Average()).ToArray();
double meanRange = chainMeans.Max() - chainMeans.Min();
double overallSD = Statistics.StandardDeviation(
samples.Select(s => s.Values[param]).ToArray());
if (meanRange > 2 * overallSD)
{
Console.WriteLine($"Parameter {param} may have multiple modes");
Console.WriteLine(" Chain means differ substantially");
}
}
Solutions:
- Use a population-based sampler:
var demcz = new DEMCz(priors, logLikelihood);
demcz.NumberOfChains = 10; // More chains
- Check for model identification issues
- Consider transforming parameters, as in the sketch below
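A minimal sketch of one common transformation for a positive-scale parameter (e.g., a standard deviation): sample on the log scale so the sampler sees an unconstrained parameter. `LogPrior` and `LogLik` are hypothetical functions standing in for the model's actual prior and likelihood:
// Minimal sketch: sample a positive parameter sigma on the log scale.
// The sampler works with u = log(sigma); the target density picks up a
// Jacobian term log|d sigma / d u| = u from the change of variables.
double LogTarget(double[] parameters)
{
    double u = parameters[0];       // unconstrained parameter explored by the sampler
    double sigma = Math.Exp(u);     // positive-scale parameter used by the model
    return LogPrior(sigma) + LogLik(sigma) + u; // hypothetical LogPrior/LogLik
}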
// Minimum 4 chains
sampler.NumberOfChains = 4;
// Benefits:
// - Can compute R̂
// - Detect multimodality
// - More robust inference
// Rule of thumb: Warmup ≥ 50% of iterations
sampler.WarmupIterations = Math.Max(2000, sampler.Iterations / 2);
// Always check before using samples
bool ready = (rhat.Max() < 1.1) && (ess.Min() > 100);
if (!ready)
{
Console.WriteLine("⚠ WARNING: Do not use these samples!");
Console.WriteLine("Run diagnostics and extend sampling");
return;
}
// Proceed with inference...
Console.WriteLine("MCMC Configuration:");
Console.WriteLine($" Sampler: {sampler.GetType().Name}");
Console.WriteLine($" Chains: {sampler.NumberOfChains}");
Console.WriteLine($" Warmup: {sampler.WarmupIterations}");
Console.WriteLine($" Iterations: {sampler.Iterations}");
Console.WriteLine($" Thinning: {sampler.ThinningInterval}");
Console.WriteLine($" Total samples: {samples.Count}");
Console.WriteLine($" R̂ range: [{rhat.Min():F3}, {rhat.Max():F3}]");
Console.WriteLine($" ESS range: [{ess.Min():F0}, {ess.Max():F0}]");Note: Setting any sampler property (e.g., Iterations) calls Reset() internally, which clears all chains and output. This means increasing Iterations and calling Sample() runs a completely new simulation with the updated iteration count -- it does not extend the previous run.
int iteration = 1;
int totalIterations = sampler.Iterations;
while (rhat.Max() > 1.05 || ess.Min() < 200)
{
totalIterations += 5000;
Console.WriteLine($"\nIteration {iteration}: Restarting with {totalIterations} iterations...");
// Setting Iterations triggers Reset(), clearing all previous chains/output.
// Sample() then runs a fresh simulation with the new iteration count.
sampler.Iterations = totalIterations;
sampler.Sample();
// Recompute diagnostics
chains = sampler.MarkovChains.ToList();
rhat = MCMCDiagnostics.GelmanRubin(chains, sampler.WarmupIterations);
ess = MCMCDiagnostics.EffectiveSampleSize(chains, out _);
iteration++;
if (iteration > 5)
{
Console.WriteLine("Convergence issues persist - check model/sampler");
break;
}
}
| Diagnostic | Target | Action if Not Met |
|---|---|---|
| R̂ | < 1.1 | Increase warmup, run longer |
| ESS | > 100 (per param) | Increase iterations, improve mixing |
| Visual traces | Stationary | Check initialization, try different sampler |
| ACF | Drops quickly | Increase thinning, better sampler |
[1] Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472.
[2] Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian Data Analysis (3rd ed.). CRC Press.
[3] Geyer, C. J. (1992). Practical Markov chain Monte Carlo. Statistical Science, 7(4), 473-483.
[4] Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC. Bayesian Analysis, 16(2), 667-718.
[5] Flegal, J. M., Haran, M., & Jones, G. L. (2008). Markov chain Monte Carlo: Can we trust the third significant figure? Statistical Science, 23(2), 250-260.