63 commits
d05c494
fix: update LP supply tests to use DynamicInputArrays and test data d…
MatthewWilletts Mar 9, 2026
933b653
test: add pinned regression tests for calibration pipeline
MatthewWilletts Mar 10, 2026
32f2e43
feat: fixed-gas calibration mode for loss, per-pool fit, and joint fit
MatthewWilletts Mar 9, 2026
834e547
feat: replace Balancer hourly volatility with Binance minute data
MatthewWilletts Mar 9, 2026
a2ae5a4
fix: grid builder handles stale Binance data and multi-worker dispatch
MatthewWilletts Mar 9, 2026
f6ef0ec
test: comprehensive tests for fixed-gas calibration and Binance volat…
MatthewWilletts Mar 9, 2026
6f98236
test: strengthen calibration tests — fix vacuous assertions and add m…
MatthewWilletts Mar 10, 2026
a42a961
feat: composable CalibrationModel with pluggable Head components
MatthewWilletts Mar 10, 2026
c7ee40d
feat: add MLPHead for nonlinear pool-attribute-to-cadence mapping
MatthewWilletts Mar 10, 2026
ef7d24a
feat: add MLPNoiseHead for nonlinear pool-attribute-to-noise mapping
MatthewWilletts Mar 10, 2026
ccc3c6c
fix: use small random W2 init for MLP heads to avoid degenerate L-BFG…
MatthewWilletts Mar 10, 2026
dbd62b2
WIP: MLP calibration with lstsq warm-start and tuned hyperparameters
MatthewWilletts Mar 10, 2026
16f5bf3
merge dev
bulkcade Mar 10, 2026
49671f8
WIP: output clipping on heads, two-stage joint calibration
MatthewWilletts Mar 13, 2026
71f1a3a
feat: add reduced x_obs (k_obs=4) to calibration pipeline
MatthewWilletts Mar 16, 2026
ba30663
feat: parameterize k_obs in noise heads
MatthewWilletts Mar 16, 2026
5a2cbdf
feat: calibrated 8-covariate noise model for reCLAMM simulator
MatthewWilletts Mar 16, 2026
246de0e
feat: configurable n_evaluation_points for Optuna and keep startDateS…
MatthewWilletts Mar 16, 2026
227fe77
feat: add token encoding for token-factored noise model
MatthewWilletts Mar 16, 2026
c7a0c26
feat: add TokenFactoredNoiseHead with additive token decomposition
MatthewWilletts Mar 16, 2026
7edf6ec
feat: integrate token-factored noise into joint calibration pipeline
MatthewWilletts Mar 16, 2026
e8e9031
feat: add reduced x_obs (k_obs=4) pipeline to calibration runner
MatthewWilletts Mar 16, 2026
229e38d
feat: token-factored calibration script with Phase 0 diagnostic and LOO
MatthewWilletts Mar 16, 2026
5db3919
feat: support calibrated noise model in Optuna parameter tuning
MatthewWilletts Mar 16, 2026
5398c3b
feat: add token canonicalization and cross-pool lagged volume features
MatthewWilletts Mar 16, 2026
e99d3e2
feat: report data_loss and reg_loss separately in CalibrationModel.fit
MatthewWilletts Mar 16, 2026
3475198
feat: v2 runner with lambda annealing, cross-pool ablation, and LOO
MatthewWilletts Mar 16, 2026
31f8829
fix: cross-pool x_obs shape mismatch and warm-start k_obs padding
MatthewWilletts Mar 16, 2026
21fb161
feat: cross-pool volume prediction experiments
MatthewWilletts Mar 17, 2026
1056ee0
wip on deepsets
MatthewWilletts Mar 17, 2026
7ab989a
feat: deepsets v2 improvements — relational features, Huber loss, enc…
MatthewWilletts Mar 17, 2026
d6d1e7b
feat: LOO evaluation, warm-start decoder, residual target, minimal en…
MatthewWilletts Mar 17, 2026
5c04a47
feat: learnable cadence via PCHIP, linear market noise model, hybrid …
MatthewWilletts Mar 19, 2026
e661031
feat: per-pool linear noise model, market_linear simulator integratio…
MatthewWilletts Mar 23, 2026
825c8a7
feat: simulator integration for market_linear noise model, TVL standa…
MatthewWilletts Mar 23, 2026
afb8511
feat: causal TVL elasticity analysis — deconfounder + LP event study
MatthewWilletts Mar 26, 2026
82c64c7
feat: TVL counterfactual validation script
MatthewWilletts Mar 26, 2026
75b683f
feat: MLP noise model (Binance-only, no cross-pool DEX dependency) + …
MatthewWilletts Mar 26, 2026
b41d793
merge origin
bulkcade Mar 26, 2026
11b8858
feat: remove panel dependency from simulator arrays, Binance-only pip…
MatthewWilletts Mar 26, 2026
414903d
data: per-pool linear noise model artifact (Binance-only, 22 features…
MatthewWilletts Mar 26, 2026
c936eee
Merge branch 'noise-modelling' of https://github.com/QuantAMMProtocol…
bulkcade Mar 26, 2026
a88fd66
fix: noise volume cadence scaling + price preservation for 2-CLP
MatthewWilletts Mar 27, 2026
5d40a65
compare improvements
bulkcade Mar 27, 2026
5ec81c5
compare improvements
bulkcade Mar 27, 2026
29af7df
data: add sim arrays for 0x9d1fcf346ea1b0
MatthewWilletts Mar 29, 2026
8331b7f
diagnostics
bulkcade Mar 29, 2026
bf033bb
Merge branch 'noise-modelling' of https://github.com/QuantAMMProtocol…
bulkcade Mar 29, 2026
00144e2
perforance improvements
bulkcade Mar 29, 2026
4c9eae7
feat: feature-appropriate scaling, TVL clamp, protocol fee default
MatthewWilletts Mar 31, 2026
b44c228
feat: CMA-ES optimiser + Optuna min_train_returns_over_hodl rejection
MatthewWilletts Mar 31, 2026
b89fa64
feat: Michaelis-Menten noise model + MLP sweep + comparison tooling
MatthewWilletts Mar 31, 2026
1fae546
feat: MM noise model — per-pool K, Optuna sweep, cross-pool TVL analysis
MatthewWilletts Apr 7, 2026
e202d67
different scripts
bulkcade Apr 7, 2026
fc616a8
initial take
bulkcade Apr 7, 2026
629ff83
feat: observed competitor TVL as K via DeFi Llama network conductance
MatthewWilletts Apr 7, 2026
d3632ec
feat: integrate mm_observed noise model into simulator pipeline
MatthewWilletts Apr 7, 2026
76bcc96
feat: add mm_observed noise model to reClAMM tuning pipeline
MatthewWilletts Apr 7, 2026
fa32c9d
merge noise-modelling
bulkcade Apr 16, 2026
1a9f644
merge conflict fix
bulkcade Apr 16, 2026
d123a9f
refactor balancer hypersurge
bulkcade Apr 16, 2026
4003371
dynamic input array slicing defensive fix
bulkcade Apr 16, 2026
c3b0ede
reclamm hypersurge first implementation
bulkcade Apr 16, 2026
Empty file added .codex
Empty file.
230 changes: 230 additions & 0 deletions docs/joint_calibration_analysis.md
@@ -0,0 +1,230 @@
# Joint Calibration Analysis: MLP Capacity vs Identification

## Training results (2026-03-10)

### R2 progression across model architectures

| Attempt | Architecture | Noise params | Total params | Median R2 | Joint loss |
|---|---|---|---|---|---|
| Structural MoE (numpyro) | 3 archetypes x 8 coeffs | 24 | ~40 | -0.70 | - |
| Linear joint | SharedLinearNoiseHead | 63 | 63 | -0.15 | 9.62 |
| MLP noise joint | MLPNoiseHead(hidden=16) | 255 | 255 | 0.01 | 9.39 |
| Full MLP | MLPHead(cad,16) + MLPNoiseHead(16) | 255 | 377 | -0.02 | 8.59 |
| Option C (per-pool) | PerPoolNoiseHead | 37x8=296 | 37x9=333 | 0.61 | 1.25 (median) |

The direction is clear: more capacity on the noise side helps substantially
(median R2 -0.70 -> -0.15 -> 0.01). Adding cadence capacity (MLP noise ->
full MLP) improved joint loss (9.39 -> 8.59) but not per-pool R2
(0.01 -> -0.02), and the fit didn't converge within 500 iterations.

### Convergence concern

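The full MLP explicitly failed to converge (scipy `success=False`). The MLP
noise model converged but only reduced loss from 12.70 to 9.39, a 26%
reduction vs the linear baseline's 99.5% reduction (2011.77 -> 9.62). The
two runs start from very different initial losses, so the percentages are
not directly comparable, but together with the failed full-MLP run this
suggests the MLPs are undertraining.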

Current optimizer settings:
- L-BFGS-B with maxiter=500
- ftol=1e-10, gtol=1e-8
- maxcor=10 (L-BFGS memory, scipy default)
- alpha=0.01 for all heads (L2 regularization on weights)
- hidden=16 for all MLPs
- He init for W1, W2=0, b2=pooled OLS / mean of Option C
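
For reference, these settings map onto `scipy.optimize.minimize` roughly as follows. This is a minimal sketch on a toy ridge-regression loss, not the pipeline's actual objective; `loss_and_grad`, `X` and `y` are illustrative stand-ins. It shows which result fields carry the convergence information discussed above.

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the joint calibration loss: ridge-penalised least squares.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)
alpha = 0.01  # L2 regularization strength, as in the settings above


def loss_and_grad(w):
    """Return (loss, gradient) so that jac=True can be used."""
    resid = X @ w - y
    loss = 0.5 * np.mean(resid**2) + 0.5 * alpha * np.sum(w**2)
    grad = X.T @ resid / len(y) + alpha * w
    return loss, grad


res = minimize(
    loss_and_grad,
    x0=np.zeros(10),
    jac=True,
    method="L-BFGS-B",
    options={"maxiter": 500, "ftol": 1e-10, "gtol": 1e-8, "maxcor": 10},
)

# success=False with nit == maxiter means the iteration budget ran out.
print(res.success, res.nit, res.fun, res.message)
```

Logging `res.nit` next to `res.success` separates "hit maxiter" from other failure modes.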

### Why the MLPs may not be converging

1. **maxiter=500 is low for 255-377 params.** L-BFGS-B typically needs
O(1000-5000) iterations for MLP-scale problems. The linear model with
63 params converges easily in 500; the MLP with 377 params does not.

2. **maxcor=10 may be too small.** The default L-BFGS memory of 10
correction pairs (past step and gradient differences) may not provide a
good enough Hessian approximation for 377 parameters. Increasing to
20-50 can help.

3. **Regularization alpha=0.01 may be wrong.** With 37 pools and 255 noise
params, the model is overparameterized (255/37 ≈ 7 params per pool).
alpha=0.01 might be too weak (overfitting some pools, underfitting
others) or too strong (preventing the MLP from expressing the necessary
nonlinearity). This is the most important hyperparameter to sweep.

4. **W2=0 initialization creates a flat starting surface.** Since the MLP
starts as a constant function (output = b2 everywhere), L-BFGS-B must
first learn to differentiate between pools. With W2=0 the gradient with
respect to W1 and b1 is exactly zero at initialization, because it is
backpropagated through W2; only W2 and b2 receive signal on the first
step. The hidden layer cannot move until W2 has left zero, so early
iterations are slow compared to the linear model, which starts from an
OLS warm-start.

5. **Dead ReLU units.** With He init and k_attr=6 features, some hidden
units may have all-negative pre-activations across the 37 pool
attribute vectors, making them permanently dead with zero gradient.

6. **Per-pool loss weighting.** All observations contribute equally.
USDC/WETH (1757 obs) dominates RDNT/WETH (89 obs) by 20x. The
optimizer may be fitting a few high-obs pools at the expense of many
low-obs ones.
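
Points 4 and 5 can be checked numerically. The sketch below assumes a one-hidden-layer ReLU MLP with scalar output and a squared-error loss, which is a simplification of the real heads; the attribute matrix and targets are random stand-ins. It computes the gradients at a W2=0 start and at a small random W2, then counts hidden units that are inactive for every pool.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pools, k_attr, hidden = 37, 6, 16

X = rng.normal(size=(n_pools, k_attr))  # stand-in for standardised pool attributes
y = rng.normal(size=n_pools)            # stand-in targets

W1 = rng.normal(size=(k_attr, hidden)) * np.sqrt(2.0 / k_attr)  # He init
b1 = np.zeros(hidden)
b2 = y.mean()


def grads(W2):
    """Gradients of 0.5 * mean squared error w.r.t. W1 and W2."""
    z = X @ W1 + b1                       # pre-activations, (n_pools, hidden)
    h = np.maximum(z, 0.0)                # ReLU
    err = (h @ W2 + b2 - y) / n_pools     # dL/d(prediction)
    g_W2 = h.T @ err
    g_W1 = X.T @ (np.outer(err, W2) * (z > 0))  # backprop passes through W2
    return g_W1, g_W2, z


g_W1_zero, g_W2_zero, z = grads(np.zeros(hidden))
g_W1_small, _, _ = grads(0.01 * rng.normal(size=hidden))

print("max |dL/dW1| at W2=0:     ", np.abs(g_W1_zero).max())
print("max |dL/dW2| at W2=0:     ", np.abs(g_W2_zero).max())
print("max |dL/dW1| at small W2: ", np.abs(g_W1_small).max())

# A unit is dead if its pre-activation is non-positive for every pool.
dead = np.all(z <= 0, axis=0)
print("dead hidden units:", int(dead.sum()), "of", hidden)
```

At W2=0 the W1 gradient vanishes identically, so only the output layer moves on the first step; a small random W2 restores gradient flow to the hidden layer.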

## Diagnosis: identification vs convergence

Two distinct problems:

1. **Convergence problem** (addressable via hyperparameters):
The MLP isn't reaching its minimum. Fix: more iterations, better
hyperparameters, multiple restarts.

2. **Identification problem** (addressable via architecture):
Even at the minimum, the shared mapping can't match per-pool R2.
37 pools is tiny for a nonlinear model. Cadence is idiosyncratic.
Fix: DeltaHead (per-pool residuals with shrinkage), better features.

These are **independent** problems that compound. We should fix convergence
first (hyperparameter sweep) to understand the true capacity of the current
architecture before adding structural complexity.

## Hyperparameter sweep design

### Parameters to sweep

| Parameter | Current | Sweep values | Rationale |
|---|---|---|---|
| maxiter | 500 | 500, 2000, 5000 | Primary convergence bottleneck |
| alpha (noise) | 0.01 | 0.0001, 0.001, 0.01, 0.1 | Controls overfitting vs underfitting |
| alpha (cadence) | 0.01 | 0.001, 0.01, 0.1 | Separate from noise reg |
| hidden | 16 | 8, 16, 32 | Capacity vs overfitting |
| maxcor | 10 | 10, 30 | L-BFGS Hessian quality |
| loss_type | l2 | l2, huber | Outlier robustness |

### Sweep strategy

Full grid is 3 x 4 x 3 x 3 x 2 x 2 = 432 runs. Too many.

**Phase 1: Fix convergence (1D sweeps)**
- Sweep maxiter = [500, 2000, 5000] with defaults. Cheapest diagnostic.
- If 5000 converges, use that going forward.

**Phase 2: Regularization (most important)**
- alpha_noise x alpha_cad grid: 4 x 3 = 12 runs at converged maxiter.
- Evaluate both joint loss AND per-pool median R2.

**Phase 3: Architecture**
- hidden = [8, 16, 32] at best alpha settings: 3 runs.
- loss_type = [l2, huber] at best settings: 2 runs.
- maxcor = [10, 30] at best settings: 2 runs.

Total: ~22 runs, each ~2-5 min = ~1-2 hours.
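
The run counts can be reproduced by enumerating the configurations. This is a sketch only: the dictionary keys are illustrative names, and the "best" settings carried from one phase to the next are placeholders that would come from the earlier phase's results.

```python
from itertools import product

grid = {
    "maxiter": [500, 2000, 5000],
    "alpha_noise": [0.0001, 0.001, 0.01, 0.1],
    "alpha_cad": [0.001, 0.01, 0.1],
    "hidden": [8, 16, 32],
    "maxcor": [10, 30],
    "loss_type": ["l2", "huber"],
}
defaults = {"maxiter": 500, "alpha_noise": 0.01, "alpha_cad": 0.01,
            "hidden": 16, "maxcor": 10, "loss_type": "l2"}

full_grid = list(product(*grid.values()))


def one_d(base, name):
    """Vary a single parameter over its sweep values, holding the rest fixed."""
    return [{**base, name: value} for value in grid[name]]


# Phase 1: maxiter only.
phase1 = one_d(defaults, "maxiter")

# Phase 2: alpha grid at the converged maxiter (5000 is a placeholder).
base2 = {**defaults, "maxiter": 5000}
phase2 = [{**base2, "alpha_noise": a, "alpha_cad": c}
          for a, c in product(grid["alpha_noise"], grid["alpha_cad"])]

# Phase 3: 1D sweeps at the best alphas (defaults used as a placeholder).
base3 = dict(base2)
phase3 = one_d(base3, "hidden") + one_d(base3, "loss_type") + one_d(base3, "maxcor")

staged = phase1 + phase2 + phase3
print(len(full_grid), len(staged))  # 432 22
```

Each 1D sweep repeats the base configuration once, so a few of the 22 runs can be served from cache if results are keyed by configuration.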

### Metrics to track per run

- Joint loss (final)
- Joint loss (init) — sanity check
- Converged (bool)
- Number of L-BFGS iterations used
- Per-pool median R2
- Per-pool mean R2
- Per-pool R2 distribution (10th, 25th, 50th, 75th, 90th percentiles)
- Wall time
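
A per-run summary along these lines could be assembled as below. The function and key names are hypothetical, and the input is assumed to be a mapping from pool id to `(y_true, y_pred)` arrays; the example values are toy data, not calibration results.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination; negative when worse than the pool mean."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def summarise_run(per_pool, loss_init, loss_final, converged, n_iter, wall_time_s):
    """Collect the tracked metrics for one sweep run into a flat dict."""
    r2 = np.array([r2_score(y, y_hat) for y, y_hat in per_pool.values()])
    p10, p25, p50, p75, p90 = np.percentile(r2, [10, 25, 50, 75, 90])
    return {
        "loss_init": loss_init,
        "loss_final": loss_final,
        "converged": bool(converged),
        "n_iter": int(n_iter),
        "r2_median": float(np.median(r2)),
        "r2_mean": float(r2.mean()),
        "r2_p10": p10, "r2_p25": p25, "r2_p50": p50, "r2_p75": p75, "r2_p90": p90,
        "wall_time_s": wall_time_s,
    }


# Toy example: one perfectly fitted pool, one pool predicted at its mean.
per_pool = {
    "pool_a": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # R2 = 1
    "pool_b": ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # R2 = 0
}
summary = summarise_run(per_pool, loss_init=1.0, loss_final=0.5,
                        converged=True, n_iter=42, wall_time_s=3.0)
print(summary["r2_median"], summary["r2_mean"])  # 0.5 0.5
```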

### What success looks like

- Converged = True for the full MLP
- Joint loss < 8.0 (below current 8.59)
- Per-pool median R2 > 0.3 (closing the gap toward Option C's 0.61)
- The R2 improvement should be spread across pools, not concentrated

## Features / data that would help

### Missing pool attributes (from docs)

Current features (k_attr=6 after chain dummy removal):
log_fee, mean_log_tvl, log_mcap_product, has_stable, same_asset_type,
weight_imbalance.

These describe what the pool IS but not the market around it. Cadence is
driven by arbitrage frequency, which depends on:

| Missing feature | Why it matters | Source | Effort |
|---|---|---|---|
| Block time | Directly limits minimum cadence. Arbitrum=0.25s vs mainnet=12s | Static per chain | Trivial |
| Mean pair volatility | Pool-level (not obs-level) vol predicts arb intensity | Binance minute data (loaded) | Small |
| CEX daily volume | More CEX vol = more arb opportunities | Binance API | Medium |
| Competing DEX pools | More pools for same pair = faster arb | Balancer subgraph | Medium |
| Pool routing share | Dominant pool gets arbitraged first | DEX aggregator data | Hard |
| Mean daily swap count | Direct proxy for pool activity | Panel data | Small |

The pair-intrinsic formula bias (1.26-2.22x) documented in
noise_calibration_review.md is the largest unexplained variance source.
It varies with pair liquidity characteristics in ways that the current
token classification doesn't capture. CEX volume/depth would help.

### Observation-level features (x_obs, K_OBS=8)

Current: [1, log_tvl_lag1, log_sigma, tvl*sigma, tvl*fee, sigma*fee,
dow_sin, dow_cos]

Missing:
- Rolling CEX volume (daily) — high volume days have more noise/organic flow
- Gas price that day (mainnet) — affects whether arbs execute
- Market regime (rolling momentum) — trending vs mean-reverting
- Number of swaps that day — direct activity measure

### Time-varying dynamics

Panel spans 2021-2026. MEV dynamics changed dramatically:
- Flashbots adoption ramped up through 2021
- L2s matured 2023-2024
- EIP-4844 (March 2024) dropped L2 gas costs

The current model assumes constant cadence per pool over this period.

## Structural improvements (post-sweep)

### DeltaHead (per-pool residuals with shrinkage)

Most important structural change. For cadence:
```
log_cadence_i = f(x_attr_i) + delta_i
regularization: alpha_shared * ||W||^2 + alpha_delta * sum(delta_i^2)
```

- At alpha_delta=0: pure per-pool (Option C)
- At alpha_delta=inf: pure shared (current joint)

Cross-validate alpha_delta.

For new pools: predict f(x_attr_new) with delta=0.

This is essentially a mixed-effects model fitted end-to-end through the
grid interpolation loss.
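
The two limits can be illustrated on a simplified version of the problem. The sketch below assumes a linear `f` and a plain squared-error loss on per-pool log-cadence targets, which makes the fit a closed-form ridge solve; the real model would fit through the grid interpolation loss instead. `fit_delta_head` and the toy data are illustrative only.

```python
import numpy as np


def fit_delta_head(X, y, alpha_shared, alpha_delta):
    """Minimise ||y - X w - delta||^2 + alpha_shared*||w||^2 + alpha_delta*||delta||^2."""
    n, k = X.shape
    Z = np.hstack([X, np.eye(n)])  # [shared features | one dummy column per pool]
    penalty = np.concatenate([np.full(k, alpha_shared), np.full(n, alpha_delta)])
    theta = np.linalg.solve(Z.T @ Z + np.diag(penalty), Z.T @ y)
    return theta[:k], theta[k:]    # shared weights w, per-pool deltas


rng = np.random.default_rng(0)
n_pools, k_attr = 37, 6
X = np.hstack([np.ones((n_pools, 1)), rng.normal(size=(n_pools, k_attr))])
y = X @ rng.normal(size=k_attr + 1) + 0.5 * rng.normal(size=n_pools)  # toy targets

# Heavy shrinkage: deltas vanish, leaving the pure shared model.
w_shared, d_shared = fit_delta_head(X, y, alpha_shared=0.01, alpha_delta=1e6)
# Almost no shrinkage: deltas absorb every residual, i.e. a per-pool fit.
w_pool, d_pool = fit_delta_head(X, y, alpha_shared=0.01, alpha_delta=1e-8)

print("max |delta|, large alpha_delta:   ", np.abs(d_shared).max())
print("max |residual|, small alpha_delta:", np.abs(y - X @ w_pool - d_pool).max())

# New pool: shared prediction only, delta = 0.
x_new = np.concatenate([[1.0], rng.normal(size=k_attr)])
print("new-pool prediction:", x_new @ w_shared)
```

In this simplified setting, as alpha_delta approaches 0 with alpha_shared > 0 the deltas explain everything for free and the shared weights shrink toward zero, so new-pool predictions lose their signal. That trade-off is what cross-validating alpha_delta has to resolve.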

### Per-pool loss weighting

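Weight each pool's contribution by 1/sqrt(n_obs_i) to reduce the dominance
of high-obs pools. Currently USDC/WETH (1757 obs) has 20x the influence of
any Sonic pool (89 obs); 1/sqrt(n_obs_i) weighting cuts this to about 4.4x
(sqrt(1757/89)). Fully equal pool-level influence would need 1/n_obs_i
weights.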
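
The effect of a per-observation weight on pool-level influence is simple arithmetic: a pool's total weight is n_obs times the weight per observation. A small sketch using the two observation counts quoted above (the pool labels are just dictionary keys):

```python
import math

n_obs = {"high_obs_pool": 1757, "low_obs_pool": 89}


def influence_ratio(weight_fn):
    """Ratio of total loss weight between the largest and smallest pool."""
    total = {pool: n * weight_fn(n) for pool, n in n_obs.items()}
    return total["high_obs_pool"] / total["low_obs_pool"]


schemes = {
    "uniform": lambda n: 1.0,
    "1/sqrt(n)": lambda n: 1.0 / math.sqrt(n),
    "1/n": lambda n: 1.0 / n,
}
ratios = {name: influence_ratio(fn) for name, fn in schemes.items()}
for name, ratio in ratios.items():
    print(f"{name:>10}: {ratio:.2f}x")
# uniform: 19.74x, 1/sqrt(n): 4.44x, 1/n: 1.00x
```

1/sqrt(n) is a compromise: it damps the largest pools without giving an 89-observation pool the same say as a 1757-observation one.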

### Hybrid: per-pool cadence + shared noise

Cadence is idiosyncratic (LOO R2 = 0.24 at best). Noise structure is
more regular (hierarchical model R2 = 0.71 on total volume). Natural split:
- Cadence: per-pool (Option C)
- Noise: shared MLP (generalizable)
- Gas: fixed to chain values

### Sensitivity analysis (the decision point)

Before investing more in mapping improvement: does the reCLAMM optimal
concentration change materially when cadence varies +/-50%? This is
recommendation #1 in calibration_results.md, noise_calibration_review.md,
and joint_calibration_design.md. It has still not been done.

If the optimum is robust, the current pipeline (Option C + Ridge LOO) is
already sufficient and further mapping improvement is nice-to-have.
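
The shape of such a check is small. In the sketch below, `toy_optimum` is a purely hypothetical stand-in for "run the simulator and optimiser at a given cadence and return the optimal concentration"; the base cadence, the factor grid and the 10% robustness threshold are likewise assumptions, not values from the pipeline.

```python
def cadence_sensitivity(optimum_fn, base_cadence, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    """Relative change in the optimum when cadence is scaled by each factor."""
    base = optimum_fn(base_cadence)
    return {f: (optimum_fn(base_cadence * f) - base) / abs(base) for f in factors}


def toy_optimum(cadence):
    # Stand-in only: a weak power-law dependence chosen for illustration.
    return 10.0 * cadence ** 0.1


report = cadence_sensitivity(toy_optimum, base_cadence=5.0)
for factor, rel_change in report.items():
    print(f"cadence x{factor:.2f}: optimum changes by {rel_change:+.1%}")

# Hypothetical decision rule: "robust" if the optimum moves by less than 10%.
robust = max(abs(v) for v in report.values()) < 0.10
print("robust to +/-50% cadence:", robust)
```

With the real simulator in place of `toy_optimum`, the same loop answers the decision-point question directly.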

## Priority order

1. **Hyperparameter sweep** — fix convergence before changing architecture
2. **DeltaHead** — if R2 gap persists post-sweep, this is the minimal
structural change
3. **Per-pool loss weighting** — simple fix, helps all joint models
4. **Add block_time and mean_pair_volatility** — high-signal, low-effort
features
5. **Sensitivity analysis** — the real decision point for whether any of
this matters for the downstream task