Hyper surge#78

Open
bulkcade wants to merge 63 commits into dev from hyper-surge

Conversation

@bulkcade
Contributor

No description provided.

MatthewWilletts and others added 30 commits March 9, 2026 18:22
…ates

- test_lp_supply_through_pool_class: use DynamicInputArrays bundle
  instead of old positional-args signature
- test_lp_supply_e2e_do_run_on_historic_data: use TEST_DATA_DIR and
  date range within test data coverage (2023-01-01 to 2023-01-15)
- test_noise_trade_does_not_affect_virtual_balances: carry/input_list
  already fixed in previous commit
Add 25 numerical regression tests that pin exact values computed from
synthetic fixtures. These protect against silent computation errors
during refactoring — existing tests only check shapes and signs.

Covers: grid interpolation (knot exactness, midpoint values, monotonicity,
differentiability), loss function (pinned value + gradient at known params),
noise volume, per-pool fit convergence (loss, cadence), joint fit (both
noise modes, predict_new_pool, warm start), pack/unpack roundtrips.
Add CHAIN_GAS_USD lookup and pool_loss_fixed_gas to fix gas to known
chain-level costs, removing the cadence-gas degeneracy. Per-pool fit
and joint fit (Option A) both support fix_gas_to_chain flag, optimizing
only cadence and noise coefficients when gas is held constant.
Compute daily realized volatility from Binance minute prices instead of
Balancer API hourly prices, removing the 90-day data restriction. Each
pool now uses its full historical date range (up to 1761 days). The
calibration runner calls replace_panel_volatility_with_binance() and
supports train_days=0 for unrestricted history.
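
A minimal sketch of how a daily realized volatility could be computed from minute prices. The definition used here (square root of the sum of squared one-minute log returns within each calendar day) and the function name `daily_realized_vol` are assumptions for illustration; the commit does not state the exact estimator or the signature of `replace_panel_volatility_with_binance()`.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def daily_realized_vol(minute_prices):
    """Daily realized volatility from (timestamp, price) pairs sorted by time.

    Assumed definition: square root of the sum of squared one-minute
    log returns that fall inside each calendar day.
    """
    squared_sums = defaultdict(float)
    for (t0, p0), (t1, p1) in zip(minute_prices, minute_prices[1:]):
        if t0.date() != t1.date():
            continue  # ignore the single return that spans midnight
        log_return = math.log(p1 / p0)
        squared_sums[t1.date()] += log_return ** 2
    return {day: math.sqrt(total) for day, total in squared_sums.items()}


# Synthetic example: three one-minute prices on a single day.
start = datetime(2023, 1, 1, 0, 0)
prices = [(start + timedelta(minutes=i), p)
          for i, p in enumerate([100.0, 110.0, 100.0])]
vol_by_day = daily_realized_vol(prices)
```

Because the estimator only needs the minute price series, it can cover whatever date range the price data covers, which is the property the commit relies on.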
Clip panel dates to Binance price data range so pools with stale token
data (BAL, MKR, BADGER, LIT) use their available overlap instead of
failing. Snap sim start to next midnight for tokens starting mid-day.
Set max_memory_days=0 and preslice_burnin=False to prevent negative
start_idx. Workers load their own price data to avoid pickling large
DataFrames across processes.
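
The clipping and midnight snapping described above might look roughly like the following. The helper names `snap_to_next_midnight` and `clip_to_price_range` are hypothetical and are not taken from the repository.

```python
from datetime import datetime, timedelta


def snap_to_next_midnight(ts):
    """Return ts unchanged if it is already midnight, else the next midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return ts if ts == midnight else midnight + timedelta(days=1)


def clip_to_price_range(panel_start, panel_end, price_start, price_end):
    """Intersect the panel dates with the price-data range.

    Returns (start, end) with start snapped to midnight, or None when
    there is no usable overlap.
    """
    start = snap_to_next_midnight(max(panel_start, price_start))
    end = min(panel_end, price_end)
    return (start, end) if start < end else None


# A token whose price data begins mid-day on 2023-01-01.
window = clip_to_price_range(
    datetime(2022, 12, 1), datetime(2023, 3, 1),
    datetime(2023, 1, 1, 13, 30), datetime(2023, 2, 1),
)
```

Returning `None` for an empty overlap lets the caller skip the pool explicitly rather than fail deeper in the pipeline.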
…ility

87 new tests covering:
- CHAIN_GAS_USD constants (pinned values, completeness)
- pack/unpack fixed-gas params (roundtrip, shape, position)
- pool_loss_fixed_gas (zero-when-perfect, matches free-gas, gradients)
- per-pool fit fixed-gas (gas_usd pinned, gas_fixed flag, loss decreases)
- fit_all_pools with fix_gas_to_chain (chain-level gas matching)
- TOKEN_MAP resolution (wrapped native, LSTs, stablecoins, vault tokens)
- compute_binance_pair_volatility (synthetic data, edge cases)
- replace_panel_volatility_with_binance (immutability, no NaN)
- joint fit fixed-gas (prepare, pack/unpack, loss, bounds, fit, predict)
…issing coverage

Replace near-vacuous tests with substantive ones:
- test_loss_with_heterogeneous_y (trivial !=) → test_day_indices_affect_loss
- test_predict_with_nonzero_attrs (conditional guard) → test_predict_matches_linear_model
- test_stable_vs_volatile_uses_single_asset (20x range) → hand-computed ground truth

Add missing coverage:
- PCHIP boundary clamping (cadence below min, gas above max)
- replace_panel_volatility correctness (replaced values match compute_binance_pair_volatility)
- Ground truth recovery, pinned loss values, OLS coefficient pinning

Fix misleading name: test_grad_invariant_to_fixed_gas_perturbation → test_grad_changes_with_gas
Add CalibrationModel coordinator and 5 Head implementations (PerPoolHead,
FixedHead, LinearHead, PerPoolNoiseHead, SharedLinearNoiseHead) so that
new model variants (MLP, delta heads, Huber loss) require only a new Head
+ tests, not edits across the codebase. All 207 existing tests pass
unchanged; 69 new tests added (276 total).
Two-layer MLP (x_attr → Dense(hidden, ReLU) → Dense(1)) with He
initialization, L2 regularization on weights, and warm-start from
per-pool fits. 16 unit tests + 6 integration tests with CalibrationModel.
Two-layer MLP (x_attr, Dense(hidden, ReLU), Dense(K_OBS)) that replaces
the linear SharedLinearNoiseHead for the noise coefficient mapping.
Initialized with W2=0 so output starts at pooled OLS noise coefficients.
16 unit tests + 6 CalibrationModel integration tests.
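
To illustrate why a zero-initialized second layer makes the head start at the pooled coefficients, here is a dependency-free sketch. It is not the repository's `MLPNoiseHead`; the parameter layout, the use of the output bias `b2` to hold the pooled OLS coefficients, and the function names are assumptions. A later commit in this PR replaces the zero init with an lstsq warm-start.

```python
import math
import random


def init_mlp_noise_head(d_attr, hidden, pooled_coeffs, seed=0):
    """He-initialized first layer, zero second layer, bias = pooled coeffs."""
    rng = random.Random(seed)
    he_std = math.sqrt(2.0 / d_attr)  # He initialization for ReLU layers
    return {
        "W1": [[rng.gauss(0.0, he_std) for _ in range(hidden)]
               for _ in range(d_attr)],
        "b1": [0.0] * hidden,
        "W2": [[0.0] * len(pooled_coeffs) for _ in range(hidden)],
        "b2": list(pooled_coeffs),
    }


def forward(params, x_attr):
    """x_attr -> Dense(hidden, ReLU) -> Dense(k_obs)."""
    hidden = [
        max(0.0, sum(x * w for x, w in zip(x_attr, column)) + b)
        for column, b in zip(zip(*params["W1"]), params["b1"])
    ]
    return [
        bias + sum(h * row[k] for h, row in zip(hidden, params["W2"]))
        for k, bias in enumerate(params["b2"])
    ]


def l2_penalty(params, alpha):
    """L2 regularization on the weight matrices only (not the biases)."""
    return alpha * sum(w * w for name in ("W1", "W2")
                       for row in params[name] for w in row)


params = init_mlp_noise_head(d_attr=3, hidden=4, pooled_coeffs=[0.5, -1.0])
initial_output = forward(params, [1.0, 2.0, 3.0])
```

With `W2` all zeros, the hidden activations contribute nothing, so `initial_output` equals the pooled coefficients for any input.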
…S Hessian

Add MLP calibration and sweep scripts.
- MLPHead/MLPNoiseHead init uses lstsq warm-start for W2 instead of
  zeros (fixes zero-iteration L-BFGS bug)
- Best hyperparameters from sweep: alpha_cad=0.001, alpha_noise=0.1,
  maxiter=5000
- MLP noise R²=0.575 (vs Option C 0.612) but cadence is degenerate —
  noise head absorbs arb volume, decomposition not identified
- Add sweep script and analysis doc
- Add output_lo/output_hi to LinearHead and MLPHead for cadence bounds
- Add run_two_stage_joint() and _extract_two_stage_per_pool() to MLP
  calibration script
Remove sigma- and fee-dependent features from observation covariates
so the arb channel is the only path for volatility-driven volume
variation (see docs/noise_covariate_design.md). build_x_obs gains
reduced=True, per_pool_fit derives k_obs from data shape, and
prepare_joint_data forwards reduced_x_obs.
PerPoolNoiseHead, SharedLinearNoiseHead, and MLPNoiseHead accept
k_obs=4 to match the reduced x_obs. Defaults to K_OBS=8 so existing
usage is unchanged.
Add reclamm_calibrated_noise_volume (c_0..c_7 log-linear model with
TVL, volatility, fee interactions, and DOW harmonics). Wire through
all 4 reserve calculation paths with dow_sin/dow_cos scan inputs.
Consolidate volatility/DOW array prep into _prepare_noise_arrays.
…tring in static dict

Read n_evaluation_points from optuna_settings instead of hardcoding 20.
Keep startDateString in the static fingerprint dict — the calibrated
noise model needs it to compute day-of-week arrays.
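
As a rough sketch of why the start date is needed, day-of-week harmonics can be derived from it as below. The date format (ISO `YYYY-MM-DD` prefix), the Monday=0 convention, and the function name `dow_harmonics` are assumptions; the repository's `dow_sin`/`dow_cos` arrays may use a different phase or resolution.

```python
import math
from datetime import date, timedelta


def dow_harmonics(start_date_string, n_days):
    """Return (dow_sin, dow_cos) lists with one entry per simulated day."""
    start = date.fromisoformat(start_date_string[:10])
    dow_sin, dow_cos = [], []
    for offset in range(n_days):
        day_of_week = (start + timedelta(days=offset)).weekday()  # Monday = 0
        angle = 2.0 * math.pi * day_of_week / 7.0
        dow_sin.append(math.sin(angle))
        dow_cos.append(math.cos(angle))
    return dow_sin, dow_cos


# 2023-01-02 was a Monday, so the first entry has angle 0.
dow_sin, dow_cos = dow_harmonics("2023-01-02 00:00:00", 14)
```

Without the start date the simulator could not tell which weekday each step falls on, so the harmonics would be undefined.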
Add encode_tokens() to build token index, per-pool token/chain
assignments, and token covariate matrix (D_TOKEN=5) from the matched
pool set. Token classification via symbol lookup for stablecoins,
ETH derivatives, and L1 natives. Market cap from hardcoded values
or JSON fallback.

Foundation for the token-factored noise head where pool noise
coefficients decompose as u[token_a] + u[token_b] + alpha[chain]
+ beta_fee * log(fee) + delta_i.
noise_coeffs_i = u[token_a] + u[token_b] + alpha[chain]
               + beta_fee * log(fee) + delta_i

Token effects regularized toward x_token @ Gamma (population
prediction from market cap and asset class). Per-pool deltas
L2-regularized for partial pooling. Warm-start init decomposes
Option C noise_coeffs into token/chain/fee effects via lstsq.

predict_new_pool() handles seen tokens (learned u_t), unseen
tokens (Gamma fallback), and unseen chains (zero alpha).

Comprehensive tests: additivity, regularization, warm-start
round-trip, gradient finiteness, new-pool prediction for
seen/unseen tokens/chains.
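
The additive decomposition and the fallbacks for unseen tokens and chains can be sketched as follows. This is an illustration only: the argument names, the dictionary-based storage, and the choice of a zero `delta` for a new pool are assumptions, not the actual `predict_new_pool()` implementation.

```python
import math


def predict_noise_coeffs(token_a, token_b, chain, fee,
                         u, alpha, beta_fee, gamma, x_token, delta=None):
    """noise_coeffs = u[a] + u[b] + alpha[chain] + beta_fee*log(fee) + delta."""
    k = len(beta_fee)

    def token_effect(token):
        if token in u:  # seen token: use the learned effect
            return u[token]
        covariates = x_token[token]  # unseen token: fall back to x_token @ Gamma
        return [sum(c * gamma[d][j] for d, c in enumerate(covariates))
                for j in range(k)]

    effect_a, effect_b = token_effect(token_a), token_effect(token_b)
    chain_effect = alpha.get(chain, [0.0] * k)  # unseen chain: zero alpha
    pool_delta = delta if delta is not None else [0.0] * k  # assumed 0 for new pools
    log_fee = math.log(fee)
    return [effect_a[j] + effect_b[j] + chain_effect[j]
            + beta_fee[j] * log_fee + pool_delta[j] for j in range(k)]


# Toy parameters with 2 coefficients and 2 token covariates.
u = {"ETH": [1.0, 0.0]}
alpha = {"mainnet": [0.5, 0.5]}
gamma = [[2.0, 0.0], [0.0, 3.0]]
x_token = {"NEW": [1.0, 1.0]}
coeffs = predict_noise_coeffs("ETH", "NEW", "mainnet", 1.0,
                              u, alpha, [0.1, 0.0], gamma, x_token)
```

The toy dimensions are chosen for readability; the source uses D_TOKEN=5 token covariates.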
Add prepare_token_factored_data() combining joint data preparation
with token encoding. End-to-end tests verify TokenFactoredNoiseHead
fits through CalibrationModel with PerPoolHead(cadence) +
FixedHead(gas), both cold-start and warm-started from Option C.
Add run_option_c_reduced() for 4-covariate per-pool fits and
run_reduced_joint() for joint MLPNoiseHead with k_obs=4. Wire
reduced model into the comparison pipeline with correct x_obs
dispatch in compute_per_pool_predictions(). Save reduced Option C
results to JSON immediately for downstream use.
Full pipeline: Phase 0 pooled Ridge diagnostic (baseline vs pool attrs
vs token dummies vs full), token-factored fit with lambda_delta sweep,
token/chain/delta analysis tables, leave-one-pool-out cross-validation
comparing LOO R² to Option C in-sample R², and diagnostic plots.

Phase 0 results: token dummies +0.091 vs pool attrs +0.072 above
baseline (R²=0.058), confirming compositional structure exists.
LOO results: median R²=0.33 vs Option C 0.59, 7/36 wins — the
static coefficient prediction bottleneck limits transfer to unseen
pools. This motivates lagged cross-pool features.
Load per-pool noise_coeffs from calibration JSON, derive
arb_frequency from calibrated log_cadence, and pick up token pair,
fee, and gas from pool metadata. Supports both 4-covariate (reduced)
and 8-covariate (full) noise coefficient formats. Add
--n-eval-points flag for evaluation sub-window control.
Token canonicalization (_CANON_MAP) maps wrapped/derivative tokens to
their base symbols (WETH→ETH, waBasWETH→ETH, WBTC→BTC, etc.),
reducing the token graph from ~32 to ~22 unique tokens and thickening
peer groups for cross-pool information sharing.
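
A minimal sketch of the canonicalization idea, using only the three mappings named in the commit message. The real `_CANON_MAP` is larger; the helper names here are hypothetical.

```python
# Only the mappings named in the commit message; the real table has more.
CANON_MAP = {
    "WETH": "ETH",
    "waBasWETH": "ETH",
    "WBTC": "BTC",
}


def canonicalize(symbol):
    """Map a wrapped or derivative token to its base symbol."""
    return CANON_MAP.get(symbol, symbol)


def canonical_pair(token_a, token_b):
    """Order-independent canonical pair, useful for grouping peer pools."""
    return tuple(sorted((canonicalize(token_a), canonicalize(token_b))))


pools = [("WETH", "USDC"), ("USDC", "waBasWETH"), ("WBTC", "WETH")]
unique_tokens = {canonicalize(t) for pair in pools for t in pair}
```

In this toy example five raw symbols collapse to three canonical tokens, so the first two pools become peers on both tokens.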

Cross-pool lag features (build_cross_pool_x_obs, K_OBS_CROSS=7) enrich
observation-level covariates with lagged peer volume averages for
token A, token B, and chain — so daily noise predictions can adapt to
market conditions without autoregressive cold-start issues.
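
One way to picture a lagged peer feature is sketched below. It is not `build_cross_pool_x_obs`: the function name, the use of a plain mean of peers' previous-day log volume, and the 0.0 fallback for pools without peers are assumptions. The same helper would be applied three times (token A peers, token B peers, chain peers), which would account for the three extra covariates if K_OBS_CROSS=7 is the four reduced covariates plus these.

```python
def lagged_peer_mean(log_volume, peers):
    """Mean of peers' log volume on day t-1, for each pool and day t >= 1.

    log_volume: {pool_id: [value per day]}, all series the same length.
    peers:      {pool_id: [peer pool ids]}.
    Day 0 is dropped because it has no previous day.
    """
    features = {}
    for pool, series in log_volume.items():
        row = []
        for t in range(1, len(series)):
            peer_values = [log_volume[p][t - 1]
                           for p in peers.get(pool, []) if p != pool]
            row.append(sum(peer_values) / len(peer_values)
                       if peer_values else 0.0)
        features[pool] = row
    return features


log_volume = {"p1": [1.0, 2.0, 3.0],
              "p2": [10.0, 20.0, 30.0],
              "p3": [30.0, 40.0, 50.0]}
peers = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": []}
lag_features = lagged_peer_mean(log_volume, peers)
```

Because the feature for day t only uses peers' values from day t-1, a pool's own history is never required, and each output row is one entry shorter than its input series.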
Re-evaluates pool loss functions at the optimum to decompose total loss
into data_loss (mean per-pool MSE) and reg_loss (head regularization).
Enables tracking whether lambda annealing is reducing data fit or just
shrinking regularization.
Lambda sweep now runs descending (high→low regularization) with each
fit warm-starting from the previous result. Runner runs two ablations
side-by-side: baseline (K_OBS_REDUCED=4) vs cross-pool (K_OBS_CROSS=7),
reporting separated data/reg loss and LOO R² for each configuration.

prepare_token_factored_data() gains cross_pool parameter to swap in
cross-pool lag features automatically.
TokenFactoredNoiseHead.init() now zero-pads when warm_start
noise_coeffs are shorter than k_obs (e.g. warm-starting k_obs=7 from
k_obs=4 Option C results).
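
The zero-padding step can be sketched as below. The helper name is hypothetical, and appending zeros at the end assumes the extra covariates are added after the original ones.

```python
def pad_noise_coeffs(noise_coeffs, k_obs):
    """Right-pad warm-start coefficients with zeros up to length k_obs."""
    if len(noise_coeffs) > k_obs:
        raise ValueError("warm-start has more coefficients than k_obs")
    return list(noise_coeffs) + [0.0] * (k_obs - len(noise_coeffs))


# Warm-starting a k_obs=7 head from a 4-coefficient fit.
padded = pad_noise_coeffs([0.3, -0.1, 0.8, 0.05], 7)
```

Zero coefficients on the new covariates mean the warm-started model initially reproduces the 4-covariate predictions.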

prepare_token_factored_data(cross_pool=True) now trims y_obs and
day_indices to match the first-day-dropped x_obs from
build_cross_pool_x_obs.

Runner gains --cross-pool-only flag with pickle caching of stage 1
(Option C + filtering) and baseline results so ablation 2 can run
independently. Baseline-missing paths handled gracefully.

Adds TestPrepareTokenFactoredCrossPool with shape consistency tests
that would have caught the broadcast error.
Diagnostic experiments establishing the cross-pool prediction landscape:
- run_cross_pool_diagnostics: lambda_token sweep, leave-one-in, AR1
  baseline, pool connectivity analysis
- run_cross_pool_linear: ridge regression (peers only, peers+own lag,
  LOO with overlap transfer, 30d burn-in, peer mean)
- run_cross_pool_noise_linear: same battery on noise residuals
  (log_vol - log_V_arb)
- run_residual_comparison: apples-to-apples R² on noise residual
  target across all methods including Option C
- run_deepsets_volume: DeepSets on total volume (v1, raw)
- run_deepsets_noise: DeepSets with V_arb decomposition and Optuna
- run_deepsets_v2: full feature menu with Optuna feature selection,
  trains on total volume, evaluates on noise residual

Key findings: ridge in-sample peers+own = 0.599 (matching Option C),
but cross-pool signal is almost entirely shared arb response — noise
residual ridge ceiling is 0.098. Option C noise residual R² = 0.060.
MatthewWilletts and others added 22 commits March 23, 2026 13:08
…rdization fix

- noise_model_arrays.py: new module to precompute noise_base and
  noise_tvl_coeff arrays from trained artifact. Decomposes per-pool
  coefficients into TVL-dependent and TVL-independent components.
  Returns tvl_mean/tvl_std for runtime standardization.

- noise_trades.py: add tvl_mean/tvl_std params to
  reclamm_market_linear_noise_volume() — standardizes log(TVL) at
  runtime to match training scale. Fixes NaN blowup from raw TVL.

- reclamm_reserves.py: pass noise_params (tvl_mean, tvl_std) through
  to market_linear dispatch.

- reclamm.py: cache loaded noise arrays on pool instance to avoid
  repeated disk reads. Support noise_arrays_path in fingerprint.

- tune_reclamm_calibrated_noise.py: add --noise-model flag
  (calibrated vs market_linear), --artifact-dir, --initial-pool-value.
  Save precomputed arrays to disk, pass path + tvl stats via fingerprint.
  Default dates adjusted to panel coverage period. 100 trials × 3
  objectives all complete successfully.

- plot_reclamm_optuna_result.py: forward noise_arrays_path in
  run_full_period() for market_linear re-runs.
- run_deconfounder_noise.py: Four-strategy causal analysis of b_tvl:
  1. Variance decomposition (62% between-pool, 38% within-pool)
  2. Within-pool Δ regressions (median b_tvl=+0.12, daily too fast)
  2b. Lagged-average TVL across windows (stable ~0.95 at all horizons)
  3. TVL decomposition: price-driven vs flow-driven (IV-style)
  4. Deconfounder sensitivity (Wang & Blei 2019, n_factors sweep)
  D'Amour critique acknowledged in docstring. Ridge warm-start,
  standardized Z_hat. Convergent finding: per-pool b_tvl ~1.0 is
  the right working estimate for counterfactuals.

- scan_lp_events.py: Scan all pools for large LP deposit/withdrawal
  events (semi-exogenous TVL shocks). Filters pool creation events
  via min-age and min-tvl. Computes per-event elasticity from
  ±window day volume comparison. 836 events across 118 pools,
  median elasticity +0.84 (clean: +0.98, OLS: +0.89).
  Saves CSV + generates plots: elasticity histograms, deposits vs
  withdrawals, elasticity vs pool size, log-log scatter with OLS,
  boxplot by chain. No asymmetry between deposits/withdrawals,
  flat across pool sizes and chains.
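
A hedged sketch of a per-event elasticity computed from a ±window comparison. The definition used here (change in log mean volume divided by change in log mean TVL, excluding the event day itself) is an assumption; `scan_lp_events.py` may define the windows or averages differently.

```python
import math


def event_elasticity(volume, tvl, event_idx, window):
    """Elasticity of volume to TVL around a deposit/withdrawal event."""
    if event_idx - window < 0 or event_idx + window >= len(volume):
        raise ValueError("not enough data around the event")

    def mean(values):
        return sum(values) / len(values)

    pre = slice(event_idx - window, event_idx)
    post = slice(event_idx + 1, event_idx + 1 + window)
    d_log_volume = math.log(mean(volume[post]) / mean(volume[pre]))
    d_log_tvl = math.log(mean(tvl[post]) / mean(tvl[pre]))
    return d_log_volume / d_log_tvl


# TVL quadruples at day 3 while volume doubles.
tvl = [100.0] * 3 + [400.0] * 4
volume = [10.0] * 3 + [20.0] * 4
elasticity = event_elasticity(volume, tvl, event_idx=3, window=3)
```

An elasticity of 1.0 would mean volume scales proportionally with TVL; the toy event above is deliberately sublinear.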
Validates the full model (PCHIP arb + per-pool linear noise) against
the AAVE/WETH natural experiment (70x TVL increase from LP deposit).

Model predicts a 44.3x total volume increase vs 39.2x observed (the
prediction is 113% of the observed value, a 13% overshoot). V_arb carries 111x through PCHIP grid, V_noise adds 7.4x
through the noise model (raw elasticity 0.42). Combined response
matches observed despite individual channels having different
elasticities from the event study total.
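
The reported multipliers can be cross-checked with a little arithmetic. The sketch assumes total volume is the sum of the arb and noise channels and that each quoted multiplier is that channel's own post/pre ratio; under those assumptions the pre-deposit arb share follows from the three numbers. The variable names are illustrative.

```python
predicted_total = 44.3   # model's total volume multiplier
observed_total = 39.2    # observed total volume multiplier
arb_multiplier = 111.0   # V_arb multiplier through the PCHIP grid
noise_multiplier = 7.4   # V_noise multiplier through the noise model

# Ratio of predicted to observed response.
prediction_ratio = predicted_total / observed_total

# If total = arb + noise, then
#   predicted_total = w * arb_multiplier + (1 - w) * noise_multiplier
# where w is the arb share of volume before the deposit.
implied_pre_arb_share = ((predicted_total - noise_multiplier)
                         / (arb_multiplier - noise_multiplier))
```

Under these assumptions `prediction_ratio` is about 1.13 and the implied pre-deposit arb share is roughly 36%, which is how two channels with very different multipliers can combine to a total near the observed one.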

Also evaluates counterfactual noise volumes at arbitrary TVL levels,
using median pre-deposit market features with only TVL varying.
…volume_zscore features

- run_mlp_noise.py: MLP noise model with learnable cadence, no panel
  dependency. Uses only Binance market data + pool TVL. Supports
  variable depth/width, per-pool bias, optax cosine LR decay, and
  Optuna sweep over architecture + hyperparameters.
  Best eval R² = 0.39 (matches linear baseline) with [16,8,4].
  In-sample R² = 0.70 with [128,64,32] — overfits on temporal split.

- market_features.py: add volume_zscore feature — within-token rolling
  z-score of daily Binance USD volume (today vs 30d trailing mean/std).
  Captures "unusually active day for this token" without cross-token
  scale issues. Added for BTC, token A, and token B.
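
A minimal sketch of a within-token rolling z-score. The choices that the trailing window excludes today, that the sample standard deviation is used, and that days without enough history get a neutral 0.0 are assumptions; `market_features.py` may handle these edge cases differently.

```python
import statistics


def volume_zscore(daily_volume, window=30):
    """Z-score of today's volume against the trailing `window` days."""
    scores = []
    for t, today in enumerate(daily_volume):
        trailing = daily_volume[max(0, t - window):t]  # excludes today
        if len(trailing) < 2:
            scores.append(0.0)  # not enough history yet
            continue
        mean = statistics.fmean(trailing)
        std = statistics.stdev(trailing)
        scores.append((today - mean) / std if std > 0 else 0.0)
    return scores


# The last day is far above its 3-day trailing mean of 2.0 (std 1.0).
scores = volume_zscore([1.0, 2.0, 3.0, 10.0], window=3)
```

Because each token is scored against its own history, a quiet token and a heavily traded one land on the same scale.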
…eline

- noise_model_arrays.py: rewrite build_simulator_arrays to use Binance
  parquets directly (no panel/API dependency). Takes token_a, token_b
  + date range, builds all features from market data. Works for any
  date range covered by Binance data. Tested: 639 days for AAVE/ETH.

- tune_reclamm_calibrated_noise.py: update to new build_simulator_arrays
  interface (token_a/token_b instead of pool_id + matched_clean).
  Extended date range (2024-06 to 2026-03) now works.

- run_mlp_noise.py: add Optuna sweep (--tune), optax cosine LR decay
  (--cosine), pool attributes (--pool-attrs).
Two bugs in noise fee income application for reClAMM pools:

1. Cadence scaling: noise model returns per-minute volume but was
   applied once per arb step (every arb_frequency minutes) without
   scaling. Now multiplies by minutes_per_step. At cadence=5, this
   was underestimating noise fee income by 5x.

2. Price preservation: uniform real-reserve scaling (Ra*s, Rb*s)
   preserves price for weighted pools but NOT for 2-CLPs where
   price depends on effective reserves (Ra+Va)/(Rb+Vb). Fixed by
   scaling effective reserves uniformly then subtracting virtuals:
   Ra_new = (Ra+Va)*scale - Va. Preserves quoted marginal price.
   Total value added still equals noise_fee_income (verified
   algebraically: effective_value * (scale-1) = fee_income).

Both fixes applied to all CLP noise model variants (tsoukalas,
loglinear, calibrated, market_linear).
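
Both fixes can be sketched in a few lines. The function names, the explicit token prices, and computing fee income as volume times fee are assumptions for illustration; the update rule `Ra_new = (Ra+Va)*scale - Va` and the `minutes_per_step` scaling come from the commit message.

```python
def noise_fee_income_per_step(noise_volume_per_minute, fee, minutes_per_step):
    """Fix 1: scale per-minute noise volume up to the arb step length."""
    return noise_volume_per_minute * minutes_per_step * fee


def apply_noise_fee_income(Ra, Rb, Va, Vb, price_a, price_b, fee_income):
    """Fix 2: scale effective reserves uniformly, then subtract virtuals."""
    eff_a, eff_b = Ra + Va, Rb + Vb
    effective_value = eff_a * price_a + eff_b * price_b
    scale = 1.0 + fee_income / effective_value
    return eff_a * scale - Va, eff_b * scale - Vb


Ra, Rb, Va, Vb = 100.0, 200.0, 50.0, 25.0
price_a, price_b = 2.0, 1.0
Ra_new, Rb_new = apply_noise_fee_income(Ra, Rb, Va, Vb,
                                        price_a, price_b, fee_income=5.25)

ratio_before = (Ra + Va) / (Rb + Vb)
ratio_after = (Ra_new + Va) / (Rb_new + Vb)
value_added = (Ra_new - Ra) * price_a + (Rb_new - Rb) * price_b
```

The effective-reserve ratio is unchanged, so the quoted marginal price is preserved, while the value added to real reserves equals `effective_value * (scale - 1)`, which is the fee income. Scaling only the real reserves (`Ra*s`, `Rb*s`) would shift the ratio whenever the virtual balances are not in the same proportion as the real ones.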

Also: tune script adds L-BFGS support, 25% protocol fee split,
extended date range, $7M default TVL. New plotting scripts for
model vs real comparison.
Feature scaling reform in build_data(): TVL and BTC log_price kept in
raw log scale (absolute level matters), returns/trends/volume_zscore
unscaled, volatilities lightly centered. Eliminates global z-score that
squeezed TVL into [-2,+2] and prevented models from learning TVL response.

Add ±3σ clamp on standardized log(TVL) in reclamm_market_linear_noise_volume
to prevent extreme concentration from wireheading the noise model.
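
The runtime standardization with the ±3σ clamp might be sketched like this. It assumes `tvl_mean` and `tvl_std` are the mean and standard deviation of log(TVL) on the training data; the helper name is hypothetical.

```python
import math


def standardized_log_tvl(tvl, tvl_mean, tvl_std, clamp_sigma=3.0):
    """Standardize log(TVL) to the training scale and clamp to ±clamp_sigma."""
    z = (math.log(tvl) - tvl_mean) / tvl_std
    return max(-clamp_sigma, min(clamp_sigma, z))


# Training statistics in log space (illustrative values only).
tvl_mean, tvl_std = math.log(1e6), 1.0
typical = standardized_log_tvl(1e6, tvl_mean, tvl_std)
extreme = standardized_log_tvl(1e12, tvl_mean, tvl_std)
```

The clamp bounds how far outside the training range the linear TVL term can be extrapolated, whatever TVL the simulated pool reaches.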

Change default protocol_fee_split from 0.0 to 0.25 to match reClAMM
production configuration.
Add CMA-ES as third optimisation method in tune_reclamm_calibrated_noise.py
alongside Optuna and BFGS, with population_size, sigma0, n_generations
controls.

Add min_train_returns_over_hodl rejection in Optuna objective: trials
with catastrophic in-sample returns_over_hodl are rejected early
(return -inf) to avoid wasting evaluation budget.
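
The early-rejection logic amounts to a guard in front of the expensive evaluation. The sketch below is framework-agnostic and does not use any Optuna API; the function name and the `evaluate` callback are hypothetical.

```python
def objective_with_rejection(train_returns_over_hodl, evaluate,
                             min_train_returns_over_hodl):
    """Return -inf for catastrophic in-sample trials, else evaluate them."""
    if train_returns_over_hodl < min_train_returns_over_hodl:
        return float("-inf")  # rejected: skip the costly evaluation
    return evaluate()


calls = []


def expensive_evaluation():
    calls.append("evaluated")
    return 0.12


rejected = objective_with_rejection(-0.9, expensive_evaluation, -0.5)
accepted = objective_with_rejection(0.05, expensive_evaluation, -0.5)
```

Only the second trial reaches the evaluation step, which is where the budget saving comes from.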
Michaelis-Menten noise model (run_mm_noise.py): structural TVL saturation
via V_noise = alpha_i * TVL/(K_i + TVL) * exp(x_market @ gamma_i), with
learned EWMA smoothing on TVL (discovery lag). Per-pool alpha, K, gamma;
shared lambda. Achieves R²=0.64 matching per-pool linear while adding
saturation (K_med ~$19M).
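
The Michaelis-Menten form and its TVL elasticity can be written out directly. This sketch omits the EWMA smoothing of TVL, and the function names are illustrative rather than those in `run_mm_noise.py`.

```python
import math


def mm_noise_volume(tvl, alpha, K, x_market, gamma):
    """V_noise = alpha * TVL / (K + TVL) * exp(x_market @ gamma)."""
    market_term = math.exp(sum(x * g for x, g in zip(x_market, gamma)))
    return alpha * tvl / (K + tvl) * market_term


def mm_tvl_elasticity(tvl, K):
    """d log V_noise / d log TVL = K / (K + TVL)."""
    return K / (K + tvl)


K = 19e6  # roughly the median K quoted above
half_saturated = mm_noise_volume(K, alpha=1000.0, K=K,
                                 x_market=[0.0], gamma=[0.5])
elasticity_small = mm_tvl_elasticity(1e5, K)
elasticity_large = mm_tvl_elasticity(1e9, K)
```

The elasticity is close to 1 when TVL is far below K and falls toward 0 as TVL grows past K, which is the structural saturation the commit refers to. At TVL = K the pool captures half of `alpha`.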

MLP noise model (run_mlp_noise.py): Optuna sweep with per-trial model
saving, TVL response check at sweep end. Investigation showed shared MLP
cannot learn TVL relationship due to cross-pool confounding.

Model comparison (run_model_comparison.py): linear vs MLP noise model
across TVL levels with time series and summary plots.

MM fit plotting (plot_mm_noise_fit.py): 6-panel per-pool time series,
cross-pool TVL response/elasticity curves, K distribution analysis.
MM model now supports per-pool log_K (default) and shared Binance-volume
K (--shared-K). Per-pool K with per-pool gamma achieves R²=0.66 at 20K
epochs with structural TVL saturation (median K≈$2M).

Optuna sweep searches lr, l2, huber_delta, init_log_K, per_pool_gamma.
Best shared-gamma eval R²=0.42 (huber=0.5, lr=1e-4 consistently).

Removed learned EWMA (lambda stayed near 1, no benefit).
Removed TVL interaction features (hurt eval R², subsumed by K).

Plot script handles both K modes, shows all pools by default.

New: verify_vol_volume_slope.py — cross-pool volume/TVL analysis
confirming sublinear scaling (TVL^0.80, R²=0.79) and TVL elasticity
~0.9 across fee tiers.
New: fetch_competitor_tvl.py fetches historical TVL from DeFi Llama for
all competing pools per token pair (same-chain), computes effective K
via network conductance model (direct + multi-hop through hub tokens
WETH, WSTETH, USDC, USDT, DAI, WBTC with harmonic mean for series
combination). Self-exclusion via own TVL subtraction.
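
A simplified sketch of the effective-K idea: direct competitor TVL plus two-hop routes through hub tokens, with each route's two legs combined by a harmonic mean and the pool's own TVL subtracted from the direct term. The data layout, function names, and the restriction to single-hub routes are assumptions; `fetch_competitor_tvl.py` may combine paths differently.

```python
HUB_TOKENS = ["WETH", "WSTETH", "USDC", "USDT", "DAI", "WBTC"]


def harmonic_mean(x, y):
    """Harmonic mean of two leg TVLs; zero if either leg is missing."""
    return 0.0 if x <= 0 or y <= 0 else 2.0 * x * y / (x + y)


def effective_K(token_a, token_b, pair_tvl, own_tvl=0.0, hubs=HUB_TOKENS):
    """Direct competitor TVL plus two-hop routes through hub tokens.

    pair_tvl maps frozenset({x, y}) to total same-chain TVL for that pair.
    """
    def tvl(x, y):
        return pair_tvl.get(frozenset((x, y)), 0.0)

    direct = max(tvl(token_a, token_b) - own_tvl, 0.0)  # self-exclusion
    multi_hop = sum(harmonic_mean(tvl(token_a, hub), tvl(hub, token_b))
                    for hub in hubs if hub not in (token_a, token_b))
    return direct + multi_hop


pair_tvl = {
    frozenset(("AAVE", "USDC")): 100.0,
    frozenset(("AAVE", "WETH")): 50.0,
    frozenset(("WETH", "USDC")): 50.0,
}
k_eff = effective_K("AAVE", "USDC", pair_tvl, own_tvl=30.0)
```

The harmonic mean makes a two-hop route only as deep as its thinner leg allows, so a large hub pool paired with a tiny one adds little effective competition.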

MM model (run_mm_noise.py) now supports --observed-K flag: K is fixed
from DeFi Llama data (0 learned K params), with per-pool alpha + gamma
learning the noise level and temporal variation. R²=0.625 with
economically meaningful K values (AAVE/WETH: $89M, USDC/WETH: $432M).

New noise function: reclamm_mm_observed_noise_volume() in noise_trades.py
evaluates V_noise = exp(base) * TVL/(K+TVL) per minute.

New array builder: build_mm_simulator_arrays() in noise_model_arrays.py
precomputes noise_base + competitor_tvl minute arrays for the simulator.
MatthewWilletts and others added 7 commits April 7, 2026 15:45
Wire reclamm_mm_observed_noise_volume through the full simulator:
- reclamm.py: load noise_base + competitor_tvl arrays from npz
- reclamm_reserves.py: dispatch mm_observed in scan step, append
  competitor_tvl to scan_inputs alongside noise_base
- noise_model_arrays.py: build_mm_simulator_arrays() precomputes
  both arrays from MM model artifact + DeFi Llama competitor TVL
- jax_runner_utils.py: add noise array keys to _TRAINING_ONLY_FIELDS

Fingerprint usage: noise_model="mm_observed",
noise_arrays_path="path/to/arrays.npz"
tune_reclamm_calibrated_noise.py now supports --noise-model mm_observed
which uses the Michaelis-Menten model with observed competitor TVL from
DeFi Llama as K. Precomputes noise_base + competitor_tvl arrays via
build_mm_simulator_arrays, saves to npz, passes path in fingerprint.

Usage:
  python experiments/tune_reclamm_calibrated_noise.py \
    --noise-model mm_observed \
    --artifact-dir results/mm_noise \
    --competitor-tvl-path results/competitor_tvl/competitor_tvl.npz

2 participants