Trading metrics + equity curve in backtest report (ABI v2) #31

Merged
luisleo526 merged 15 commits into main from feat/trading-metrics on Jun 11, 2026
@luisleo526

What

Adds a comprehensive trading-metrics suite and the raw per-script-bar equity curve to pf_report_t, validated against a real TradingView export and two independent quant libraries.

  • New ABI surface (PF_ABI_VERSION 2): pf_metrics_t (trade-stat blocks for All/Long/Short + equity stats) and pf_equity_point_t* equity_curve appended to pf_report_t; commission / entry_bar_index / exit_bar_index appended to pf_trade_t. New pf_abi_version() export: pf_report_t is caller-allocated, so every harness now asserts the version before running (a stale mirror would under-size the buffer and silently corrupt the stack).
  • Metrics: net/gross profit ±%, profit factor, win rate, avg/largest win/loss ±%, expectancy, win/loss streaks, avg bars in trade, commission paid; max equity drawdown/run-up ±%, buy & hold, TV-method monthly Sharpe/Sortino (RF 2%, chart-tz month bucketing) + per-bar density-annualized variants, CAGR, Calmar, recovery factor, time in market, open P&L. Conventions pinned per-field in pineforge.h doxygen.
  • Collection: one equity point per script bar with magnifier-invariant timestamps (ab.bar.timestamp; pinned by a magnifier on/off bit-identity test), reset-safe for handle reuse.
  • Computation: pure functions in src/engine_metrics.cpp (no engine dependency), dd/run-up walk mirrors update_equity_extremes (enforced by integration test + reciprocal comments).
  • TV-arbitrated corrections (vs a real Strategy Tester export of composite-4emarsi-integration-01, 336 trades): per-trade pnl_pct is now net return-on-cost (closes long-standing O5 finding; TV trade #258: 102.44 USD / 2276.66 entry ⇒ 4.50%, old price-ratio formula said 4.72%); largest_*_pct tracked independently of the largest-USD trade; avg_bars counted inclusively.
  • All Python ctypes mirrors updated + ABI-guarded: scripts/run_strategy.py, tutorials, docker/run_json.py, bench script. Docs FFI pages (Python/Rust) and report schema synced.
  • New optional scripts/crossvalidate_metrics.py (venv-based, no repo deps).
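
To make the pnl_pct correction concrete, here is a minimal sketch of the net return-on-cost convention using the TV trade #258 figures quoted above. The function name is hypothetical, and treating 2276.66 as the entry-side cost basis is an assumption read off the description; the pinned convention lives in the pineforge.h doxygen, not here.

```python
def pnl_pct_return_on_cost(net_pnl: float, entry_cost: float) -> float:
    """Net P&L (after commission) as a percentage of the cost at entry.

    Hypothetical helper for illustration only; not the engine's code.
    """
    if entry_cost == 0:
        return 0.0  # assumption: report 0% rather than divide by zero
    return 100.0 * net_pnl / entry_cost


# TV trade #258 from the description: 102.44 USD net on a 2276.66 entry.
pct = pnl_pct_return_on_cost(102.44, 2276.66)
print(f"{pct:.2f}%")  # prints "4.50%"
```

The old price-ratio formula reports 4.72% for the same trade; the gap is consistent with that formula measuring the raw price move instead of the net result on the cost basis.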

Validation

  • ctest: 71/71 (new test_metrics: 1200+ checks incl. closed-form Sharpe/Sortino oracles 19/20 and 114/61, non-UTC month bucketing, magnifier bit-identity, handle-reuse reset, engine-vs-walk drawdown invariant). ASan clean.
  • Corpus: full sweep holds the exact baseline — excellent=245, anomaly=1; regenerated trade CSVs differ only in the Net PnL % column (expected from the pnl_pct correction; verified across all 158 changed files).
  • Bench: with-magnifier hot loop at parity vs same-machine baseline control (median ratio 0.995); collection is one push_back per script bar outside the sub-bar loop.
  • TradingView: every field that both TV and the engine compute in the exported panels matches within TV's 2-dp rounding (counts/PF/percent-profitable exact; USD deltas ≤0.34 total, from a known 2-trade price drift).
  • quantstats 0.0.81 + empyrical-reloaded 0.5.12: max DD ±%, Sharpe/Sortino (both variants), CAGR, Calmar, recovery factor all match to ≤1e-11 relative on two strategies; every residual delta proven to be a library convention (geometric vs arithmetic RF, abs()-based recovery).
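
For readers who want to reproduce the monthly Sharpe/Sortino figures by hand, the following is a rough sketch of one common reading of the monthly method: mean monthly return minus the monthly risk-free rate, divided by the standard deviation (Sharpe) or the downside deviation (Sortino). Everything here is an assumption for illustration: the function name, the population (divide-by-N) deviation, the RF/12 monthly rate, and measuring downside against the risk-free rate. The authoritative conventions are the per-field notes in pineforge.h.

```python
import math


def monthly_sharpe_sortino(monthly_returns, annual_rf=0.02):
    """Return (sharpe, sortino) from a list of monthly fractional returns.

    Assumptions (not confirmed by the PR): population deviation,
    monthly risk-free rate = annual_rf / 12, no annualization.
    """
    n = len(monthly_returns)
    if n == 0:
        return math.nan, math.nan
    rf = annual_rf / 12.0
    mean = sum(monthly_returns) / n
    excess = mean - rf
    # Dispersion of all monthly returns around their mean.
    sd = math.sqrt(sum((r - mean) ** 2 for r in monthly_returns) / n)
    # Downside deviation: only months below the risk-free rate contribute.
    dd = math.sqrt(sum(min(0.0, r - rf) ** 2 for r in monthly_returns) / n)
    sharpe = excess / sd if sd > 0 else math.nan
    sortino = excess / dd if dd > 0 else math.nan
    return sharpe, sortino


# Usage: two months, -1% and +3%, with the risk-free rate set to zero.
sharpe, sortino = monthly_sharpe_sortino([-0.01, 0.03], annual_rf=0.0)
```

Small closed-form inputs like this are the same idea as the oracle checks in test_metrics: pick returns whose ratio can be worked out on paper, then compare.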

Known deltas / follow-ups

  • TV "Max run-up (close-to-close)" 426.44 vs engine 406.33 — run-up definition difference (engine mirrors its pre-existing trough-reset-on-new-peak semantics); needs TV Run-ups/Drawdowns tab semantics pinned before changing anything.
  • TV Sharpe/Sortino and max-DD panel values are still unexported (Risk-adjusted performance / Drawdowns tabs); until they are, the engine values are validated only against the libraries.
  • Sibling repo pineforge-codegen-oss branch feat/abi-v2-metrics (compile-test fix for the new version.h include) must merge in lockstep; codegen-mcp docker image needs a rebuild.
  • Corpus submodule has regenerated engine_trades.csv files (Net PnL % column) — submodule bump intentionally left out of this PR.

🤖 Generated with Claude Code

luisleo526 and others added 15 commits June 11, 2026 22:51
…ndex fields, pf_abi_version (PF_ABI_VERSION=2)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rsion C-side; document include-order

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…, reset-safe

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…t tests

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…arpe/sortino, cagr/calmar) + unit tests

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…oracles; document tz-lock contract

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…_version

pf_report_t grew in ABI v2 (pf_metrics_t by value + equity_curve ptr +
int64 len appended; pf_trade_t gained commission + entry/exit bar
indices). pf_report_t is CALLER-allocated, so every ctypes harness that
allocated it from the old field list would under-size the buffer and
let the runtime write past it.

Updated every ctypes mirror in the repo and added an ABI guard at every
CDLL load site (pf_abi_version() must equal 2; a missing symbol means a
pre-v2 .so and is rejected with a rebuild hint):

- scripts/run_strategy.py   (corpus harness; trade dicts now also carry
  commission + entry/exit bar_index)
- tutorial/run.py           (mirrors + reusable check_abi)
- tutorial/run_mtf.py       (guard via run.check_abi)
- tutorial/run_advanced.py  (guard via run.check_abi)
- docker/run_json.py        (codegen-mcp container harness)
- benchmarks/throughput/grid_search_repro.py

Mirror layouts verified against include/pineforge/pineforge.h:
ctypes sizeof matches the C side exactly (trade 96, trade_stats 216,
equity_stats 120, metrics 768, equity_point 24, report 944; metrics/
equity_curve/equity_curve_len offsets 160/928/936).
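
A minimal sketch of what such a load-site guard can look like in Python. This is not the repo's check_abi: the function body, the error wording, and the assumption that pf_abi_version() takes no arguments and returns a C int are all illustrative.

```python
import ctypes

EXPECTED_ABI = 2  # PF_ABI_VERSION this mirror was written against


def check_abi(lib, expected=EXPECTED_ABI):
    """Refuse to use a library whose ABI version differs from the mirror's.

    `lib` is normally a ctypes.CDLL; any object exposing pf_abi_version works.
    """
    try:
        fn = lib.pf_abi_version
    except AttributeError:
        # Symbol absent: the library predates the versioned ABI.
        raise RuntimeError(
            "pf_abi_version not found: pre-v2 library, rebuild it"
        ) from None
    fn.restype = ctypes.c_int  # assumption: the export returns int
    fn.argtypes = []
    got = fn()
    if got != expected:
        raise RuntimeError(f"ABI mismatch: library={got}, mirror={expected}")
    return got


# Typical use (the library path is hypothetical):
#   lib = ctypes.CDLL("./libpineforge.so")
#   check_abi(lib)  # must run before allocating pf_report_t
```

Running the guard before the caller allocates pf_report_t is the point: a version mismatch then fails loudly at load time instead of letting the runtime write past an under-sized buffer. A ctypes.sizeof assertion against the sizes listed above would be a natural second check.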

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…pty-curve coverage

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…urve, pf_abi_version)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…c reference from source comment

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…n-cost pnl_pct, independent largest-%, inclusive avg bars

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…SOP in CLAUDE.md

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@luisleo526 luisleo526 merged commit 1bd075c into main Jun 11, 2026
5 checks passed