Trading metrics + equity curve in backtest report (ABI v2) by luisleo526 · Pull Request #31 · pineforge-4pass/pineforge-engine

luisleo526 · 2026-06-11T18:01:51Z

What

Adds a comprehensive trading-metrics suite and the raw per-script-bar equity curve to pf_report_t, validated against a real TradingView export and two independent quant libraries.

New ABI surface (PF_ABI_VERSION 2): pf_metrics_t (trade-stat blocks for All/Long/Short + equity stats) and pf_equity_point_t* equity_curve (with its int64 length) appended to pf_report_t; commission / entry_bar_index / exit_bar_index appended to pf_trade_t. New pf_abi_version() export: pf_report_t is caller-allocated, so a harness still built from the old mirror would under-size the buffer and the library would silently corrupt the stack. Every harness therefore asserts the version before running.
Metrics: net/gross profit (USD and %), profit factor, win rate, avg/largest win/loss (USD and %), expectancy, win/loss streaks, avg bars in trade, commission paid; max equity drawdown/run-up (USD and %), buy & hold, TradingView-method monthly Sharpe/Sortino (2% risk-free rate, months bucketed in the chart time zone) plus per-bar variants annualized by bar density, CAGR, Calmar, recovery factor, time in market, open P&L. Conventions are pinned per field in the pineforge.h doxygen.
Collection: one equity point per script bar with magnifier-invariant timestamps (ab.bar.timestamp; pinned by a magnifier on/off bit-identity test), reset-safe for handle reuse.
Computation: pure functions in src/engine_metrics.cpp (no engine dependency), dd/run-up walk mirrors update_equity_extremes (enforced by integration test + reciprocal comments).
TV-arbitrated corrections (vs a real Strategy Tester export of composite-4emarsi-integration-01, 336 trades): per-trade pnl_pct is now net return-on-cost (closes long-standing O5 finding; TV trade #258: 102.44 USD / 2276.66 entry ⇒ 4.50%, old price-ratio formula said 4.72%); largest_*_pct tracked independently of the largest-USD trade; avg_bars counted inclusively.
All Python ctypes mirrors updated + ABI-guarded: scripts/run_strategy.py, tutorials, docker/run_json.py, bench script. Docs FFI pages (Python/Rust) and report schema synced.
New optional scripts/crossvalidate_metrics.py (venv-based, no repo deps).
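
The three per-trade conventions from the TV-arbitrated corrections can be sketched in a few lines of Python. This is a hedged illustration, not the engine's code: the function names and the (net_pnl, pnl_pct) trade tuples are hypothetical, 2276.66 is treated as the full entry cost with qty 1 (which reproduces the quoted 4.50% for trade #258), commission is left out of the denominator because the description does not say whether it belongs there, and "inclusive" is read as exit index minus entry index plus one.

```python
import math


def pnl_pct(net_pnl, entry_price, qty):
    """Net P&L as a percentage of entry cost (entry price x |qty|), i.e. return on cost."""
    return net_pnl / (entry_price * abs(qty)) * 100.0


def bars_in_trade(entry_bar_index, exit_bar_index):
    """Inclusive bar count (assumed reading): entry and exit on the same bar counts as 1."""
    return exit_bar_index - entry_bar_index + 1


def largest_win(trades):
    """trades: hypothetical (net_pnl, pnl_pct) pairs.

    The USD and % maxima are taken independently, so they may come from
    different trades. NaN is returned when there are no winning trades.
    """
    wins = [t for t in trades if t[0] > 0]
    return (max((usd for usd, _ in wins), default=math.nan),
            max((pct for _, pct in wins), default=math.nan))
```

With these definitions, pnl_pct(102.44, 2276.66, 1) rounds to 4.50, and a trade list whose biggest USD win is not its biggest percentage win reports both maxima rather than the percentage of the largest-USD trade.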

Validation

ctest: 71/71 (new test_metrics: 1200+ checks incl. closed-form Sharpe/Sortino oracles 19/20 and 114/61, non-UTC month bucketing, magnifier bit-identity, handle-reuse reset, engine-vs-walk drawdown invariant). ASan clean.
Corpus: full sweep holds the exact baseline — excellent=245, anomaly=1; regenerated trade CSVs differ only in the Net PnL % column (expected from the pnl_pct correction; verified across all 158 changed files).
Bench: with-magnifier hot loop at parity vs same-machine baseline control (median ratio 0.995); collection is one push_back per script bar outside the sub-bar loop.
TradingView: every field both sides compute in the exported panels matches within TV 2-dp rounding (counts/PF/percent-profitable exact; USD deltas ≤0.34 total from known 2-trade price drift).
quantstats 0.0.81 + empyrical-reloaded 0.5.12: max DD (USD and %), Sharpe/Sortino (both variants), CAGR, Calmar, recovery factor all match to ≤1e-11 relative on two strategies; every residual delta is proven to be a library convention (geometric vs arithmetic RF, abs()-based recovery).
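
For readers checking the equity-side numbers, here is a hedged Python sketch of the quantities named above. Several choices in it are assumptions, not conventions confirmed by the PR: the (unix-ms, equity) point layout, month returns computed from each month's last equity, population standard deviation, and downside deviation measured around the monthly risk-free rate. The `geometric` flag shows the RF de-annualization difference mentioned as a residual delta; the sketch does not say which library uses which. The engine's pinned conventions in pineforge.h take precedence.

```python
import math
import statistics
from datetime import datetime, timezone


def monthly_returns(points, initial_capital, tz=timezone.utc):
    """Bucket time-ordered (unix_ms, equity) points by calendar month in `tz`.

    Each month's return is its last equity relative to the previous month's
    last equity (initial capital for the first month).
    """
    month_end = {}
    for ts_ms, equity in points:
        local = datetime.fromtimestamp(ts_ms / 1000.0, tz)
        month_end[(local.year, local.month)] = equity  # later points overwrite earlier ones
    returns, prev = [], initial_capital
    for key in sorted(month_end):
        returns.append(month_end[key] / prev - 1.0)
        prev = month_end[key]
    return returns


def monthly_rf(rf_annual=0.02, geometric=False):
    """Per-month risk-free rate: arithmetic rf/12, or geometric (1 + rf)^(1/12) - 1."""
    if geometric:
        return (1.0 + rf_annual) ** (1.0 / 12.0) - 1.0
    return rf_annual / 12.0


def monthly_sharpe_sortino(returns, rf_month):
    """(mean - rf) / std and (mean - rf) / downside deviation, per month (not annualized)."""
    excess = statistics.fmean(returns) - rf_month
    sd = statistics.pstdev(returns)  # population std: an assumption
    downside = math.sqrt(sum(min(0.0, r - rf_month) ** 2 for r in returns) / len(returns))
    sharpe = excess / sd if sd > 0 else math.nan
    sortino = excess / downside if downside > 0 else math.nan
    return sharpe, sortino


def cagr(initial_equity, final_equity, years):
    """Compound annual growth rate as a fraction."""
    return (final_equity / initial_equity) ** (1.0 / years) - 1.0


def calmar(cagr_value, max_dd_fraction):
    """CAGR over max drawdown (both fractions); NaN when there was no drawdown."""
    dd = abs(max_dd_fraction)
    return cagr_value / dd if dd > 0 else math.nan


def recovery_factor(net_profit, max_dd_usd):
    """Net profit over max drawdown; abs() makes a negatively signed drawdown give the same result."""
    dd = abs(max_dd_usd)
    return net_profit / dd if dd > 0 else math.nan
```

Passing zoneinfo.ZoneInfo("America/New_York") as `tz` shows why the bucketing zone must be pinned: a point at 02:00 UTC on 1 February lands in January in New York, which changes both monthly returns.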

Known deltas / follow-ups

TV "Max run-up (close-to-close)" 426.44 vs engine 406.33 — run-up definition difference (engine mirrors its pre-existing trough-reset-on-new-peak semantics); needs TV Run-ups/Drawdowns tab semantics pinned before changing anything.
TV Sharpe/Sortino and max-DD panel values still unexported (Risk-adjusted performance / Drawdowns tabs) — engine values are library-validated meanwhile.
Sibling repo pineforge-codegen-oss branch feat/abi-v2-metrics (compile-test fix for the new version.h include) must merge in lockstep; codegen-mcp docker image needs a rebuild.
Corpus submodule has regenerated engine_trades.csv files (Net PnL % column) — submodule bump intentionally left out of this PR.
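
For context on the drawdown figures discussed above, a minimal Python sketch of a peak-to-trough drawdown walk follows. It is not a port of update_equity_extremes. Measuring the percentage against the running peak and tracking the USD and % maxima independently are assumptions, and the run-up side (trough-to-peak, with the engine's trough reset on a new peak) is deliberately not reproduced, since its semantics are the open question in the follow-ups.

```python
def max_drawdown(equity, initial_capital):
    """Return (max drawdown in USD, max drawdown in % of the running peak)."""
    peak = initial_capital
    max_dd_usd = 0.0
    max_dd_pct = 0.0
    for value in equity:
        peak = max(peak, value)  # new high-water mark
        dd = peak - value
        max_dd_usd = max(max_dd_usd, dd)
        if peak > 0:
            # Tracked independently: the largest % drop need not be the largest USD drop.
            max_dd_pct = max(max_dd_pct, dd / peak * 100.0)
    return max_dd_usd, max_dd_pct
```

For example, starting at 100, a dip to 80 is a 20 USD / 20% drawdown, while a later fall from 1000 to 950 is larger in USD (50) but only 5%, so the two maxima come from different episodes.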

🤖 Generated with Claude Code

…_version

pf_report_t grew in ABI v2 (pf_metrics_t by value + equity_curve ptr + int64 len appended; pf_trade_t gained commission + entry/exit bar indices). pf_report_t is CALLER-allocated, so every ctypes harness that allocated it from the old field list would under-size the buffer and let the runtime write past it.

Updated every ctypes mirror in the repo and added an ABI guard at every CDLL load site (pf_abi_version() must equal 2; a missing symbol means a pre-v2 .so and is rejected with a rebuild hint):

- scripts/run_strategy.py (corpus harness; trade dicts now also carry commission + entry/exit bar_index)
- tutorial/run.py (mirrors + reusable check_abi)
- tutorial/run_mtf.py (guard via run.check_abi)
- tutorial/run_advanced.py (guard via run.check_abi)
- docker/run_json.py (codegen-mcp container harness)
- benchmarks/throughput/grid_search_repro.py

Mirror layouts verified against include/pineforge/pineforge.h: ctypes sizeof matches the C side exactly (trade 96, trade_stats 216, equity_stats 120, metrics 768, equity_point 24, report 944; metrics/equity_curve/equity_curve_len offsets 160/928/936).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
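
A hedged sketch of both safeguards described in this commit message, a layout check against the quoted sizes/offsets and a load-time ABI guard, in Python ctypes. Everything here is illustrative: the structures are size-and-alignment stand-ins (arrays of 8-byte values), not the real field lists from pineforge.h; the split of pf_metrics_t into three 216-byte trade-stat blocks plus a 120-byte equity block assumes no padding, which the quoted sizes are consistent with (3 × 216 + 120 = 768); the offsets assume a 64-bit platform; pf_abi_version() is assumed to return a C int; and `AbiMismatchError` is a hypothetical name. The guard's body is not the repository's actual check_abi.

```python
import ctypes

EXPECTED_PF_ABI_VERSION = 2

# Size/alignment stand-ins, NOT the real field lists from pineforge.h.
TradeStats = ctypes.c_double * 27   # 216 bytes
EquityStats = ctypes.c_double * 15  # 120 bytes


class Metrics(ctypes.Structure):
    _fields_ = [("all_trades", TradeStats), ("long_trades", TradeStats),
                ("short_trades", TradeStats), ("equity", EquityStats)]


class Report(ctypes.Structure):
    _fields_ = [
        ("_pre_v2_fields", ctypes.c_uint64 * 20),  # placeholder for the first 160 bytes
        ("metrics", Metrics),
        ("equity_curve", ctypes.c_void_p),         # pf_equity_point_t* on the C side
        ("equity_curve_len", ctypes.c_int64),
    ]


class AbiMismatchError(RuntimeError):
    """Raised when the mirrors or the loaded library do not match ABI v2."""


def check_layout():
    """Compare the mirror against the sizes/offsets quoted in the commit message."""
    for name, want in (("metrics", 160), ("equity_curve", 928), ("equity_curve_len", 936)):
        got = getattr(Report, name).offset
        if got != want:
            raise AbiMismatchError(f"{name}: offset {got}, expected {want}")
    for struct, want in ((Metrics, 768), (Report, 944)):
        if ctypes.sizeof(struct) != want:
            raise AbiMismatchError(f"{struct.__name__}: sizeof {ctypes.sizeof(struct)}, expected {want}")


def check_abi(lib, expected=EXPECTED_PF_ABI_VERSION):
    """Reject a library whose ABI does not match these mirrors before any report is allocated."""
    try:
        fn = lib.pf_abi_version  # ctypes raises AttributeError for a missing symbol
    except AttributeError:
        raise AbiMismatchError("pf_abi_version not exported: pre-v2 library, rebuild it") from None
    fn.argtypes = []
    fn.restype = ctypes.c_int  # assumed C return type
    got = fn()
    if got != expected:
        raise AbiMismatchError(f"library ABI v{got}, mirrors expect v{expected}: rebuild")
    return got
```

In a harness, check_abi(ctypes.CDLL(path)) would run right after loading and before any Report() is created and passed to the library. Checking offsets as well as sizeof catches a reordered or mistyped field that happens to leave the total size unchanged.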

luisleo526 and others added 15 commits June 11, 2026 22:51

- d44988b feat(abi): metrics/equity-curve report fields, trade commission+bar-index fields, pf_abi_version (PF_ABI_VERSION=2)
- fd539cf docs(abi): per-field metric doxygen with pinned units; test pf_abi_version C-side; document include-order
- 85e5f61 feat(engine): capture per-trade commission at close; expose in TradeC
- f186f84 feat(engine): per-script-bar equity curve + bars-in-market collection, reset-safe
- 2938603 feat(metrics): compute_trade_stats with TV sign/NaN conventions + unit tests
- 1e11730 feat(metrics): compute_equity_stats (dd walk, TV monthly + per-bar sharpe/sortino, cagr/calmar) + unit tests
- 2cb2afd test(metrics): engine-vs-walk dd invariant, per-bar + non-UTC sharpe oracles; document tz-lock contract
- 19e8f27 feat(report): wire metrics + equity curve into fill_report/free_report
- 7453958 polish(metrics): review nits — ordering comment, test deref guard, empty-curve coverage
- e2f878a docs: sync FFI mirrors + report schema with ABI v2 (metrics, equity curve, pf_abi_version)
- 67c68a0 docs(abi): truncation caveat on equity_curve len; drop gitignored-spec reference from source comment
- dd624d6 fix(metrics): TV conventions arbitrated vs real export — net return-on-cost pnl_pct, independent largest-%, inclusive avg bars
- ff9b350 test(metrics): cross-validation script vs quantstats/empyrical-reloaded
- 5e17538 ci: register pf_abi_version in check_c_abi_runtime EXPECTED_RUNTIME; SOP in CLAUDE.md

Each commit carries the trailer Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>.

luisleo526 merged commit 1bd075c into main Jun 11, 2026
5 checks passed

luisleo526 merged 15 commits into main from feat/trading-metrics
