perf: reuse ProcessPoolExecutor + as_completed + drop per-miner Simulation rebuild in sync_forward_multiprocess #254

## Context

Follow-up to #253 (ticket 05), which bounded each miner call at `timeout` seconds of wall clock. Today's dendrite forward takes ~148 s against a 120 s timeout; the extra ~28 s is main-process overhead around the I/O phase. Three structural changes in `synth/base/dendrite_multiprocess.py:sync_forward_multiprocess` can cut that overhead.

## Proposed changes (land together)

**1. Reuse the `ProcessPoolExecutor` across cycles.**

Currently (line 338): `with concurrent.futures.ProcessPoolExecutor(nprocs) as executor:` forks `nprocs` fresh children every cycle, each re-importing `bittensor`/`httpx`/`uvloop`, and tears them down at the end. `synth/validator/reward.py:147` already uses the module-level singleton pattern for the CRPS workers; apply the same here.
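A minimal sketch of a lazily created module-level pool, assuming Python 3.9+ (for `cancel_futures`). `get_forward_pool` and `discard_forward_pool` are hypothetical names, and this is not a copy of whatever `synth/validator/reward.py:147` does:

```python
import concurrent.futures
import threading
from typing import Optional

# Hypothetical module-level singleton for sync_forward_multiprocess;
# names are illustrative and not taken from the repository.
_pool: Optional[concurrent.futures.ProcessPoolExecutor] = None
_pool_size = 0
_pool_lock = threading.Lock()


def get_forward_pool(nprocs: int) -> concurrent.futures.ProcessPoolExecutor:
    """Return the shared pool, creating it on first use (or if nprocs changed).

    Worker processes are not started until work is submitted; after that
    they stay alive across cycles instead of being recreated every cycle.
    """
    global _pool, _pool_size
    with _pool_lock:
        if _pool is None or _pool_size != nprocs:
            if _pool is not None:
                _pool.shutdown(wait=True)
            _pool = concurrent.futures.ProcessPoolExecutor(max_workers=nprocs)
            _pool_size = nprocs
        return _pool


def discard_forward_pool(wait: bool = True) -> None:
    """Shut down and forget the shared pool (shutdown hook or broken pool)."""
    global _pool, _pool_size
    with _pool_lock:
        if _pool is not None:
            _pool.shutdown(wait=wait, cancel_futures=True)
        _pool = None
        _pool_size = 0
```

The cycle would call `get_forward_pool(nprocs)` in place of the `with` block and never shut the pool down itself. `discard_forward_pool` is what the SIGTERM hook mentioned under Risk / review notes would call, and what the cycle would call after catching `concurrent.futures.process.BrokenProcessPool`, so the next cycle starts with a fresh pool.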

**2. Replace `executor.map(...)` with `as_completed(...)`.**

`executor.map` yields results in submission order, so main-process CPU work for a chunk (unpickling ~90 MB of floats per chunk, rebuilding the synapse, validation) cannot start until every earlier chunk has returned; in the worst case all of it is serialised *after* the slowest chunk finishes. `as_completed` lets the main thread process each chunk the moment it lands, overlapping that work with the remaining I/O. This requires carrying the chunk index / original axon index into the work item so the final ordering is preserved.
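A minimal sketch of the `as_completed` pattern, assuming the axons are already split into chunks and that a per-result `post_process` callable stands in for the unpickle / synapse rebuild / validation step. `run_chunks_as_completed` and its parameters are hypothetical; the chunk index travels with each future so ordering is restored at the end:

```python
from __future__ import annotations

import concurrent.futures
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_chunks_as_completed(
    executor: concurrent.futures.Executor,
    fn: Callable[[Sequence[T]], list[R]],
    chunks: Sequence[Sequence[T]],
    post_process: Callable[[R], R],
) -> list[R]:
    """Run fn over each chunk and post-process chunks as they finish.

    Results are flattened back into submission order, even though chunks
    may complete (and be post-processed) in any order.
    """
    # Map each future back to the index of the chunk it was submitted for.
    future_to_idx = {executor.submit(fn, chunk): i for i, chunk in enumerate(chunks)}
    per_chunk: list[list[R] | None] = [None] * len(chunks)
    for fut in concurrent.futures.as_completed(future_to_idx):
        idx = future_to_idx[fut]
        # Main-process CPU work here overlaps with chunks still in flight.
        per_chunk[idx] = [post_process(r) for r in fut.result()]
    return [r for chunk in per_chunk for r in chunk]
```

Because the function only depends on the `Executor` interface, it can be unit-tested with a `ThreadPoolExecutor`; with `ProcessPoolExecutor`, `fn` must be a module-level (picklable) function.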

**3. Drop the per-miner `Simulation(...).from_headers(...) + model_copy()` rebuild.**

For every miner (~240 per cycle) the current code constructs a fresh `Simulation` Pydantic model, calls `from_headers()`, assigns the ~289k-float `simulation_output`, and calls `model_copy()`. The caller (`synth/validator/forward.py:231`) only uses three fields. Return plain `(response, process_time)` tuples (or a `dict` keyed by `miner_uid`) and let the caller index by uid directly.
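A sketch of the plain-tuple return, assuming each worker returns one list of `(miner_uid, response, process_time)` tuples per chunk. `MinerResult` and `collect_by_uid` are hypothetical names, and the three fields the caller actually reads are not modelled here:

```python
from __future__ import annotations

from typing import Any, Iterable

# Hypothetical per-miner record: (miner_uid, response, process_time).
# `response` is passed through by reference; no Pydantic model is built
# and the large simulation_output payload is never copied.
MinerResult = tuple[int, Any, float]


def collect_by_uid(
    chunk_results: Iterable[list[MinerResult]],
) -> dict[int, tuple[Any, float]]:
    """Flatten per-chunk results into {miner_uid: (response, process_time)}.

    Keying by uid makes the caller independent of chunk completion order.
    """
    by_uid: dict[int, tuple[Any, float]] = {}
    for chunk in chunk_results:
        for uid, response, process_time in chunk:
            by_uid[uid] = (response, process_time)
    return by_uid
```

On the caller side, `by_uid.get(uid)` replaces positional `miner_uids[i]` indexing; a `None` result would mean the worker returned nothing for that miner.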

## Why this will help

- Change 1 eliminates per-cycle fork + import cost (several seconds on cold fork, worse under memory pressure).
- Change 2 pipelines main-process CPU work with remaining network I/O; the more main-process CPU there is to do, the bigger the overlap win.
- Change 3 removes ~240 Pydantic model constructions, 240 `from_headers` calls, and 240 `model_copy()` operations per cycle — each touches the large `simulation_output`.

Combined upper-bound reclaim: ~23 s off forward (148 s → ~125 s) with current miner behaviour.

## Priority

Low relative to:
- batched `save_responses` inserts (#<TBD-02>): ~85 s reclaim from save.
- vectorised `validate_responses_v2` (#<TBD-03>): ~30 s reclaim between forward and save.

Pick up after those land, or sooner if `as_completed` overlap starts being valuable (miners hitting full timeout consistently again).

## Risk / review notes

- `sync_forward_multiprocess` return type changes. Caller in `query_available_miners_and_save_responses` needs the matching update (positional `miner_uids[i]` → keyed by uid, or carry index through).
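- Singleton pool lifecycle: register a shutdown hook so workers are cleaned up on SIGTERM, but never shut the pool down inside the normal cycle path.
- Crashed workers are *not* transparently replaced: if a worker dies abruptly, `ProcessPoolExecutor` marks the whole pool as broken, and pending and subsequent submissions raise `BrokenProcessPool`. Today the per-cycle pool confines that to one cycle; with a singleton, the cycle must catch `BrokenProcessPool`, discard the pool and recreate it, or every later cycle fails.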
- Individual workers can't hang past `timeout` because of the `asyncio.wait_for` from #253.
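A minimal SIGTERM hook sketch, assuming the validator has no other SIGTERM handler that must take precedence. `install_sigterm_shutdown` is a hypothetical helper, and `shutdown` is whatever tears down the shared pool (for example `executor.shutdown(wait=True, cancel_futures=True)`). Python only runs signal handlers in the main thread, and `signal.signal` must be called from it:

```python
import signal
import sys
from types import FrameType
from typing import Callable, Optional


def install_sigterm_shutdown(shutdown: Callable[[], None]) -> None:
    """On SIGTERM, tear down the shared pool, chain any earlier handler, exit."""
    previous = signal.getsignal(signal.SIGTERM)

    def _handler(signum: int, frame: Optional[FrameType]) -> None:
        shutdown()  # the only pool teardown outside normal interpreter exit
        if callable(previous):
            previous(signum, frame)  # keep a Python handler installed earlier
        sys.exit(128 + signum)  # conventional "killed by signal" exit status

    signal.signal(signal.SIGTERM, _handler)
```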

## Verification plan

- Timing logs around pool acquisition, per-chunk completion, per-miner post-processing. Before/after on staging against a replay.
- Success criteria: forward under timeout + ~10 s overhead; no regression in `miner_predictions` row count or `format_validation` distribution; no leaked workers after SIGTERM.
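For the timing logs, one option is a small context manager wrapped around each phase. `timed` and the `synth.forward_timing` logger name are hypothetical, and the `key=value` message format is only an assumption chosen to make before/after runs easy to grep:

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator

logger = logging.getLogger("synth.forward_timing")  # hypothetical logger name


@contextmanager
def timed(phase: str, **fields: object) -> Iterator[None]:
    """Log the wall-clock duration of one phase (pool acquisition, a chunk, ...)."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        extra = " ".join(f"{k}={v}" for k, v in fields.items())
        logger.info("forward_timing phase=%s elapsed_s=%.3f %s", phase, elapsed, extra)
```

Usage would look like `with timed("pool_acquire", cycle=n): pool = ...`, with similar wrappers around each chunk's post-processing.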
