Initialize and reuse V1 renderer pools efficiently#1791
Conversation
ApprovabilityVerdict: Needs human review This PR modifies concurrency patterns for renderer pool initialization and client caching. An unresolved review comment raises a substantive concern about a missing upstream dependency fix that could cause concurrent tokenizer loading issues at runtime. You can customize Macroscope's approvability policy. Learn more. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c16b823f2d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
c16b823 to
0726607
Compare
0726607 to
0528816
Compare
Dismissing prior approval to re-evaluate 0528816
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 0528816. Configure here.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1f165371f8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| self._pool_task = asyncio.create_task( | ||
| asyncio.to_thread( |
There was a problem hiding this comment.
Pin renderers before offloading pool creation
This moves create_renderer_pool into a worker thread, but the package metadata still allows renderers>=0.1.8.dev40 and the lockfile still resolves 0.1.8.dev43; the required upstream fix (renderers#91) is still open. Fresh evidence beyond the prior comment is that this commit did not bump or lock the dependency, so installs resolving the current range can still run the old process-wide fastokens/Transformers patch concurrently with environment or harness tokenizer loads during the first train request.
Useful? React with 👍 / 👎.
1f16537 to
86ff846
Compare

Overview
Initialize V1 training renderer pools through one shared background task, and reuse one explicitly pinned renderer client across LoRA/checkpoint sampling IDs. Cold tokenizer construction no longer blocks the env-server event loop, while adapters that share a base tokenizer no longer retain duplicate pools.
Dependency
This PR must not merge until PrimeIntellect-ai/renderers#91 is merged, released, and resolved by Verifiers. That upstream change removes the process-wide fastokens patch window, which is required before renderer construction can safely overlap unrelated tokenizer work.
Design
Off-loop initialization
The first
TrainClientcaller stores anasyncio.Taskbefore yielding, and that task runscreate_renderer_poolthroughasyncio.to_thread. Concurrent first requests share one initialization. Waiters useasyncio.shield, so cancelling one rollout does not cancel startup for others; the completed task caches either the renderer pool or its startup failure.Pinned renderer reuse
EnvServercaches a nativeTrainClientConfigunder its serialized config plus an explicitrenderer_model_name. LoRA/checkpoint request models that use the same pinned base tokenizer therefore share one client and pool. Unpinned training clients and eval clients remain keyed by request model.The request model is still stored in each
RolloutContextand passed to generation, so adapter routing is unchanged.Performance
Off-loop initialization, using 64 concurrent first callers and a 200 ms synchronous initializer:
Pinned-client reuse, using eight adapter IDs, 64 requests per adapter, concurrency 128, and a locally cached Qwen tokenizer:
Note
Medium Risk
Touches rollout hot paths (async pool lifecycle and shared clients across adapter IDs); behavior is covered by new tests but depends on correct shielding and cache-key semantics for training vs eval.
Overview
TrainClient now builds renderer pools on a background thread via a single
asyncio.Task, so concurrent first callers share one initialization and the env-server loop is not blocked by synchronouscreate_renderer_pool. Waiters useasyncio.shieldso rollout cancellation does not abort startup; a failed init is cached and replayed for later calls.EnvServer client cache keys training clients with an explicit
renderer_model_namewhen set, so multiple LoRA/adapter request models share one TrainClient and pool whileRolloutContextstill passes the per-request model into generation.Adds tests/v1/test_serve.py covering cache key behavior, one-pool routing for many pinned-adapter requests, and single-shot failure caching under concurrency.
Reviewed by Cursor Bugbot for commit 86ff846. Bugbot is set up for automated code reviews on this repo. Configure here.
Note
Initialize and reuse V1 renderer pools efficiently across concurrent callers
TrainClient._renderer_poolin train.py is now async and usesasyncio.create_task+asyncio.shieldto ensure the renderer pool is created exactly once, even under concurrent access.create_renderer_pool.EnvServer._clientin server.py now keysTrainClientConfigcache entries byrenderer_model_name(when set), so clients with the same pinned renderer model are shared across different adapters/models.TrainClientConfigclients retain per-model caching; pinned clients now share a singleTrainClientinstance and its renderer pool across adapters.Macroscope summarized 86ff846.