feat: add scaffolding rollout workflow #1064
WeiHaocheng wants to merge 20 commits into inclusionAI:main from
…fixes
- Fix data race in ScaffoldingLlm by cloning controller synchronously before async handoff
- Fix sampling_params propagation in GSM8KScaffoldingWorkflow (delegate to parent build_scaffolding_llm)
- Simplify controllers.py by removing unused code paths
- Add chat_scaffolding example with YAML config
- Add 2-node GSM8K RLVR scaffolding config
- Simplify MathVerifyWorker (remove signal-based timeout)
- Increase Ray scheduler startup_timeout to 600s for large clusters
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
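The first fix above (cloning the controller synchronously before the async handoff) can be illustrated with a minimal asyncio sketch. The `Controller` class, `submit`, and `run_request` are hypothetical stand-ins, not the PR's actual code; the point is only that the copy happens before any `await`, so later requests cannot overwrite state that an earlier request has not yet read.

```python
import asyncio
import copy


class Controller:
    """Hypothetical stand-in for a controller that carries per-request state."""

    def __init__(self):
        self.task_data = None


async def run_request(controller):
    # Yield to the event loop so other requests can interleave here.
    await asyncio.sleep(0)
    return controller.task_data


def submit(prototype, task_data):
    # Set the per-request state, then clone synchronously (no await in between),
    # so the scheduled task owns a private copy of the controller.
    prototype.task_data = task_data
    controller = copy.deepcopy(prototype)
    return asyncio.create_task(run_request(controller))


async def main():
    prototype = Controller()
    tasks = [submit(prototype, i) for i in range(3)]
    return await asyncio.gather(*tasks)
```

If `submit` passed `prototype` itself instead of the clone, every task in this sketch would observe the value written by the last call, because none of the tasks start running until `main` awaits.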
- Revert controllers.py to scaffolding_pr state (no changes needed)
- Revert workflow.py arun_episode to original approach (set task_data on trajectory_maker before generate_async)
- Keep synchronous clone in scaffolding_llm.py but remove unused **kwargs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rams
Without this, NativeGenerationController() is created with empty sampling_params, so max_tokens/temperature/stop are never set on tasks. SGLang defaults to ~16 tokens, producing near-zero rewards.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep explicit controller/trajectory_maker construction but pass max_tokens and temperature from gconfig to NativeGenerationController. Without sampling_params, SGLang defaults to ~16 tokens output.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
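The propagation described in the two commit messages above might look roughly like the sketch below. `GenConfig`, its field names, and `build_sampling_params` are assumptions made for illustration; the real gconfig type and the exact constructor signature of NativeGenerationController are not shown on this page. The sketch only demonstrates collecting the explicitly set values into a sampling_params dict so that it is never passed empty.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class GenConfig:
    """Hypothetical stand-in for the workflow's gconfig."""

    max_new_tokens: Optional[int] = None
    temperature: Optional[float] = None
    stop: Optional[List[str]] = None


def build_sampling_params(gconfig: GenConfig) -> Dict[str, Any]:
    """Collect only the values that are explicitly set on gconfig."""
    candidates = {
        "max_tokens": gconfig.max_new_tokens,
        "temperature": gconfig.temperature,
        "stop": gconfig.stop,
    }
    # Unset (None) fields are dropped so the backend default still applies to them.
    return {key: value for key, value in candidates.items() if value is not None}
```

The resulting dict would then be handed to the controller as its sampling_params. An empty result is the situation the commit message warns about: the backend's short default output length applies and rewards collapse.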
feat: improve scaffolding workflow with multi-worker support and bug fixes
garrett4wade left a comment
LGTM in general. Since the scaffolding module is relatively mature and not necessarily tied to AReaL's core modules, I recommend moving all the new files into the examples directory. That would make it a complete, standalone example.
    if self.worker is None:
        self._lazy_init_scaffolding(engine)
A new rollout workflow object is created for each trajectory, and each one will also create a ScaffoldingLlm object. Will this be very expensive?
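One common way to address this kind of concern is to share the expensive object across workflow instances instead of building it once per trajectory. The sketch below is hypothetical: `ExpensiveLlm` and `Workflow` are stand-ins, and nothing here reflects how the PR actually resolves the question. It keeps the lazy-init check from the snippet above but backs it with a class-level cache keyed by engine identity.

```python
class ExpensiveLlm:
    """Hypothetical stand-in for an object that is costly to construct."""

    instances_created = 0

    def __init__(self, engine):
        ExpensiveLlm.instances_created += 1
        self.engine = engine


class Workflow:
    """Hypothetical per-trajectory workflow that reuses one shared LLM per engine."""

    # Class-level cache shared by all workflow instances, keyed by engine identity.
    _llm_cache = {}

    def __init__(self):
        self.llm = None

    def _lazy_init(self, engine):
        key = id(engine)
        if key not in Workflow._llm_cache:
            Workflow._llm_cache[key] = ExpensiveLlm(engine)
        self.llm = Workflow._llm_cache[key]

    def run(self, engine):
        # Same shape as the reviewed snippet: initialize only on first use.
        if self.llm is None:
            self._lazy_init(engine)
        return self.llm
```

Keying on `id(engine)` assumes the engine outlives the cache entries; a longer-lived design would need an explicit key or a cleanup hook.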
Move code to the example.