feat: add scaffolding rollout workflow #1064

Open
WeiHaocheng wants to merge 20 commits into inclusionAI:main from WeiHaocheng:scaffolding_pr

Conversation

@WeiHaocheng

Description

Related Issue

Fixes #(issue)

Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • 💥 Breaking change
  • 📝 Documentation update
  • ♻️ Refactoring
  • ⚡ Performance improvement
  • ✅ Test coverage improvement

Checklist

  • I have read the Contributing Guide
  • Pre-commit hooks pass (pre-commit run --all-files)
  • Relevant tests pass; new tests added for new functionality
  • Documentation updated (if applicable; built with ./docs/build_all.sh)
  • Branch is up to date with main
  • Self-reviewed via /review-pr command
  • This PR was created by a coding agent via /create-pr
  • This PR is a breaking change

Breaking Change Details (if applicable):

Additional Context


Need help? Check the Contributing Guide or ask in
GitHub Discussions!

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

luhongyu.4869 and others added 16 commits March 22, 2026 13:41
…fixes

- Fix data race in ScaffoldingLlm by cloning controller synchronously
  before async handoff
- Fix sampling_params propagation in GSM8KScaffoldingWorkflow (delegate
  to parent build_scaffolding_llm)
- Simplify controllers.py by removing unused code paths
- Add chat_scaffolding example with YAML config
- Add 2-node GSM8K RLVR scaffolding config
- Simplify MathVerifyWorker (remove signal-based timeout)
- Increase Ray scheduler startup_timeout to 600s for large clusters

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
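
The first fix in the list above (cloning the controller synchronously before the async handoff) can be illustrated with a minimal sketch. `Controller`, `submit_racy`, and `submit_safe` are hypothetical stand-ins and not the actual `ScaffoldingLlm` code, and the sketch uses plain `asyncio` rather than the real scheduling path. It only shows why the copy has to happen before control returns to the caller.

```python
import asyncio
import copy


class Controller:
    """Hypothetical stand-in for a controller that carries mutable state."""

    def __init__(self):
        self.task_data = None


async def run(controller):
    # Yield to the event loop once, as a real generation call would.
    await asyncio.sleep(0)
    return controller.task_data


def submit_racy(controller):
    # The task keeps a reference to the shared controller, so any later
    # mutation by the caller is visible when the task finally runs.
    return asyncio.create_task(run(controller))


def submit_safe(controller):
    # Clone synchronously, before the async handoff, so each task owns
    # a private snapshot of the controller state.
    return asyncio.create_task(run(copy.deepcopy(controller)))


async def demo(submit):
    shared = Controller()
    tasks = []
    for i in range(3):
        shared.task_data = i  # caller reuses and mutates one controller
        tasks.append(submit(shared))
    return await asyncio.gather(*tasks)


racy = asyncio.run(demo(submit_racy))
safe = asyncio.run(demo(submit_safe))
```

In this sketch every racy task observes the last value written by the caller, while each safe task sees the value that was current when it was submitted.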
- Revert controllers.py to scaffolding_pr state (no changes needed)
- Revert workflow.py arun_episode to original approach (set task_data on
  trajectory_maker before generate_async)
- Keep synchronous clone in scaffolding_llm.py but remove unused **kwargs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rams

Without this, NativeGenerationController() is created with empty
sampling_params, so max_tokens/temperature/stop are never set on tasks.
SGLang defaults to ~16 tokens, producing near-zero rewards.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep explicit controller/trajectory_maker construction but pass
max_tokens and temperature from gconfig to NativeGenerationController.
Without sampling_params, SGLang defaults to ~16 tokens output.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
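
The sampling-parameter issue described in the two commit messages above can be sketched as follows. `GenConfig`, `build_sampling_params`, and `StubController` are hypothetical names used only for illustration; the real `gconfig` fields and the `NativeGenerationController` constructor may differ. The sketch assumes the controller copies its `sampling_params` onto each task it creates, so an empty dict leaves the backend to fall back on its own defaults.

```python
from dataclasses import dataclass, field


@dataclass
class GenConfig:
    """Hypothetical stand-in for the gconfig mentioned in the commit message."""

    max_new_tokens: int = 1024
    temperature: float = 1.0
    stop: list = field(default_factory=list)


def build_sampling_params(gconfig):
    # Translate generation config into the keys the controller forwards.
    params = {
        "max_tokens": gconfig.max_new_tokens,
        "temperature": gconfig.temperature,
    }
    if gconfig.stop:
        params["stop"] = list(gconfig.stop)
    return params


class StubController:
    """Hypothetical controller that stamps its sampling params onto tasks."""

    def __init__(self, sampling_params=None):
        self.sampling_params = dict(sampling_params or {})

    def make_task(self, prompt):
        task = {"prompt": prompt}
        task.update(self.sampling_params)
        return task


# Constructed without params: the task carries no max_tokens at all.
bare_task = StubController().make_task("2 + 2 = ?")

# Constructed with params derived from the config: limits are explicit.
gconfig = GenConfig(max_new_tokens=512, temperature=0.7)
full_task = StubController(build_sampling_params(gconfig)).make_task("2 + 2 = ?")
```

With the bare controller the task contains only the prompt, which matches the failure mode described above, where the serving backend's short default output length applies.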
feat: improve scaffolding workflow with multi-worker support and bug fixes
Collaborator

@garrett4wade left a comment

LGTM in general. Since the scaffolding module is relatively mature and not necessarily tied to AReaL's core modules, I recommend moving all the new files into the examples directory, so that it becomes a complete, standalone example.

Comment thread examples/scaffolding/workflow.py Outdated
Comment on lines +177 to +178
    if self.worker is None:
        self._lazy_init_scaffolding(engine)
Collaborator

A new rollout workflow object is created for each trajectory, and each one also creates its own ScaffoldingLlm object. Will this be very expensive?
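
To make the cost concern concrete, here is a minimal sketch of one way per-trajectory construction could be amortized, assuming the expensive object can safely be shared between workflow instances. `ExpensiveLlm`, `Workflow`, and `run_episode` are hypothetical stand-ins, not the code under review, and the cache is not thread-safe as written.

```python
class ExpensiveLlm:
    """Hypothetical stand-in for an object that is costly to construct."""

    instances_created = 0

    def __init__(self, engine):
        type(self).instances_created += 1
        self.engine = engine


class Workflow:
    # Class-level cache shared by all workflow instances, keyed by engine identity.
    _llm_cache = {}

    def __init__(self):
        self.worker = None

    def _lazy_init(self, engine):
        key = id(engine)
        if key not in self._llm_cache:
            # Only the first workflow that sees this engine pays the cost.
            self._llm_cache[key] = ExpensiveLlm(engine)
        self.worker = self._llm_cache[key]

    def run_episode(self, engine):
        # Same lazy-init guard as in the reviewed lines, backed by the cache.
        if self.worker is None:
            self._lazy_init(engine)
        return self.worker


engine = object()
# One workflow object per trajectory, as described in the comment above.
workers = [Workflow().run_episode(engine) for _ in range(5)]
```

Here five workflow objects are created but the expensive object is built only once. Whether sharing is acceptable depends on how much per-trajectory state the real object holds.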

@WeiHaocheng
Author

> LGTM in general. Since the scaffolding module is relatively mature and not necessarily tied to AReaL's core modules, I recommend moving all the new files into the examples directory, so that it becomes a complete, standalone example.

Moved the code into the examples directory.

@garrett4wade added the safe-to-test label (Ready to run unit-tests in a PR.) Apr 13, 2026
Labels

reviewed safe-to-test Ready to run unit-tests in a PR.

3 participants