From 4fcb6475afa88fb25cb71b4fc1331f69ed54981f Mon Sep 17 00:00:00 2001
From: Lin Chai
Date: Tue, 19 May 2026 14:39:28 -0700
Subject: [PATCH] Code update

PiperOrigin-RevId: 918035781
---
 docs/agentic_rl.md | 12 +++---------
 docs/design.md     |  4 +---
 2 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/docs/agentic_rl.md b/docs/agentic_rl.md
index 3a3b4a7ff..3aef14043 100644
--- a/docs/agentic_rl.md
+++ b/docs/agentic_rl.md
@@ -4,9 +4,7 @@
 ## Architecture
-
- Trajectory Collect Engine Overview
-
+![Trajectory Collect Engine Overview](images/agentic_rollout_pipeline.png)
 ## Core Components
@@ -65,9 +63,7 @@ calls in parallel for efficiency.
 ### Agent/Environment interaction
-
- Batch vs Async Rollout
-
+![Agent/Environment interaction](images/agentic_agent:env.png)
 --------------------------------------------------------------------------------
@@ -116,9 +112,7 @@ lock ensures that rollouts (`acquire_rollout`) are temporarily paused when a weight sync (`acquire_weight_sync`) is requested, preventing agents from generating trajectories with stale parameters.
-
- Batch vs Async Rollout
-
+![Batch vs Async Rollout](images/batch_vs_async_rollout.png)
 ### Trajectory Batching and Grouping
diff --git a/docs/design.md b/docs/design.md
index 770c695d4..0a1019ab9 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -123,9 +123,7 @@ training agents that can perform multi-turn reasoning and interact with external tools. The design follows a standard RL paradigm where an **Agent** interacts with an **Environment** over multiple steps to complete a task.
-
- Agentic RL Flow
-
+![Agentic RL Flow](images/agentic_rollout_pipeline.png)
 The core design supports agents that engage in **multi-turn conversations**, breaking down complex problems into sequential steps of reasoning, tool