From 4fcb6475afa88fb25cb71b4fc1331f69ed54981f Mon Sep 17 00:00:00 2001
From: Lin Chai <linchai@google.com>
Date: Tue, 19 May 2026 14:39:28 -0700
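Subject: [PATCH] docs: replace HTML image blocks with Markdown image syntax

Replace the centered `<p align="center"><img ...></p>` blocks in
docs/agentic_rl.md and docs/design.md with plain Markdown image links.
This drops the explicit `width` and centering attributes, and corrects the
alt text of the Agent/Environment interaction figure, which previously read
"Batch vs Async Rollout".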

PiperOrigin-RevId: 918035781
---
 docs/agentic_rl.md | 12 +++---------
 docs/design.md     |  4 +---
 2 files changed, 4 insertions(+), 12 deletions(-)
diff --git a/docs/agentic_rl.md b/docs/agentic_rl.md
index 3a3b4a7ff..3aef14043 100644
--- a/docs/agentic_rl.md
+++ b/docs/agentic_rl.md
@@ -4,9 +4,7 @@
 
 ## Architecture
 
-<p align="center">
-  <img src="images/agentic_rollout_pipeline.png" alt="Trajectory Collect Engine Overview" width="100%">
-</p>
+![Trajectory Collect Engine Overview](images/agentic_rollout_pipeline.png)
 
 ## Core Components
 
@@ -65,9 +63,7 @@ calls in parallel for efficiency.
 
 ### Agent/Environment interaction
 
-<p align="center">
-  <img src="images/agentic_agent:env.png" alt="Batch vs Async Rollout" width="80%">
-</p>
+![Agent/Environment interaction](images/agentic_agent:env.png)
 
 --------------------------------------------------------------------------------
 
@@ -116,9 +112,7 @@ lock ensures that rollouts (`acquire_rollout`) are temporarily paused when a
 weight sync (`acquire_weight_sync`) is requested, preventing agents from
 generating trajectories with stale parameters.
 
-<p align="center">
-  <img src="images/batch_vs_async_rollout.png" alt="Batch vs Async Rollout" width="50%">
-</p>
+![Batch vs Async Rollout](images/batch_vs_async_rollout.png)
 
 ### Trajectory Batching and Grouping
 
diff --git a/docs/design.md b/docs/design.md
index 770c695d4..0a1019ab9 100644
--- a/docs/design.md
+++ b/docs/design.md
@@ -123,9 +123,7 @@ training agents that can perform multi-turn reasoning and interact with external
 tools. The design follows a standard RL paradigm where an **Agent** interacts
 with an **Environment** over multiple steps to complete a task.
 
-<p align="center">
-  <img src="images/agentic_rollout_pipeline.png" alt="Agentic RL Flow" width="80%">
-</p>
+![Agentic RL Flow](images/agentic_rollout_pipeline.png)
 
 The core design supports agents that engage in **multi-turn conversations**,
 breaking down complex problems into sequential steps of reasoning, tool