Agent Engine: _init_session replays all events via append_event causing 429 RESOURCE_EXHAUSTED on long sessions #5714

@JoeyC1990

Description

🔴 Required Information

Is your feature request related to a specific problem?

Yes. When an ADK agent is deployed to Vertex AI Agent Engine and invoked mid-conversation (e.g. via @agent in Gemini Enterprise after building context with the standard Gemini agent), the _init_session() method in vertexai/agent_engines/templates/adk.py (line ~810) replays every prior session event with a separate session_service.append_event() call, i.e. one append request per event:

# adk.py line ~810
await session_service.append_event(session, Event(**event))

For sessions with many events (common when users build context in the standard agent before handing off to a custom ADK agent), this triggers:

google.genai.errors.ClientError: 429 RESOURCE_EXHAUSTED
Quota exceeded for quota metric 'Session Event Append Requests'
and limit 'Session Event Append Requests per minute per region'
of service 'aiplatform.googleapis.com'

This is a self-perpetuating failure: each retry replays the entire history again and consumes more quota, so recovery is impossible until the user starts a fresh session.

Steps to reproduce:

  1. Deploy an ADK agent to Vertex AI Agent Engine with A2A enabled
  2. In Gemini Enterprise, have a multi-turn conversation with the standard Gemini agent (10+ turns with file attachments, images, etc.)
  3. @mention the custom ADK agent in the same conversation
  4. Observe the 429 error in Reasoning Engine stderr logs — the agent never reaches user code

Describe the Solution You'd Like

Any of the following (in order of preference):

  1. Batched event replay with rate-limiting: _init_session should batch append_event calls or use a bulk append API rather than individual sequential calls
  2. Incremental session resumption: Only replay events that haven't already been persisted, rather than replaying the entire history on every invocation
  3. Session event windowing: Allow configuring a maximum number of recent events to replay (similar to the GetSessionConfig.num_recent_events pattern from #3562, "Allow limiting num. of Session events fetched when calling Runner.run_async"), discarding or summarising older context
  4. Expose session configuration on Agent Engine: Allow developers to configure session replay behaviour (e.g. max_replay_events, replay_strategy) when deploying to Agent Engine
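Option 2 boils down to computing a delta before replaying. The following is a minimal sketch. It assumes each incoming event dict carries a stable "id" key and that the set of already-persisted ids can be read once (for example, from the events of a session fetched with a single get_session call). ADK Event objects do have an id field, but whether the dicts passed to _init_session preserve it across invocations is an assumption. events_to_replay is a hypothetical helper, not part of the vertexai SDK.

```python
from typing import Any, Iterable, Mapping


def events_to_replay(
    incoming: Iterable[Mapping[str, Any]],
    persisted_ids: set[str],
) -> list[Mapping[str, Any]]:
    """Return only the incoming events not yet stored in the session.

    Events without an "id" are replayed conservatively rather than dropped.
    Original order is preserved so the session history stays chronological.
    """
    return [
        event
        for event in incoming
        if event.get("id") is None or event["id"] not in persisted_ids
    ]
```

If the ids are stable, a second invocation in the same conversation would then issue append calls only for the new turns instead of the whole history.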

Impact on your work

Critical. We have deployed multi-brand A2A agents (PowerPoint generation agents) to Agent Engine for enterprise users. The intended workflow is:

  1. Users discuss their presentation needs with the standard Gemini agent in Gemini Enterprise (building context, sharing images, iterating on ideas)
  2. Users then @mention our custom ADK agent to generate the actual presentation

This workflow is completely broken for any non-trivial conversation because the session replay crashes the agent before any user code executes. Users are forced to start a brand new conversation and re-provide all context, which defeats the purpose of the A2A cross-agent handoff pattern.

This is blocking production usage for our organisation today.

Willingness to contribute

Yes — happy to test and provide feedback on proposed solutions. Would consider submitting a PR if pointed to the right code path (the replay logic appears to be in the vertexai SDK rather than adk-python directly).


🟡 Recommended Information

Describe Alternatives You've Considered

  1. Starting a new conversation — Works but defeats the purpose of building context with the standard agent first. Poor UX.
  2. Trimming llm_request.contents in before_model_callback — Only prevents future growth; cannot fix the initial _init_session crash since it runs before any callbacks.
  3. Requesting a quota increase — May raise the ceiling but doesn't address the O(n) replay design. Sessions will still break at some threshold.
  4. "Context brief" pattern — Ask the standard agent to summarise context, then paste into a new conversation. Functional workaround but requires user discipline and breaks the seamless handoff experience.
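For reference, alternative 2 looks roughly like this. ADK's LlmAgent accepts a before_model_callback(callback_context, llm_request), and returning None lets the model call proceed with the possibly modified request. The cap MAX_CONTENTS and the name trim_history are illustrative choices. The parameters are typed as Any so the sketch runs without ADK installed; in ADK they would be a CallbackContext and an LlmRequest.

```python
from typing import Any, Optional

# Illustrative cap on how many Content entries are sent to the model per call.
MAX_CONTENTS = 20


def trim_history(callback_context: Any, llm_request: Any) -> Optional[Any]:
    """before_model_callback sketch that keeps only the newest contents."""
    contents = llm_request.contents
    if len(contents) > MAX_CONTENTS:
        # Only the outgoing request is trimmed; the stored session is unchanged.
        llm_request.contents = contents[-MAX_CONTENTS:]
    # Returning None tells ADK to continue with the (possibly trimmed) request.
    return None
```

It would be attached with LlmAgent(..., before_model_callback=trim_history). Because it only fires on model calls, it runs after _init_session has already replayed the history, so it cannot prevent the 429.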

Proposed API / Implementation

Option A — Add a max_replay_events config to Agent Engine deployment:

# Proposed Agent Engine deployment config (hypothetical API)
session_config = SessionConfig(
    max_replay_events=50,  # Only replay last 50 events
    replay_strategy="recent"  # or "summarise", "checkpoint"
)

Option B — Use bulk append in _init_session:

# Instead of:
for event in events:
    await session_service.append_event(session, Event(**event))

# Use a batch API (proposed; append_events_batch is hypothetical):
await session_service.append_events_batch(session, [Event(**e) for e in events])
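Until a server-side bulk append exists, a client-side helper can only spread the same per-event calls over time. The sketch below does that by appending in chunks and pausing between them. append_events_throttled is a hypothetical helper. append_one stands in for a call such as session_service.append_event(session, Event(**event)). The default chunk_size and pause_seconds are placeholders, not tuned values.

```python
import asyncio
from typing import Awaitable, Callable, Sequence, TypeVar

T = TypeVar("T")


async def append_events_throttled(
    append_one: Callable[[T], Awaitable[None]],
    events: Sequence[T],
    chunk_size: int = 10,
    pause_seconds: float = 1.0,
) -> int:
    """Append events in order, pausing between fixed-size chunks.

    Still issues one request per event; it only limits the request rate.
    Returns the number of events appended.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(events), chunk_size):
        for event in events[start:start + chunk_size]:
            await append_one(event)  # sequential, preserves event order
        if start + chunk_size < len(events):
            await asyncio.sleep(pause_seconds)  # back off before the next chunk
    return len(events)
```

Assuming negligible call latency, this issues at most about chunk_size * 60 / pause_seconds appends per minute, so both values would have to be chosen against the project's actual quota. The trade-off is startup latency: a long history still costs O(n) calls, only spread out. A real append_events_batch would need support from the session service itself.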

Additional Context

Environment:

  • OS: macOS
  • Python: 3.12
  • ADK version: Latest (deployed via Agent Engine)
  • Deployment: Vertex AI Agent Engine (Reasoning Engine) in europe-west1
  • Model: Gemini 3.1 Pro Preview / Gemini 3 Flash
  • A2A: Enabled

Related Issues:

Labels: agent engine ([Component] This issue is related to Vertex AI Agent Engine)
