
fix: retry with resume message when model returns empty response#5006

Open
gurjot-05 wants to merge 6 commits into google:main from gurjot-05:fix/retry-empty-model-response

Conversation


@gurjot-05 gurjot-05 commented Mar 26, 2026

Bug

Some models (notably Gemini 2.5 Pro/Flash) intermittently return empty content (parts: [], candidatesTokenCount: 0, finishReason: STOP) after processing tool results. This is especially common under concurrent load and with streaming + thinking enabled.

ADK's is_final_response() treats this as a valid completed turn because it only checks for the absence of function calls — not the presence of actual content. The agent loop stops and the user sees nothing.

Observed with:

  • Gemini 2.5 Pro via vertex_ai — after tool execution in orchestrator agents
  • Intermittent: ~15-29% failure rate under parallel load (7+ concurrent sessions)
  • Affects both streaming (SSE + thoughts) and non-streaming modes

Example session showing the bug:

Event: agent calls nth_research → completes successfully
Event: agent calls review_research → completes successfully (status: "approved")
Event: agent should call draft_response next, but returns parts: [] ← EMPTY
Agent loop ends. User sees nothing.

Related: #3525

Root Cause

There are three distinct failure modes, all leading to the same silent halt:

Mode 1: Non-streaming empty response

In BaseLlmFlow.run_async():

if not last_event or last_event.is_final_response() or last_event.partial:
    break

An event with parts: [] passes is_final_response() — no function calls, no function responses, not partial — so the loop breaks silently.
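The gap can be illustrated with a minimal stand-in for the event class (names simplified for illustration; the real check lives on ADK's Event class, which this sketch does not reproduce exactly):

```python
from dataclasses import dataclass, field

# Minimal stand-in for ADK's Event, showing why an empty event passes
# the finality check: only the ABSENCE of tool activity is tested, not
# the PRESENCE of content.
@dataclass
class Event:
    parts: list = field(default_factory=list)
    partial: bool = False

    def get_function_calls(self):
        return [p for p in self.parts if p.get("function_call")]

    def get_function_responses(self):
        return [p for p in self.parts if p.get("function_response")]

    def is_final_response(self) -> bool:
        return (
            not self.get_function_calls()
            and not self.get_function_responses()
            and not self.partial
        )

empty = Event(parts=[])            # parts: [], finishReason: STOP
print(empty.is_final_response())   # True -> loop breaks, user sees nothing
```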

Mode 2: Streaming + thinking, no final response yielded

With streaming + thinking enabled, the LiteLLM adapter yields thought chunks as partial=True. When finish_reason=stop arrives with no text content and no accumulated text/reasoning:

  • lite_llm.py finalization requires (text or reasoning_parts) which is False
  • No final aggregated response is ever yielded from the streaming generator
  • last_event is either None or partial=True from a thought chunk
  • The loop breaks on not last_event or last_event.partial before any retry logic

Mode 3: Model returns content=None

When the model returns content=None, _postprocess_async filters it out (returns without yielding if no content and no error). last_event stays None, and the loop breaks.
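Mode 3 reduces to a generator that yields nothing. A rough sketch of that filtering behavior (illustrative only; the real _postprocess_async does considerably more):

```python
import asyncio

# Illustrative sketch: if the response carries no content and no error,
# nothing is yielded, so the caller's last_event stays None.
async def postprocess_sketch(llm_response):
    if not llm_response.get("content") and not llm_response.get("error"):
        return  # filtered out: the generator yields no event
    yield {"content": llm_response["content"]}

async def main():
    return [e async for e in postprocess_sketch({"content": None})]

print(asyncio.run(main()))  # [] -> last_event is never assigned
```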

Fix

Two-layer fix addressing the root cause and adding defense-in-depth:

Layer 1: lite_llm.py — Ensure streaming always yields a final response

Added a fallback after the streaming loop: if the model produced no meaningful output at all (no text, no reasoning, no tool calls), yield an explicit empty non-partial LlmResponse so downstream retry logic can detect and handle it.

This converts Mode 2 into Mode 1, making it catchable by the retry logic.
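The idea can be sketched as follows (an illustration of the fallback shape, not the actual lite_llm.py code; event dicts stand in for LlmResponse):

```python
import asyncio

async def stream_with_fallback(chunks):
    """If the streaming loop finishes without any meaningful output,
    yield an explicit empty, non-partial response so downstream retry
    logic can see it. Names and shapes here are illustrative."""
    text = ""
    yielded_final = False
    for chunk in chunks:
        if chunk.get("thought"):
            yield {"partial": True, "thought": chunk["thought"]}
        elif chunk.get("text"):
            text += chunk["text"]
    if text:
        yield {"partial": False, "text": text}
        yielded_final = True
    if not yielded_final:
        # Fallback: convert "no final response" (Mode 2) into an
        # explicit empty final response (Mode 1) for the retry logic.
        yield {"partial": False, "text": ""}

async def main():
    # Thought-only stream, then finish_reason=stop with no text:
    return [e async for e in stream_with_fallback([{"thought": "planning"}])]

print(asyncio.run(main())[-1])  # {'partial': False, 'text': ''}
```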

Layer 2: base_llm_flow.py — Unified retry for all empty response modes

Restructured run_async() to check for empty responses before normal termination conditions. The retry now handles all three cases:

  1. not last_event — no events yielded at all (Mode 3)
  2. last_event.partial with no meaningful content — streaming thought-only chunks, no final response (Mode 2, defense-in-depth)
  3. last_event.is_final_response() with no meaningful content — non-streaming empty response (Mode 1)

When an empty response is detected:

  • A resume message is injected: "Your previous response was empty. Please resume execution from where you left off."
  • This gives the model context about why it's being re-prompted, improving recovery rate vs. silent retry
  • _MAX_EMPTY_RESPONSE_RETRIES = 2 prevents infinite loops
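The restructured loop has roughly this shape (a simplified sketch, not the actual BaseLlmFlow.run_async(); `step` and `has_meaningful_content` are stand-ins for the real step execution and helper):

```python
import asyncio

_MAX_EMPTY_RESPONSE_RETRIES = 2  # per the PR description

async def run_async_sketch(step, has_meaningful_content):
    """Illustrative shape of the Layer 2 retry: empty responses are
    checked BEFORE normal termination conditions."""
    empty_response_count = 0
    last_event = None
    while True:
        last_event = await step()
        is_empty = last_event is None or not has_meaningful_content(last_event)
        if is_empty:
            if empty_response_count >= _MAX_EMPTY_RESPONSE_RETRIES:
                break  # budget exhausted; the final PR logs a warning here
            empty_response_count += 1
            continue  # re-prompt the model (later commits make this silent)
        empty_response_count = 0  # reset after any successful response
        if last_event.get("final"):
            break
    return last_event

# Usage: one empty response, then a recovery.
responses = iter([{"final": True, "text": ""},
                  {"final": True, "text": "ok"}])
async def step():
    return next(responses)

result = asyncio.run(run_async_sketch(
    step, lambda e: bool(e.get("text", "").strip())))
print(result)  # {'final': True, 'text': 'ok'}
```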

False positive prevention

  • Partial events with real content (normal streaming): _has_meaningful_content() returns True — no retry
  • Agent transfer events: Author check (last_event.author == agent.name) prevents retrying events from other agents
  • Thought-only parts: Treated as non-meaningful (they have no user-visible content)

Tests

Existing tests: 384 passed (0 failures)

PR tests (test_empty_response_retry.py): 12 passed

Comprehensive scenario tests (test_empty_response_all_scenarios.py): 12 passed

| # | Scenario | Expected | Result |
|---|----------|----------|--------|
| 1 | Non-streaming empty parts:[] then recovery | 1 retry, good response | PASS |
| 2 | Thought-only final response | 1 retry | PASS |
| 3a | No events at all (last_event=None) | Retries up to max | PASS |
| 3b | Partial + empty content | 1 retry, then recovery | PASS |
| 3b' | Partial + thought-only content | 1 retry, then recovery | PASS |
| 4 | Partial WITH real text content | No retry (normal break) | PASS |
| 5 | Empty exhausts max retries | Exactly 2 retries, then stop | PASS |
| 6 | Empty, Empty, Good (multi-retry recovery) | 2 retries, then good | PASS |
| 7 | LiteLLM fallback response is non-partial | is_final_response()=True | PASS |
| 8 | Whitespace-only text | 1 retry | PASS |
| 9 | Function call is meaningful | Not retried | PASS |
| 10 | Partial empty then partial with content | 1 retry, then normal break | PASS |

Test plan

  • pytest tests/unittests/flows/llm_flows/ — 384 passed, 0 failed
  • All 12 original PR tests pass
  • All 12 comprehensive scenario tests pass
  • Agent transfer tests not affected (author check prevents false positives)
  • Verified with production logs from real orchestrator agent running Gemini 2.5 Pro — both non-streaming and streaming+thinking failure modes are covered

@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Mar 26, 2026
@gurjot-05 gurjot-05 force-pushed the fix/retry-empty-model-response branch from 5e0a344 to 3acd2ac on March 26, 2026 at 06:44
@adk-bot
Collaborator

adk-bot commented Mar 26, 2026

Response from ADK Triaging Agent

Hello @gurjot-05, thank you for your contribution!

It looks like the Contributor License Agreement (CLA) check has failed. Before we can merge this PR, you will need to sign the CLA. You can do so by following the instructions at https://cla.developers.google.com/.

Signing the CLA is a one-time process and is required for all contributions. Thanks!

@gurjot-05 gurjot-05 force-pushed the fix/retry-empty-model-response branch from 10f9073 to e6289db on March 26, 2026 at 07:26
@rohityan rohityan self-assigned this Mar 26, 2026
Some models (notably Gemini 2.5) intermittently return empty content
(parts: [], candidatesTokenCount: 0, finishReason: STOP) after
processing tool results. This is especially common under concurrent
load and with streaming + thinking enabled.

ADK's is_final_response() treats this as a valid completed turn because
it only checks for the absence of function calls, not the presence of
actual content. The agent loop stops and the user sees nothing.

This fix adds retry logic in BaseLlmFlow.run_async():

1. _has_meaningful_content() helper detects empty/thought-only events
2. When an empty final response is detected from the current agent,
   a resume message ("Your previous response was empty. Please resume
   execution from where you left off.") is injected into the session
   as a user event before re-prompting the model
3. Maximum 2 retries to prevent infinite loops
4. Author check (last_event.author == agent.name) prevents false
   positives on legitimate empty events from agent transfers

Unlike a silent re-prompt, the injected message gives the model
context about why it is being called again, improving recovery rate.

Fixes google#3525
…cases

The original fix only retried when is_final_response() was True with
empty content. This missed two scenarios observed in production:

1. Streaming + thinking: model streams thought chunks (partial=True)
   then stops with no text — the LiteLLM adapter dropped the response
   entirely, and the loop broke on last_event.partial without retry.

2. No events at all: model returned content=None which was filtered by
   _postprocess_async, leaving last_event=None — loop broke immediately.

Changes:
- lite_llm.py: Add fallback after streaming loop to yield an explicit
  empty non-partial LlmResponse when nothing was finalized, so
  downstream retry logic can detect and handle it.
- base_llm_flow.py: Restructure run_async() to check for empty
  responses (None, partial+empty, final+empty) before normal
  termination, enabling retry across all three scenarios.
- Update existing test for new retry-on-None behavior.
- Add 12 comprehensive scenario tests covering all cases.
The resume nudge event was being yielded from run_async(), which sent
it through the SSE stream to the frontend. Users saw "Your previous
response was empty" as a visible chat message.

Fix: use session_service.append_event() to write the resume message
directly to the session history. The model sees it on the next call
(for better recovery), but it never reaches the UI/SSE stream.
The retry counter was per-invocation, not per-failure-burst. If a model
returned empty responses at different points during the same invocation,
earlier (recovered) empties consumed the budget. A later empty response
would exhaust the counter and halt silently.

Fix: reset empty_response_count to 0 after any successful (non-empty)
response. Also add a warning log when retries are exhausted so the halt
is not silent.
…nses

The empty response retry was too aggressive — it triggered on:
1. Sub-agents (AgentTool, ParallelAgent) that legitimately return no content
2. First LLM calls with no prior tool execution

Fixes:
- Add null guard for last_event in is_final_response check (NoneType crash)
- Only retry after at least one tool call in the invocation, since the
  bug only manifests when models return empty after processing tool results
- Remove append_event for resume message (caused session state corruption
  in pause/resume flows and leaked to UI)
- Silent retry instead (proven 100% recovery rate in production tests)
- Update scenario tests to include tool call before empty response
@gurjot-05 gurjot-05 force-pushed the fix/retry-empty-model-response branch from 6e331f5 to 2f489f4 on March 26, 2026 at 18:51
@rohityan
Collaborator

Hi @gurjot-05, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.

@rohityan rohityan added the needs review [Status] The PR/issue is awaiting review from the maintainer label Mar 26, 2026
@rohityan
Collaborator

Hi @GWeale , can you please review this.

@gurjot-05
Author

Thank you @rohityan, I look forward to your feedback. Let me know if there's anything else I can provide in the meantime.

@pras2309

@gurjot-05, please rebase the branch.

