fix: retry with resume message when model returns empty response #5006

gurjot-05 wants to merge 6 commits into google:main
Conversation
Response from ADK Triaging Agent

Hello @gurjot-05, thank you for your contribution! It looks like the Contributor License Agreement (CLA) check has failed. Before we can merge this PR, you will need to sign the CLA. You can do so by following the instructions at https://cla.developers.google.com/. Signing the CLA is a one-time process and is required for all contributions. Thanks!
Some models (notably Gemini 2.5) intermittently return empty content
(parts: [], candidatesTokenCount: 0, finishReason: STOP) after
processing tool results. This is especially common under concurrent
load and with streaming + thinking enabled.
ADK's is_final_response() treats this as a valid completed turn because
it only checks for the absence of function calls, not the presence of
actual content. The agent loop stops and the user sees nothing.
This fix adds retry logic in BaseLlmFlow.run_async():
1. _has_meaningful_content() helper detects empty/thought-only events
2. When an empty final response is detected from the current agent,
a resume message ("Your previous response was empty. Please resume
execution from where you left off.") is injected into the session
as a user event before re-prompting the model
3. Maximum 2 retries to prevent infinite loops
4. Author check (last_event.author == agent.name) prevents false
positives on legitimate empty events from agent transfers
Unlike a silent re-prompt, the injected message gives the model
context about why it is being called again, improving recovery rate.
Fixes google#3525
…cases

The original fix only retried when is_final_response() was True with empty content. This missed two scenarios observed in production:

1. Streaming + thinking: the model streams thought chunks (partial=True) then stops with no text — the LiteLLM adapter dropped the response entirely, and the loop broke on last_event.partial without retry.
2. No events at all: the model returned content=None, which was filtered by _postprocess_async, leaving last_event=None — the loop broke immediately.

Changes:
- lite_llm.py: add a fallback after the streaming loop to yield an explicit empty non-partial LlmResponse when nothing was finalized, so downstream retry logic can detect and handle it.
- base_llm_flow.py: restructure run_async() to check for empty responses (None, partial + empty, final + empty) before normal termination, enabling retry across all three scenarios.
- Update the existing test for the new retry-on-None behavior.
- Add 12 comprehensive scenario tests covering all cases.
The resume nudge event was being yielded from run_async(), which sent it through the SSE stream to the frontend. Users saw "Your previous response was empty" as a visible chat message.

Fix: use session_service.append_event() to write the resume message directly to the session history. The model sees it on the next call (for better recovery), but it never reaches the UI/SSE stream.
The retry counter was per-invocation, not per-failure-burst. If a model returned empty responses at different points during the same invocation, earlier (recovered) empties consumed the budget, so a later empty response would exhaust the counter and halt silently.

Fix: reset empty_response_count to 0 after any successful (non-empty) response. Also add a warning log when retries are exhausted so the halt is not silent.
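The per-burst budget described above can be sketched as a small loop over the model's responses. This is an illustrative simulation (the function name and the use of `None` for an empty response are assumptions for the sketch, not the actual ADK code):

```python
import logging

_MAX_EMPTY_RESPONSE_RETRIES = 2
logger = logging.getLogger(__name__)


def drain(responses):
    """Walk a sequence of model responses, where None stands in for an
    empty response.

    The retry budget applies per burst of consecutive empties: any
    non-empty response resets the counter, so earlier recovered
    failures do not consume the budget for later ones.
    """
    delivered = []
    empty_response_count = 0
    for resp in responses:
        if resp is None:
            empty_response_count += 1
            if empty_response_count > _MAX_EMPTY_RESPONSE_RETRIES:
                # Do not halt silently: leave a trace when giving up.
                logger.warning("Retries exhausted on empty responses.")
                break
            continue  # retry: re-prompt the model
        empty_response_count = 0  # reset after any successful response
        delivered.append(resp)
    return delivered
```

With this shape, two bursts of two empties each stay within budget, while three consecutive empties exhaust it.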
…nses

The empty response retry was too aggressive — it triggered on:

1. Sub-agents (AgentTool, ParallelAgent) that legitimately return no content
2. First LLM calls with no prior tool execution

Fixes:
- Add a null guard for last_event in the is_final_response check (NoneType crash)
- Only retry after at least one tool call in the invocation, since the bug only manifests when models return empty after processing tool results
- Remove append_event for the resume message (caused session state corruption in pause/resume flows and leaked to UI)
- Retry silently instead (proven 100% recovery rate in production tests)
- Update scenario tests to include a tool call before the empty response
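Taken together, the gating conditions above amount to a loop of this shape. This is a simplified model using plain dict events and hypothetical names (`run_turn`, `call_model`); the real control flow lives in base_llm_flow.py:

```python
async def run_turn(call_model, agent_name, max_retries=2):
    """Simplified retry loop: call the model, and silently retry when
    the final event is empty -- but only if at least one tool call
    happened this invocation (the bug only appears after tool results)
    and the empty event was authored by this agent.
    """
    empty_retries = 0
    tool_call_seen = False
    while True:
        last_event = None
        async for event in call_model():
            last_event = event
            if event.get("function_call"):
                tool_call_seen = True
            yield event
        empty = last_event is None or not last_event.get("text")
        should_retry = (
            empty
            and tool_call_seen
            and (last_event is None or last_event.get("author") == agent_name)
            and empty_retries < max_retries
        )
        if not should_retry:
            return
        empty_retries += 1  # silent retry: no resume message injected
```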
Hi @gurjot-05, thank you for your contribution! We appreciate you taking the time to submit this pull request. Your PR has been received by the team and is currently under review. We will provide feedback as soon as we have an update to share.
Hi @GWeale, can you please review this?
Thank you, @rohityan. I look forward to your feedback. Let me know if there’s anything else I can provide in the meantime.
@gurjot-05, please rebase the branch. |
Bug
Some models (notably Gemini 2.5 Pro/Flash) intermittently return empty content (parts: [], candidatesTokenCount: 0, finishReason: STOP) after processing tool results. This is especially common under concurrent load and with streaming + thinking enabled.

ADK's is_final_response() treats this as a valid completed turn because it only checks for the absence of function calls — not the presence of actual content. The agent loop stops and the user sees nothing.

Observed with:
- vertex_ai — after tool execution in orchestrator agents

Example session showing the bug:
Related: #3525
Root Cause
There are three distinct failure modes, all leading to the same silent halt:
Mode 1: Non-streaming empty response

In BaseLlmFlow.run_async(): an event with parts: [] passes is_final_response() — no function calls, no function responses, not partial — so the loop breaks silently.

Mode 2: Streaming + thinking, no final response yielded
With streaming + thinking enabled, the LiteLLM adapter yields thought chunks as partial=True. When finish_reason=stop arrives with no text content and no accumulated text/reasoning:

- lite_llm.py finalization requires (text or reasoning_parts), which is False
- last_event is either None or partial=True from a thought chunk
- the loop breaks on not last_event or last_event.partial — before any retry logic

Mode 3: Model returns content=None
When the model returns content=None, _postprocess_async filters it out (returns without yielding if no content and no error). last_event stays None, and the loop breaks.

Fix
Two-layer fix addressing the root cause and adding defense-in-depth:
Layer 1: lite_llm.py — Ensure streaming always yields a final response
Added a fallback after the streaming loop: if the model produced no meaningful output at all (no text, no reasoning, no tool calls), yield an explicit empty non-partial LlmResponse so downstream retry logic can detect and handle it.

This converts Mode 2 into Mode 1, making it catchable by the retry logic.
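The Layer 1 fallback amounts to something like the following. This is a sketch with a simplified stand-in for the response type, not the actual lite_llm.py code:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LlmResponse:
    # Simplified stand-in for the real ADK LlmResponse type.
    text: Optional[str] = None
    partial: bool = False


async def stream_with_fallback(chunks: List[str]):
    """Stream partial chunks, then finalize.

    If the model produced no meaningful output at all, yield an
    explicit empty non-partial response instead of nothing, so the
    caller's retry logic can see that the turn ended empty
    (converting Mode 2 failures into catchable Mode 1 failures).
    """
    accumulated = ""
    for chunk in chunks:
        accumulated += chunk
        yield LlmResponse(text=chunk, partial=True)
    if accumulated:
        yield LlmResponse(text=accumulated, partial=False)
    else:
        yield LlmResponse(text=None, partial=False)  # explicit empty final
```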
Layer 2: base_llm_flow.py — Unified retry for all empty response modes
Restructured run_async() to check for empty responses before normal termination conditions. The retry now handles all three cases:

- not last_event — no events yielded at all (Mode 3)
- last_event.partial with no meaningful content — streaming thought-only chunks, no final response (Mode 2, defense-in-depth)
- last_event.is_final_response() with no meaningful content — non-streaming empty response (Mode 1)

When an empty response is detected:
- _MAX_EMPTY_RESPONSE_RETRIES = 2 prevents infinite loops

False positive prevention

- No retry when _has_meaningful_content() returns True
- Author check (last_event.author == agent.name) prevents retrying events from other agents

Tests
Existing tests: 384 passed (0 failures)
PR tests (test_empty_response_retry.py): 12 passed
Comprehensive scenario tests (test_empty_response_all_scenarios.py): 12 passed
Test plan
pytest tests/unittests/flows/llm_flows/ — 384 passed, 0 failed