Summary
When google.LLM (Gemini 3 / Gemini 2.5) is placed as a fallback inside a FallbackAdapter after a non-Google LLM (e.g. OpenAI), any second turn that includes a function call from the primary LLM causes Gemini to return HTTP 400 INVALID_ARGUMENT:
Function call is missing a thought_signature in functionCall parts
Steps to reproduce
import asyncio

from livekit.agents.llm import FallbackAdapter, ChatContext, FunctionCall, FunctionCallOutput
from livekit.plugins import google

# History containing a function call produced by the primary (non-Google) LLM
ctx = ChatContext.empty()
ctx.add_message(role="user", content="I feel pain")
ctx.insert(FunctionCall(call_id="call_openai_abc123", name="manage_pain", arguments="{}"))
ctx.insert(FunctionCallOutput(call_id="call_openai_abc123", name="manage_pain", output="ok", is_error=False))

async def main():
    # <openai_llm_that_fails> is a placeholder for a primary LLM that errors out
    fallback = FallbackAdapter([<openai_llm_that_fails>, google.LLM(model="gemini-3-flash-preview")])
    async with fallback.chat(chat_ctx=ctx, tools=[]) as stream:
        async for chunk in stream:
            ...  # raises APIConnectionError: all LLMs failed

asyncio.run(main())
Root cause
The Google plugin guards thought-signature injection with:
if thought_signatures and (sig := thought_signatures.get(msg.call_id)):
    fc_part["thought_signature"] = sig
When google.LLM is a FallbackAdapter fallback, its _thought_signatures dict is empty (no Gemini turn has happened yet). The guard short-circuits on the falsy empty dict, so no thought_signature is injected for the OpenAI-produced function call, and Gemini rejects the request.
Even pre-populating _thought_signatures with an unrelated key doesn't help — .get(unknown_call_id) returns None, which fails the walrus condition.
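The behaviour described above can be reproduced without the plugin. The following is a minimal sketch that applies the same guard to plain dicts; `build_fc_part` and the shape of `fc_part` are hypothetical stand-ins for illustration, not the plugin's actual code.

```python
def build_fc_part(call_id, thought_signatures):
    """Hypothetical helper that applies the guard quoted above to a plain dict."""
    fc_part = {"name": "manage_pain", "args": {}}
    # Same condition as in the plugin: both the dict and the looked-up value must be truthy.
    if thought_signatures and (sig := thought_signatures.get(call_id)):
        fc_part["thought_signature"] = sig
    return fc_part


# Empty dict (no Gemini turn yet): the guard short-circuits, nothing is injected.
empty = build_fc_part("call_openai_abc123", {})

# Unrelated key present: the dict is truthy, but .get() returns None for this call_id.
unrelated = build_fc_part("call_openai_abc123", {"other_call": b"sig"})

# Only a call_id that Gemini itself produced gets a signature attached.
known = build_fc_part("call_openai_abc123", {"call_openai_abc123": b"sig"})
```

In the first two cases the resulting part carries no `thought_signature`, which is the state that leads to the 400 response.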
Current workaround
Google documents b"skip_thought_signature_validator" as a sentinel for this scenario:
There are rare cases where you need to provide functionCall parts that were not generated by the API and therefore don't have an associated thought signature (for example, when transferring history from a model that does not include thought signatures). You can set thought_signature to skip_thought_signature_validator, but, this should be a last resort as it will negatively impact model performance.
As a stopgap I subclassed dict to override __bool__ (always True) and get() (returns the sentinel for unknown call IDs), then replaced _thought_signatures on a google.LLM subclass.
This unblocks the fallback but can trigger the documented performance degradation: with some models the agent occasionally repeats itself around function calls (e.g. "Thank you for your call. Thank you for your call. See you soon!" when a wrap-up tool is called). I last saw this a couple of months ago with Gemini 3.1 preview, but I can't confirm it on later models.
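The stopgap described above can be sketched roughly as follows. The class name `SentinelSignatures` and the constant `SKIP_SENTINEL` are made up for this example; only the sentinel value itself comes from Google's documentation.

```python
# Sentinel value documented by Google for function calls without a real signature.
SKIP_SENTINEL = b"skip_thought_signature_validator"


class SentinelSignatures(dict):
    """Hypothetical dict subclass that defeats both halves of the guard."""

    def __bool__(self):
        # An empty dict is normally falsy; report True so the guard does not short-circuit.
        return True

    def get(self, key, default=None):
        # Return the real signature when one is stored, otherwise the sentinel.
        # The caller's default is deliberately ignored for unknown call IDs.
        return super().get(key, SKIP_SENTINEL)


sigs = SentinelSignatures()
unknown = sigs.get("call_openai_abc123")

sigs["call_gemini_1"] = b"real-signature"
real = sigs.get("call_gemini_1")
```

An instance of this class is then assigned to `_thought_signatures` on the `google.LLM` subclass. Because `_thought_signatures` is a private attribute, this may break on any plugin update.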
Proposed fix
When converting ChatContext to API messages, for any function_call part whose call_id is absent from _thought_signatures, inject the sentinel rather than silently omitting the field. This is a no-op for sessions where Gemini is the primary LLM (all call_ids present with real signatures) and fixes the fallback case transparently. However, one might want to expose it as a parameter, such as "fallback_thought_signatures".
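One possible shape for this change, as a sketch only: `attach_thought_signature` is a hypothetical helper, and `fallback_thought_signatures` is the suggested parameter name, not an existing option. The real conversion code in the plugin may be structured differently.

```python
# Sentinel value documented by Google for function calls without a real signature.
SKIP_SENTINEL = b"skip_thought_signature_validator"


def attach_thought_signature(fc_part, call_id, thought_signatures,
                             fallback_thought_signatures=True):
    """Hypothetical replacement for the current guard.

    Uses the stored signature when there is one; otherwise injects the
    sentinel, unless the caller opted out via fallback_thought_signatures.
    """
    sig = (thought_signatures or {}).get(call_id)
    if not sig and fallback_thought_signatures:
        sig = SKIP_SENTINEL
    if sig:
        fc_part["thought_signature"] = sig
    return fc_part


# Fallback case: call produced by another LLM, no signature known.
from_openai = attach_thought_signature({"name": "manage_pain"}, "call_openai_abc123", {})

# Gemini-primary case: the real signature is used unchanged.
from_gemini = attach_thought_signature({"name": "wrap_up"}, "c1", {"c1": b"real"})

# Opt-out: reproduces today's behaviour of omitting the field.
opted_out = attach_thought_signature({"name": "manage_pain"}, "x", {},
                                     fallback_thought_signatures=False)
```

Defaulting the parameter to True would fix the fallback case without configuration, while a False default would keep the current behaviour and make the sentinel (and its possible quality impact) an explicit choice.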