
bug(e2e): multi-turn structured output regresses for Anthropic on native combined path (#605 follow-up) #613

@tombeckenham


Symptom

When the engine takes the #605 native combined path for Anthropic (Claude 4.5+), the multi-turn structured output E2E test fails on the second turn:

```
Locator: getByTestId('structured-output-part').nth(1)
Expected substring: "Vegan"
Received string: "{"title":"Classic Spaghetti Pomodoro",...}"
```

The second turn's rendered structured-output part shows the first turn's content (Pomodoro) instead of the new content (Vegan). Both `structured-output-part` elements exist on the page (so `toHaveCount(2)` passes), but both contain identical turn-1 data.

The same test passes for every other provider in the matrix:

  • ✅ openai (also native combined path)
  • ✅ gemini, ollama, groq, grok, openrouter (legacy finalization path)

Test: `testing/e2e/tests/multi-turn-structured.spec.ts:28` — "anthropic — multi-turn-structured › every assistant turn keeps its own typed structured-output part".

Anthropic is currently excluded from `feature-support.ts`'s `'multi-turn-structured'` set as a temporary workaround so CI stays green while this is investigated.

Hypotheses (none verified)

The failure is specific to Anthropic on the native combined path — that's the only thing that changed for this provider in #605. Suspects:

  1. aimock-matching desync. The shape of the assistant message in turn 2's history changed from a `tool_use` block (the pre-#605 forced-tool workaround) to plain text content (the post-#605 schema-constrained text). aimock's matcher uses the last user message text from the converted completion-request shape, which should still resolve correctly, but the conversion may differ in a subtle way that causes the wrong fixture to win.

  2. Engine state bleed. `harvestCombinedStructuredOutput` reads `this.accumulatedContent` per engine instance. Per-call instances should isolate this, but worth verifying with a unit test that exercises two sequential `chat({ outputSchema, messages: [u1, a1, u2] })` calls against a mock adapter.

  3. Client-side StreamProcessor binding. The `structured-output.start` event carries a `messageId` derived from the adapter's `TEXT_MESSAGE_START.messageId`. If both turns somehow ended up using the same id (deterministic generator?), the client would attach turn 2's part to turn 1's message. The adapter uses `generateId(this.name)` per call, so ids should be unique, but this is worth checking.

  4. structured-output part rendering in useChat. The `structured-output` part type may be matched / replaced by id rather than appended per message. If the messageId-binding is wrong (per hypothesis 3) the part could overwrite incorrectly.
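One way to tell these hypotheses apart is to capture the structured-output events for each turn and compare them. The following is a minimal sketch, not the library's code: `StructuredOutputEvent` is a hypothetical, simplified event shape and `diagnoseTurns` is an illustrative helper name; the real events may carry more fields.

```typescript
// Hypothetical, simplified event shape (an assumption for illustration only).
interface StructuredOutputEvent {
  type: "structured-output.start" | "structured-output.complete";
  messageId: string;
  data?: unknown;
}

interface TurnDiagnosis {
  // True if any turn-2 event reuses a messageId seen in turn 1.
  sharedMessageId: boolean;
  // True if both turns completed with byte-identical JSON payloads.
  identicalPayload: boolean;
}

// Compare the events captured for two turns and report what they have in common.
function diagnoseTurns(
  turn1: StructuredOutputEvent[],
  turn2: StructuredOutputEvent[],
): TurnDiagnosis {
  const ids1 = new Set(turn1.map((e) => e.messageId));
  const sharedMessageId = turn2.some((e) => ids1.has(e.messageId));

  // Serialize the payload of the first "complete" event, if there is one.
  const payload = (events: StructuredOutputEvent[]): string | undefined => {
    const done = events.find((e) => e.type === "structured-output.complete");
    return done === undefined ? undefined : JSON.stringify(done.data);
  };
  const p1 = payload(turn1);
  const p2 = payload(turn2);
  const identicalPayload = p1 !== undefined && p1 === p2;

  return { sharedMessageId, identicalPayload };
}
```

If `identicalPayload` is true while the ids differ, the wrong data is already present before the client binds anything, which points at hypotheses 1 or 2. If `sharedMessageId` is true, hypotheses 3 and 4 become the more likely suspects.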

Reproducer

`pnpm --filter @tanstack/ai-e2e test:e2e -- --grep "anthropic — multi-turn-structured"`

The test is currently gated out of the matrix; restore `'anthropic'` to the `'multi-turn-structured'` set in `testing/e2e/src/lib/feature-support.ts` to re-enable.

Suggested approach

  1. Write a vitest unit test in `packages/typescript/ai/tests` that simulates the multi-turn Anthropic native-combined-mode flow against a mocked adapter (two sequential `chat()` calls, with the previous turn's JSON text in the conversation history). If the `structured-output.complete` events carry the right per-turn data, the issue is downstream (client-side rendering or aimock). If they don't, narrow the search to the engine's harvest path.
  2. If the engine path looks correct, capture the actual aimock fixture-match log when the test runs locally (`LLMOCK_VERBOSE=1` if supported, or temporarily instrument the e2e test page to log the request body / response per turn).
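The shape of the check in step 1 can be sketched without the real engine. Everything below is a stand-in: `ToyAdapter`, `ToyEngine`, the local `chat()` wrapper, and `runTwoTurns` are hypothetical names that only mimic the per-call-instance pattern described in hypothesis 2. A real test would import the package's own `chat()` and mock adapter and wrap the assertions in vitest's `it(...)`.

```typescript
// Hypothetical stand-ins for illustration; none of these are the library's API.
interface Message {
  role: "user" | "assistant";
  content: string;
}

// Mock adapter: streams a canned JSON reply chosen by the last user message.
class ToyAdapter {
  *stream(messages: Message[]): Generator<string> {
    const lastUser = [...messages].reverse().find((m) => m.role === "user");
    const title = lastUser?.content.includes("vegan")
      ? "Vegan Pomodoro"
      : "Classic Spaghetti Pomodoro";
    const json = JSON.stringify({ title });
    // Emit two chunks so the engine has to accumulate them.
    const mid = Math.floor(json.length / 2);
    yield json.slice(0, mid);
    yield json.slice(mid);
  }
}

// Toy engine: accumulates streamed text per instance, then parses it once.
class ToyEngine {
  private accumulatedContent = "";
  private adapter: ToyAdapter;

  constructor(adapter: ToyAdapter) {
    this.adapter = adapter;
  }

  run(messages: Message[]): { title: string } {
    for (const chunk of this.adapter.stream(messages)) {
      this.accumulatedContent += chunk;
    }
    return JSON.parse(this.accumulatedContent);
  }
}

// One engine instance per call, mirroring the isolation the issue expects.
function chat(adapter: ToyAdapter, messages: Message[]): { title: string } {
  return new ToyEngine(adapter).run(messages);
}

// Two sequential turns; turn 2 carries turn 1's JSON text in its history.
function runTwoTurns(): [{ title: string }, { title: string }] {
  const adapter = new ToyAdapter();
  const u1: Message = { role: "user", content: "Give me a pasta recipe" };
  const turn1 = chat(adapter, [u1]);
  const a1: Message = { role: "assistant", content: JSON.stringify(turn1) };
  const u2: Message = { role: "user", content: "Make it vegan" };
  const turn2 = chat(adapter, [u1, a1, u2]);
  return [turn1, turn2];
}
```

Asserting that the second result mentions "Vegan" and the first does not is the core of the test. If the same assertion fails against the real engine, the harvest path is the place to look; if it passes, the problem is more likely in aimock or the client.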

Once root cause is identified, restore `'anthropic'` to the multi-turn-structured matrix.

Out of scope for this issue
