
Fix empty code key#448

Open
hanna-paasivirta wants to merge 7 commits intomainfrom
fix-empty-code-key

Conversation


@hanna-paasivirta hanna-paasivirta commented Apr 9, 2026

Short Description

Fix conversational answer handling bugs, and upgrade to Sonnet 4.6 with structured outputs and adaptive thinking in workflow_chat, job_chat, and global_chat to make structured outputs more consistent.

Fixes #442 #310 #449

Implementation Details

Problem

Conversational answers (no generated code/YAML) were broken in both workflow_chat and job_chat:

  • workflow_chat: Since we reversed the streaming order in a previous PR, the model started handling the empty yaml field more inconsistently, emitting backtick fences and unescaped newlines. Bad yaml strings leaked through to the frontend, causing YAML parse errors. When the JSON was malformed enough, retries would fire and sometimes produce empty responses (response: "").
  • job_chat: A changes SSE event was sent even when code_edits was empty ([]), causing the "apply changes" button to appear with no code.

The root cause for workflow_chat was the old prefill-based approach ({"yaml": "): the model struggled to output an empty string as its first token. A bug in split_format_yaml then let backticks leak through: when yaml.safe_load threw on bad content, the broad except caught the error but never reset output_yaml, so the raw string (e.g. '```\n```') was returned as response_yaml.
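A minimal reconstruction of the leak (function and variable names follow the description above; the exact implementation in the repo may differ):

```python
import yaml


def split_format_yaml_buggy(raw: str) -> str:
    """Hypothetical sketch of the pre-fix behaviour: the broad except
    swallows the parse error but never resets output_yaml, so the raw,
    invalid string (e.g. backtick fences) is returned as response_yaml."""
    output_yaml = raw
    try:
        parsed = yaml.safe_load(raw)
        output_yaml = yaml.safe_dump(parsed)
    except Exception:
        pass  # BUG: output_yaml still holds the unparseable raw string
    return output_yaml
```

Backticks cannot start a YAML token, so yaml.safe_load raises on '```\n```', and the fenced string sails straight through to the frontend.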

What changed

Model upgrade: claude-sonnet-4-6 (centrally in models.py).

Structured outputs replace prefilling across all three chat services. Instead of prefilling the assistant message and hoping the model produces valid JSON, we now pass a JSON schema via output_config that guarantees schema-compliant responses through constrained decoding. This eliminates all JSON parsing failures, so we no longer have retries for these structural issues.
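As a rough sketch of the request shape: the field names (output_config) follow this PR's description, and the schema itself is illustrative, not the actual one used in the services.

```python
# Illustrative workflow_chat schema: "yaml" is nullable, matching the new
# convention where a conversational turn returns "yaml": null.
workflow_chat_schema = {
    "type": "object",
    "properties": {
        "yaml": {"type": ["string", "null"]},
        "response": {"type": "string"},
    },
    "required": ["yaml", "response"],
    "additionalProperties": False,
}

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Rename the job to 'sync'"}],
    # Constrained decoding: the response is guaranteed to match the schema,
    # so downstream JSON parsing can no longer fail on malformed output.
    "output_config": {"schema": workflow_chat_schema},
}
```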

Adaptive thinking (thinking: {"type": "adaptive"}) added to all Sonnet API calls in workflow_chat, job_chat, and global_chat's planner. The effort level is lowered to "medium" in all services except the planner, to account for simple conversation turns that need fast answers. This lets the model decide dynamically when to use extended thinking, improving quality on tasks like comprehensive code renames. Simply switching from 4-5 to 4-6 made tests involving multiple code changes fail far more often than before, as if the new model was being less thorough; adaptive thinking solved this. It also means the model streams thinking statuses in the background, but I've blocked those from the front-end for now, until I move on to streaming fixes: they can be unsettling ("the user just said hello instead of asking about a workflow...I think I'll answer with a brief greeting...") and they're a bit long for the space currently allocated to status updates.
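Blocking the thinking statuses can be sketched as a filter over the event stream (event-type names here are hypothetical, not the actual SSE event names in the services):

```python
def visible_events(stream):
    """Drop model 'thinking' deltas from a (hypothetical) event stream so
    they are not forwarded to the front-end as status updates; everything
    else passes through unchanged."""
    for event in stream:
        if event.get("type") == "thinking_delta":
            continue
        yield event
```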

Specific fixes:

  • workflow_chat split_format_yaml: Separated YAML parsing into its own inner try/except so bad yaml can never leak through to the response. If yaml.safe_load fails, output_yaml is explicitly set to "".
  • workflow_chat streaming: Added validation that yaml content is actually a valid YAML dict before sending a changes SSE event. Previously any non-empty yaml string triggered the event.
  • workflow_chat prompt: Changed the "no yaml" convention from empty string ("") to null, and added a conversational example showing "yaml": null.
  • workflow_chat prompt tweak: Told the model not to generate YAML unnecessarily. We'll address similar behaviour in more detail in another PR, but the tweak was needed here to test conversational turns faster.
  • job_chat empty code_edits: Changed else to elif code_edits: so the changes event only fires when there are actual edits. This avoids the "apply changes" button appearing in the front-end when there are no changes.
  • job_chat error correction: Converted the error correction prompt from prefill to structured outputs, fixing a crash when code edits fail and need self-correction.
  • job_chat retrieve_docs: Converted search query generation from prefill to structured outputs.
  • global_chat router: Converted routing decision from prefill + custom brace-counting JSON parser to structured outputs.
  • Streaming flush (both services): Strip trailing JSON closing characters ("}) from the text stream flush, since text is now the last field in the JSON.
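The first two workflow_chat fixes above can be sketched together; function names are hypothetical stand-ins for the repo's actual helpers:

```python
import yaml


def split_format_yaml_fixed(raw: str) -> str:
    """Sketch of the fix: YAML parsing gets its own inner try/except, and
    the output is explicitly reset to "" on failure, so bad content (e.g.
    backtick fences) can never leak through as response_yaml."""
    try:
        parsed = yaml.safe_load(raw)
    except yaml.YAMLError:
        return ""
    # Only a YAML mapping counts as real workflow content; None or a bare
    # scalar is treated as a conversational turn with no changes.
    if not isinstance(parsed, dict):
        return ""
    return raw


def should_send_changes(code_edits) -> bool:
    """job_chat's else -> elif code_edits: the changes SSE event only
    fires when there are actual edits, so the "apply changes" button
    never appears empty."""
    return bool(code_edits)
```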

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark josephjclark marked this pull request as ready for review April 9, 2026 14:41
@github-project-automation github-project-automation bot moved this to New Issues in Core Apr 9, 2026


Development

Successfully merging this pull request may close these issues.

Conversational answers with no generated code/YAML are not parsed correctly in Lightning

1 participant