
Fix empty code key#448

Open
hanna-paasivirta wants to merge 7 commits intomainfrom
fix-empty-code-key

Conversation


@hanna-paasivirta hanna-paasivirta commented Apr 9, 2026

Short Description

Fix conversational answer handling bugs, and upgrade to Sonnet 4.6 with structured outputs and adaptive thinking in workflow_chat, job_chat, and global_chat to make structured outputs more consistent.

Fixes #442 #310 #449

Implementation Details

Problem

Conversational answers (no generated code/YAML) were broken in both workflow_chat and job_chat:

  • workflow_chat: Since we reversed the streaming order in a previous PR, the model started handling the empty yaml field more inconsistently, emitting backtick fences and unescaped newlines. Bad yaml strings leaked through to the frontend, causing YAML parse errors. When the JSON was malformed enough, retries would fire and sometimes produce empty responses (response: "").
  • job_chat: A changes SSE event was sent even when code_edits was empty ([]), causing the "apply changes" button to appear with no code.

The root cause for workflow_chat was the old prefill-based approach ({"yaml": "): the model struggled to output an empty string as its first token. A bug in split_format_yaml then let backticks leak through: when yaml.safe_load threw on bad content, the broad except caught the error but never reset output_yaml, so the raw string (e.g. '```\n```') was returned as response_yaml.
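A minimal reconstruction of the leak (function and variable names follow the description above; the exact implementation in the repo may differ):

```python
import yaml


def split_format_yaml_buggy(raw: str) -> str:
    """Hypothetical sketch of the pre-fix behaviour: the broad except
    swallows the parse error but never resets output_yaml, so the raw,
    invalid string (e.g. backtick fences) is returned as response_yaml."""
    output_yaml = raw
    try:
        parsed = yaml.safe_load(raw)
        output_yaml = yaml.safe_dump(parsed)
    except Exception:
        pass  # BUG: output_yaml still holds the unparseable raw string
    return output_yaml
```

Backticks cannot start a YAML token, so yaml.safe_load raises on '```\n```', and the fenced string sails straight through to the frontend.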

What changed

Model upgrade: claude-sonnet-4-6 (centrally in models.py).

Structured outputs replace prefilling across all three chat services. Instead of prefilling the assistant message and hoping the model produces valid JSON, we now pass a JSON schema via output_config that guarantees schema-compliant responses through constrained decoding. This eliminates all JSON parsing failures, so we no longer have retries for these structural issues.
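As a rough sketch of the request shape: the field names (output_config) follow this PR's description, and the schema itself is illustrative, not the actual one used in the services.

```python
# Illustrative workflow_chat schema: "yaml" is nullable, matching the new
# convention where a conversational turn returns "yaml": null.
workflow_chat_schema = {
    "type": "object",
    "properties": {
        "yaml": {"type": ["string", "null"]},
        "response": {"type": "string"},
    },
    "required": ["yaml", "response"],
    "additionalProperties": False,
}

request = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 4096,
    "messages": [{"role": "user", "content": "Rename the job to 'sync'"}],
    # Constrained decoding: the response is guaranteed to match the schema,
    # so downstream JSON parsing can no longer fail on malformed output.
    "output_config": {"schema": workflow_chat_schema},
}
```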

Adaptive thinking (thinking: {"type": "adaptive"}) added to all Sonnet API calls in workflow_chat, job_chat, and global_chat's planner. The effort level is lowered to "medium" in all services except the planner, to account for simple conversation turns that need fast answers. This lets the model decide dynamically when to use extended thinking, improving quality on tasks like comprehensive code renames. Simply switching from 4-5 to 4-6 made tests involving multiple code changes fail far more often than before, as if the new model was being less thorough; adaptive thinking solved this. It also means the model streams thinking statuses in the background, but I've blocked those from the front-end for now, until I move on to streaming fixes: they can be unsettling ("the user just said hello instead of asking about a workflow...I think I'll answer with a brief greeting...") and they're a bit long for the space currently allocated to status updates.
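Blocking the thinking statuses can be sketched as a filter over the event stream (event-type names here are hypothetical, not the actual SSE event names in the services):

```python
def visible_events(stream):
    """Drop model 'thinking' deltas from a (hypothetical) event stream so
    they are not forwarded to the front-end as status updates; everything
    else passes through unchanged."""
    for event in stream:
        if event.get("type") == "thinking_delta":
            continue
        yield event
```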

Specific fixes:

  • workflow_chat split_format_yaml: Separated YAML parsing into its own inner try/except so bad yaml can never leak through to the response. If yaml.safe_load fails, output_yaml is explicitly set to "".
  • workflow_chat streaming: Added validation that yaml content is actually a valid YAML dict before sending a changes SSE event. Previously any non-empty yaml string triggered the event.
  • workflow_chat prompt: Changed the "no yaml" convention from empty string ("") to null, and added a conversational example showing "yaml": null.
  • workflow_chat prompt tweak: Told the model not to generate YAML unnecessarily. We'll address similar behaviour in more detail in another PR, but the tweak was needed here to test conversational turns faster.
  • job_chat empty code_edits: Changed else to elif code_edits: so the changes event only fires when there are actual edits. This avoids the "apply changes" button appearing in the front-end when there are no changes.
  • job_chat error correction: Converted the error correction prompt from prefill to structured outputs, fixing a crash when code edits fail and need self-correction.
  • job_chat retrieve_docs: Converted search query generation from prefill to structured outputs.
  • global_chat router: Converted routing decision from prefill + custom brace-counting JSON parser to structured outputs.
  • Streaming flush (both services): Strip trailing JSON closing characters ("}) from the text stream flush, since text is now the last field in the JSON.
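The first two workflow_chat fixes above can be sketched together; function names are hypothetical stand-ins for the repo's actual helpers:

```python
import yaml


def split_format_yaml_fixed(raw: str) -> str:
    """Sketch of the fix: YAML parsing gets its own inner try/except, and
    the output is explicitly reset to "" on failure, so bad content (e.g.
    backtick fences) can never leak through as response_yaml."""
    try:
        parsed = yaml.safe_load(raw)
    except yaml.YAMLError:
        return ""
    # Only a YAML mapping counts as real workflow content; None or a bare
    # scalar is treated as a conversational turn with no changes.
    if not isinstance(parsed, dict):
        return ""
    return raw


def should_send_changes(code_edits) -> bool:
    """job_chat's else -> elif code_edits: the changes SSE event only
    fires when there are actual edits, so the "apply changes" button
    never appears empty."""
    return bool(code_edits)
```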

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our Responsible AI Policy

@josephjclark josephjclark marked this pull request as ready for review April 9, 2026 14:41
@github-project-automation github-project-automation bot moved this to New Issues in Core Apr 9, 2026


Development

Successfully merging this pull request may close these issues.

Conversational answers with no generated code/YAML are not parsed correctly in Lightning

1 participant