All bugs identified during stress testing on 2026-03-15. Two tests were conducted: (1) Context Rotation with a complex dashboard HTML page, (2) Todo List with an e-commerce backend. Model: Qwen3.5-4B-Q4_K_M.gguf, ctx=14080, GPU: 20 layers.

EVERY bug listed here MUST be addressed with a deep infrastructural root-cause solution. No band-aids.
## BUG 1: Write dedup silently blocks append_to_file

Symptom: append_to_file calls are counted in the same write-dedup tracker as write_file. After 3 writes to the same filename, ALL subsequent writes are silently blocked. The model generates correct code, but it never reaches disk.
Log evidence (log lines 7678, 10378, 12909):
[MCP] Write dedup: blocking write_file to "dashboard.html" (already written 3x)
Root cause file: main/mcpToolServer.js — write dedup logic does not distinguish append_to_file from write_file
Impact: Primary cause of incomplete files. The 239-line JavaScript section in Test 1 was generated by the model but NEVER written to disk. The model had no idea its writes were being silently eaten.
User quote: "OH IT FAILED!!! WOW critical BUG #8!!!!!!!"
Required fix: Write dedup must either (a) not apply to append_to_file at all, or (b) track append vs write separately with appropriate limits for each. Appending content to a file being built incrementally is the expected behavior — blocking it defeats the purpose of the tool.
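Option (b) could be sketched roughly as follows. This is a minimal illustration only; `WriteTracker` and `allow` are hypothetical names, not the actual mcpToolServer.js internals:

```javascript
// Hypothetical sketch of fix option (b): track write_file separately and
// never dedup-block append_to_file, since appends are inherently incremental.
const WRITE_FILE_LIMIT = 3; // existing limit, now applied to write_file only

class WriteTracker {
  constructor() {
    this.writeCounts = new Map(); // filename -> write_file count
  }

  // Returns true when the operation may proceed to disk.
  allow(toolName, filename) {
    if (toolName === 'append_to_file') {
      return true; // appending builds a file incrementally; never block it
    }
    const count = (this.writeCounts.get(filename) || 0) + 1;
    this.writeCounts.set(filename, count);
    return count <= WRITE_FILE_LIMIT;
  }
}
```

With this shape, the fourth write_file to the same file is rejected, but any number of append_to_file calls go through.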
## BUG 2: Code block shows only the latest append chunk

Symptom: When the model uses append_to_file, the UI code block shows only the CURRENT append chunk (e.g., 259 lines) instead of the FULL cumulative file (e.g., 633 lines). Lines "disappear" from the user's view.
User quote: "It needs to display the entire file, the whole file, not just the part that it's working on, bro. That's crazy. It needs to appear as one code block. The 600 something that was there should never have disappeared."
Root cause: The frontend renders append_to_file tool results showing only the appended content, not the full file contents. The tool result message only contains the newly appended portion.
Impact: User cannot see the full state of the file being built. Breaks visual continuity. Makes it impossible to verify file integrity during generation.
Required fix: When displaying append_to_file results, the UI code block must show the ENTIRE file contents accumulated so far — not just the latest append chunk. This may require the backend to return the full file content after each append, or the frontend to accumulate and display all chunks.
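The frontend-accumulation variant could be sketched as follows (assumed names only; `fileBuffers` and `renderAppendResult` do not exist in the codebase):

```javascript
// Hypothetical frontend accumulator: keep the cumulative contents per
// filename so each append result renders the whole file built so far.
const fileBuffers = new Map(); // filename -> full contents accumulated so far

function renderAppendResult(filename, appendedChunk) {
  const full = (fileBuffers.get(filename) || '') + appendedChunk;
  fileBuffers.set(filename, full);
  return full; // render THIS in the code block, not appendedChunk
}
```

The backend variant (returning the full on-disk contents after each append) is likely more robust, since it survives frontend reloads and reflects what actually reached disk.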
## BUG 3: Header and code block line counts disagree

Symptom: The tool result header shows one line count (e.g., 464 lines) while the code block below it shows a different count (e.g., 317 lines). These numbers should always match.
User quote: "why the code block say 317, but the header says 464??? why cant it update with the header????????"
Root cause: The header line count and the code block content are sourced from different data — header may reflect the actual file on disk while the code block only shows the current tool result content.
Required fix: The header line count and the code block line count MUST be synchronized. Both must reflect the same number — the total lines in the full file on disk.
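A simple way to guarantee that invariant is to derive both numbers from one string, sketched here (helper names are illustrative, not the real frontend code):

```javascript
// Derive the header count and the code-block count from the same full-file
// string, so the two can never drift apart.
function countLines(text) {
  return text === '' ? 0 : text.split('\n').length;
}

function buildToolResultView(fullFileContents) {
  const lines = countLines(fullFileContents);
  return {
    headerLineCount: lines,    // shown in the tool result header
    codeBlock: fullFileContents,
    codeBlockLineCount: lines, // shown with the code block; always equal
  };
}
```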
## BUG 4: Duplicate content at continuation seams

Symptom: The /* Responsive Design */ @media CSS block appears THREE times in the file: at lines 539, 596, and 615. The model re-generates content that already exists in the file when a continuation boundary is crossed.
Root cause: When seamless continuation fires and the model resumes, it regenerates content that was already written in the previous continuation. No overlap/dedup detection exists for file content across continuations.
Affected file in test: dashboard.html lines 539, 596, 615
Required fix: The continuation system needs to provide enough context about what was already written so the model picks up from where it actually left off — not from what it "remembers" (which may overlap). Alternatively, the append system needs overlap detection at continuation seams.
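Seam overlap detection could look roughly like this (a sketch; `trimOverlap` is a hypothetical helper, and the 2000-character cap is an arbitrary assumption):

```javascript
// Find the longest suffix of the file already on disk that matches a prefix
// of the incoming chunk, and drop that duplicated prefix before appending.
function trimOverlap(existing, incoming, maxOverlap = 2000) {
  const limit = Math.min(existing.length, incoming.length, maxOverlap);
  for (let len = limit; len > 0; len--) {
    if (existing.endsWith(incoming.slice(0, len))) {
      return incoming.slice(len); // duplicated prefix removed
    }
  }
  return incoming; // no overlap found; append as-is
}
```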
## BUG 5: Generated file is structurally incomplete (CSS only)

Symptom: After 24 minutes and 13K+ tokens, dashboard.html has 633 lines but contains ONLY CSS. Missing: </style>, </head>, <body>, all body content, <script>, JavaScript, </html>.
Root cause: BUG 1 (write dedup) blocked the body/script writes. BUG 8 (forced continuation loop) consumed remaining iterations without progressing.
Impact: The file is completely non-functional. 24 minutes of generation produced a file that cannot render in a browser.
Required fix: Resolving BUG 1 (write dedup) and BUG 8 (forced loop) should prevent this. But also: the system should detect when a file is fundamentally incomplete (no closing tags, no body element for HTML) and prioritize structural completion before optional content.
## BUG 6: Content appended after document close

Symptom: At ~250s into streaming, the output showed </script></body></html> followed by additional CSS content being appended after the document closure. Content was written after the HTML document was already closed.
Root cause: Continuation seam handling — when continuation fires after the model writes closing tags, the resumed generation starts appending more content without awareness that the document was already closed.
Required fix: Content integrity validation at continuation seams. The system should detect when a structured file has been closed and not append additional content of the same type after the closure.
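For HTML specifically, a minimal closure check might look like this (a sketch; the function names are assumptions, and a production validator would cover more file types than HTML):

```javascript
// Detect whether an HTML file has already been closed, and refuse further
// appends once it has.
function isHtmlClosed(contents) {
  return /<\/html>\s*$/i.test(contents);
}

function guardAppend(contents, chunk) {
  if (isHtmlClosed(contents)) {
    return {
      ok: false,
      error: 'Document already closed with </html>; refusing to append.',
    };
  }
  return { ok: true, contents: contents + chunk };
}
```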
## BUG 7: No OS awareness in the system prompt

Symptom: Model tried tail -10 dashboard.html and tail -100 dashboard.html (Test 1), and mkdir -p src/{config,middleware,...} with bash brace expansion (Test 2). All failed on Windows PowerShell.
User quote: "all models should be system aware. They should know what operating system they're running on... This is production software, Sir."
Root cause: The system prompt / preamble does not include the operating system. Models have no way to know they're running on Windows vs Linux.
Affected file: main/constants.js — system preamble (both DEFAULT_COMPACT_PREAMBLE and DEFAULT_SYSTEM_PREAMBLE)
Required fix: Detect the OS at runtime and inject it into the system prompt. Something like: "The user's operating system is Windows. Use PowerShell commands, not bash/Linux commands." This must be dynamic (detected via process.platform or equivalent), not hardcoded.
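A minimal sketch of that detection (process.platform is the real Node API; the preamble sentence itself is an example, not final wording):

```javascript
// Map Node's platform string to a human-readable OS name for the preamble.
function detectOsName(platform = process.platform) {
  if (platform === 'win32') return 'Windows';
  if (platform === 'darwin') return 'macOS';
  return 'Linux';
}

// Build the OS line to inject into the system preamble.
function osPreambleLine(platform) {
  const os = detectOsName(platform);
  const shell = os === 'Windows' ? 'PowerShell' : 'POSIX shell (bash)';
  return `The user's operating system is ${os}. Use ${shell} commands, ` +
         `not commands for other platforms.`;
}
```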
## BUG 8: Forced continuation loop makes no progress

Symptom: System detects file is incomplete -> forces continuation -> model generates append_to_file -> write dedup blocks the write -> file stays incomplete -> system detects incomplete again -> repeat. This consumed 4 of 9 available iterations (iterations 4, 5, 7, 8).
Root cause: Two subsystems interact destructively:
- The "Incomplete file detected after tool execution" logic in agenticChat.js forces continuation.
- Write dedup in mcpToolServer.js blocks the writes that continuation produces.

Result: an infinite waste loop that consumes iteration budget without making progress.
Required fix: At minimum, BUG 1 (write dedup blocking appends) must be resolved. Additionally, the incomplete file detection should check whether the continuation actually made progress — if dedup blocked the write, forcing another continuation is pointless. A "no progress" detector should break the loop.
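The "no progress" detector could be as simple as comparing on-disk file sizes across continuations. A sketch, under the assumption that the agent loop can snapshot sizes between iterations (names are hypothetical):

```javascript
// Returns a closure that reports whether any tracked file grew since the
// previous snapshot. If nothing grew, forcing another continuation is futile.
function makeProgressDetector() {
  let lastSizes = new Map();
  return function madeProgress(currentSizes /* Map<filename, bytes> */) {
    let grew = false;
    for (const [file, size] of currentSizes) {
      if ((lastSizes.get(file) || 0) < size) grew = true;
    }
    lastSizes = new Map(currentSizes);
    return grew;
  };
}
```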
## BUG 9: Correct code rejected and shown as FAIL

Symptom: The JavaScript section (239 lines) shows as a RED X FAIL icon in the UI. The model generated correct code but the pipeline rejected the write via dedup.
Root cause: Same as BUG 1 — write dedup blocked the write. The UI correctly shows the failure, which is informative, but the underlying cause is the dedup.
Impact: User sees that the model tried to write and FAILED. Extremely jarring. The model then claims success in its summary despite the visible failure.
Required fix: Resolving BUG 1 resolves this. Additionally (BUG 20), the model should be informed when a write fails so it doesn't claim success.
## BUG 10: Tokens streaming while file line count is stuck

Symptom: Token generation was visibly active (streaming text) but the file line count was stuck at 596 lines and not progressing.
User quote: "it's currently stuck at 596 lines. But you can see there's still tokens being generated. So that right there is a very, very critical bug."
Root cause: The model was generating content that was being blocked by write dedup (BUG 1). Tokens were flowing but the tool calls that would write to the file were being silently rejected.
Required fix: Resolving BUG 1 is the primary fix. But the UI should also indicate when writes are being blocked — the user should never see active generation with zero progress and no explanation.
## BUG 11: Sequence disposal permanently kills the conversation

Symptom: At 22:59:05 in Test 2: "Generation force-aborted via sequence disposal — treating as timeout" followed by "Object is disposed" cascade x4 in LlamaChat.js:73/84, llmEngine.js:859. After this, the conversation is permanently dead — no recovery possible. Even summary generation fails.
Log evidence:
[LLM] Generation force-aborted via sequence disposal — treating as timeout
TypeError: Object is disposed (x4)
at LlamaChat.js:84:73 (sequence getter)
at llmEngine.js:859 (_runGeneration)
Root cause file: main/llmEngine.js line 859 (_runGeneration) — no recovery mechanism after sequence disposal. Once the sequence is disposed, the entire conversation context is lost.
Required fix: Need a sequence recovery mechanism — when disposal occurs, the engine should be able to recreate the sequence/context and continue the conversation. This may require saving conversation state to a recoverable format before disposal occurs, or catching the disposal error and reinitializing.
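The recovery path could be shaped like this. A sketch only: `runGeneration`, `recreateSession`, and `history` are hypothetical stand-ins for whatever llmEngine.js actually keeps, and no node-llama-cpp API specifics are assumed:

```javascript
// Catch the disposal error, rebuild the session from saved conversation
// state, and retry the generation once.
async function generateWithRecovery(runGeneration, recreateSession, history, maxRetries = 1) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await runGeneration();
    } catch (err) {
      const disposed = /disposed/i.test(String(err && err.message));
      if (!disposed || attempt >= maxRetries) throw err; // unrelated error, or out of retries
      await recreateSession(history); // rebuild sequence/context from saved state
    }
  }
}
```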
## BUG 12: Context overflow after three exchanges

Symptom: "The context size is too small to generate a response" at iteration 3 of Test 2. ctx=14080, only ~3 exchanges deep into a fresh conversation. The model couldn't generate a response at all.
Root cause candidates:
- System prompt + tool descriptions may consume too much of the 14080 token context
- Reasoning/thinking tokens from Qwen3.5 may be uncapped, consuming context without producing visible output
- The model's tool description overhead may be excessive for small context sizes
Required fix: Investigate exactly how much context the system prompt + tool descriptions consume. If it's >50% of a 14K context, the compact preamble needs to be more compact. Reasoning tokens need a cap. The system should never overflow context after only 3 exchanges — that means the baseline overhead is too large.
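A first diagnostic step could be a budget probe like this (a sketch; `countTokens` is a hypothetical stand-in for the model's real tokenizer, and the 50% threshold comes from the fix requirement above):

```javascript
// Measure how much of the context window the fixed baseline consumes.
function contextBudget(ctxSize, countTokens, systemPrompt, toolDescriptions) {
  const overhead = countTokens(systemPrompt) + countTokens(toolDescriptions);
  const fraction = overhead / ctxSize;
  return {
    overhead,
    fraction,
    warning: fraction > 0.5
      ? 'Baseline overhead exceeds 50% of context; compact the preamble.'
      : null,
  };
}
```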
## BUG 13: Bash brace expansion on Windows

Symptom: mkdir -p src/{config,middleware,...} (bash brace expansion) in Test 2. Failed twice before model self-corrected to PowerShell on 3rd attempt.
Root cause: Same as BUG 7 — no OS detection in system prompt.
Required fix: Same as BUG 7 — inject OS into system prompt.
## BUG 14: Todo plan killed by disposal crash

Symptom: write_todos was generating correctly (2 items with proper id/status/text structure) but BUG 11 (sequence disposal crash) killed the conversation before the 7-part plan could complete.
Root cause: BUG 11 (sequence disposal cascade).
Required fix: Resolving BUG 11 resolves this.
## BUG 15: No mechanism to inform the model of its OS

Symptom: Models repeatedly generate Linux commands on Windows across multiple tests and multiple models. No mechanism exists to inform the model what OS it's running on.
Root cause: main/constants.js system preamble (both compact and full) does not include OS information.
Required fix: Add dynamic OS detection: const os = process.platform === 'win32' ? 'Windows' : process.platform === 'darwin' ? 'macOS' : 'Linux'; and inject into preamble: "The user's operating system is {os}. Always use {os}-appropriate commands."
## BUG 16: Single dedup counter for append and write

Symptom: The write dedup counter treats append_to_file and write_file identically. A file that needs 10 appends to build incrementally is blocked after the 3rd write.
Root cause: main/mcpToolServer.js — write dedup tracking uses a single counter per filename regardless of operation type.
Required fix: Separate tracking for append_to_file vs write_file. append_to_file should have either no dedup limit or a much higher one (append is inherently incremental). write_file dedup (overwriting the same file repeatedly) may still be valid with a limit.
## BUG 17: No sequence recovery path

Symptom: Once the LlamaChat sequence is disposed, there is no retry, reconnect, or reinitialize mechanism. The conversation dies permanently.
Root cause: main/llmEngine.js — _runGeneration does not catch disposal errors and attempt recovery.
Required fix: Implement a sequence recovery path: catch "Object is disposed" errors, recreate the chat session with conversation history from the rolling summary, and retry generation. This is the safety net for BUG 11.
## BUG 18: Incomplete-file detector and write dedup conflict

Symptom: Two subsystems interact destructively, consuming iteration budget without making progress. The incomplete file detector forces continuation, but dedup blocks the writes, so the file stays incomplete, triggering the detector again.
Root cause: main/agenticChat.js (incomplete file detection) and main/mcpToolServer.js (write dedup) are not aware of each other.
Required fix: Either (a) the incomplete file detector should check whether the last continuation attempt made progress (if not, don't force another), or (b) resolving BUG 1/16 eliminates the trigger. Both should be done for defense in depth.
## BUG 19: Append results render only the new chunk

Symptom: Each append_to_file tool result shows only the newly appended content. The user sees lines "disappear" as the code block shrinks to show only the latest chunk instead of the growing file.
Root cause: Frontend code block rendering for tool results displays only the tool result content, not the cumulative file.
Required fix: Same as BUG 2 — the code block must show the ENTIRE file contents accumulated so far after each append operation.
## BUG 20: Model not informed of failed writes

Symptom: Model's final summary said "dashboard.html exists with 239 lines (10KB) built incrementally" when the actual file was 633 lines of CSS-only with no body/script. The model doesn't know its writes were blocked.
Root cause: When write dedup blocks a write, the model receives no error feedback. It believes the write succeeded. So its self-assessment is based on what it thinks it wrote, not what actually reached disk.
Required fix: When a write is blocked by dedup (or any other mechanism), the tool MUST return an error to the model indicating the write was rejected. The model MUST know when its writes fail so it can adjust its plan.
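The error feedback could be shaped like this (field names are assumptions for illustration, not the actual MCP tool-result schema):

```javascript
// Build the tool result the model sees. A blocked write must return an
// explicit error so the model never believes a rejected write succeeded.
function buildWriteResult(blocked, filename, reason) {
  if (blocked) {
    return {
      isError: true,
      content: `ERROR: write to ${filename} was rejected (${reason}). ` +
               `The file on disk was NOT modified. Adjust your plan accordingly.`,
    };
  }
  return { isError: false, content: `Wrote ${filename}.` };
}
```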
## Priority Summary

- BUG 1/16 — Write dedup blocking appends (root cause of BUGs 5, 8, 9, 10, 18, 20)
- BUG 11/17 — Sequence disposal cascade + no recovery (root cause of BUG 14)
- BUG 12 — Context overflow after 3 exchanges
- BUG 2/19 — Code block showing partial file instead of full file
- BUG 3 — Header vs code block line count mismatch
- BUG 20 — Model not informed of failed writes
- BUG 4 — Triple duplicate content at continuation seams
- BUG 6 — Content appended after document close
- BUG 7/13/15 — OS detection in system prompt
- BUG 8/18 — Forced continuation loop (no-progress detection)
| File | Bugs Addressed |
|---|---|
| main/mcpToolServer.js | 1, 9, 16, 20 |
| main/llmEngine.js | 11, 17 |
| main/agenticChat.js | 4, 6, 8, 12, 18 |
| main/constants.js | 7, 13, 15 |
| Frontend (ChatPanel/tool display) | 2, 3, 19 |
ALL changes must go to BOTH main/ AND pipeline-clone/main/.
ALL changes must be logged in pipeline-clone/CHANGES_LOG.md.