AGENT: Read this file at the start of every new context window before touching any code. Also re-read
`.github/copilot-instructions.md` and `AGENT_RULES.md` before proceeding. DO NOT build. DO NOT guess. Read the relevant file before editing it.
STATUS SUMMARY (updated 2026-02-25 — context overflow clean-break, web search date injection, thinking-bubble code fence strip)
| Fix | Description | Status |
|---|---|---|
| BUG-IMG1 | ```` ```json ```` artifact visible in thinking bubble from context rotation summary | ✅ IN SOURCE — agenticChat.js: strip tool-call code blocks from convSummary before sending as llm-thinking-token; useChatStreaming.ts: regex upgraded to strip full code blocks, not just the opening fence |
| BUG-IMG2 | Context overflow death spiral: all 10 rotations fail → raw CONTEXT_OVERFLOW error sent as visible llm-token | ✅ IN SOURCE — agenticChat.js: isContextOverflow now also matches "default context shift strategy did not return" (node-llama-cpp error not prefixed with CONTEXT_OVERFLOW:); after rotation exhaustion a clean user-facing message is sent and the loop breaks instead of falling through to nonContextRetries |
| BUG-DATE | web_search returns stale year (2023) — model has no ground truth to reject it | ✅ IN SOURCE — agenticChat.js: every web_search tool result block is prefixed with the actual `new Date()` in plain English so the model can evaluate snippet freshness |
| FIX 1 | Llama 3.2-3B word salad | [LLM:DIAG] lines in log to see exact wrapper state + sampling params + full chat history being sent to model. |
| REGRESSION | All probes failing on 4GB GPU | ✅ FIXED — probe moved pre-context |
| SESSION RESET BUG | Wrapper not reapplied after resetSession() | ✅ FIXED — resetSession now calls _applyNamedWrapper(_selectedWrapperName) after creating new LlamaChat |
| FIX 2 | `\b` JSON escape in Windows paths breaks pathValidator | ✅ ALREADY IN SOURCE — `targetPath.replace(/[\x00-\x1F]/g, '')` strips all control chars |
| FIX 3 | Mid-stream tool JSON leaks as visible text | ❌ NOT APPLIED |
| FIX 4 | OpenAI function-call format `[{"name":...}]` not parsed | ✅ ALREADY IN SOURCE — Method 3e exists in main/tools/mcpToolParser.js lines 555-577 |
| FIX 5 | Seeded history primes model for wrong behavior | ✅ IN SOURCE (PF-3) — strips ```` ```json\n{"tool": ```` blocks; not yet in installed build |
| FIX 6 | Qwen3-0.6B hallucinates file listing | ❌ NOT APPLIED |
| FIX 7 | agenticPhases: duplicate entries + persists after generation | ✅ DONE — dedup replaces in-place; useEffect clears 1500ms after isGenerating→false |
| FIX 8 | Qwen3-4B-Function-Calling-Pro: "hi" outputs raw function JSON | ✅ DONE — chat-type gate detects pure function-call JSON and routes to processResponse |
| PF-2 | Qwen2.5-1.5B getting ThoughtTokenBudget:256 | ✅ IN SOURCE — modelProfiles+llmEngine fixed; not yet in installed build |
| PF-5 | BUG-029: small context drops tools entirely | ✅ IN SOURCE — compact fallback added; not yet in installed build |
Source-ready (needs build): All of the above, plus PF-1/PF-4 (executingTools spinner), VS Code phase UI
DO NOT delete wrapper-cache.json — cache stores wrapper NAMES (still correct); date preamble fix applies at build-time via _buildWrapperInstance.
- main/modelProfiles.js: tiny → `maxToolsPerTurn=8`, `effectiveContextSize=32768`; small → `maxToolsPerTurn=12`
- main/agenticChatHelpers.js: iteration 1 general tool array reordered; `wrong_tool_format` + `fabricated_info` detectors added
- main/constants.js: compact preamble rule added: "Never tell the user to run a command themselves"
- main/llmEngine.js: `msg.includes('disposed')` added to isContextError; `isThinkingVariant` extended with `qwen3`; thinkMode default `'budget'` → `'none'`; trusted-wrapper early return for `Llama3_2LightweightChatWrapper`
- main/modelProfiles.js + main/modelDetection.js: SmolLM family entirely removed
- main/modelProfiles.js: Qwen base/tiny/small → `thinkTokens: { mode: 'none' }` + `_thinkBudgetWhenActive`
- main/agenticChat.js: PF-3 history seeding strips tool JSON; PF-5 BUG-029 compact downgrade; `agentic-phase` IPC events
- preload.js: `onAgenticPhase` channel
- src/types/electron.ts: `onAgenticPhase` interface
- src/components/Chat/ChatPanel.tsx: `agenticPhases` state, listener, render, clearing
STATUS:
⚠️ ROOT CAUSE UNKNOWN — DO NOT GUESS AGAIN. Diagnostic logging is now in place. Build → test with Llama-3.2-3B-Instruct-Q4_K_S → read log → find `[LLM:DIAG]` lines → read the exact system prompt, user message, wrapper state, and sampling params.
- Jinja override suppression — added `Llama3_2LightweightChatWrapper` to the trusted-wrapper early-return list. Wrapper correctly applied, confirmed in log. Word salad continued.
- Date preamble disabled — `{ todayDate: null, cuttingKnowledgeDate: null }`. Confirmed working (wrapper log shows `todayDate=null`). Word salad continued.
- Probe moved pre-context — fixed the VRAM regression for 4GB GPUs. Did NOT fix word salad.
- topK 20→40 in defaultParams — this change does NOTHING. The modelProfiles.js `llama/small` profile overrides topK to 20 via `mergedParams = { ...defaultParams, ...modelOverrides }`. The fix was pointless; the comment about it has been corrected.
- frequencyPenalty/presencePenalty to 0 — also did nothing. The `llama/small` profile already has both at 0.0 via the `BASE_DEFAULTS` → `llama.base` merge. The generate call uses `freqPenalty ?? 0.1`, which correctly returns 0.0 since 0.0 is not null/undefined.
After building, send "hi" to Llama-3.2-3B-Instruct-Q4_K_S. Find these lines in the log:
[LLM:DIAG] ══ PRE-GENERATION SNAPSHOT ══
wrapper : Llama3_2LightweightChatWrapper
todayDate : null
cuttingDate : null
sampling : temp=0.4 topP=0.85 topK=20 repeat=1.12 freq=0 pres=0
history (2 turns):
[0] SYSTEM (N chars): <FULL SYSTEM PROMPT HERE>
[1] USER: hi
Read the SYSTEM prompt content. Compare it against what LM Studio sends. That is the next diagnostic step. Do not write another line of fix code until this has been done.
- `resetSession()` was creating a new `LlamaChat` without reapplying the probe-confirmed wrapper. After any context overflow → reset, the wrapper would revert to the node-llama-cpp auto-detected default (with a live todayDate). Fixed: `_applyNamedWrapper(_selectedWrapperName)` is now called in resetSession after LlamaChat creation.
- The probe now passes production sampling params to `generateResponse`, so it tests the same regime as real inference.
File: main/llmEngine.js
Root cause (confirmed from logs):
[LLM] _applyChatWrapperOverride: arch="llama" wrapper="Llama3_2LightweightChatWrapper" hasJinja=true
[LLM] BUG-044: "Llama3_2LightweightChatWrapper" selected for arch "llama" — overriding with JinjaTemplateChatWrapper
The Jinja-first logic we added overrides Llama3_2LightweightChatWrapper even when node-llama-cpp correctly auto-detected it. The Llama 3.2 Jinja template, when executed raw, produces malformed output for that model → word salad.
Correct fix (GENERALIZED — not model-specific):
The early-return block at the top of _applyChatWrapperOverride() currently reads:
`if (currentWrapper === 'JinjaTemplateChatWrapper' || currentWrapper === 'QwenChatWrapper') return;` — this needs to include ALL wrappers that node-llama-cpp correctly auto-detects for known families.
Add Llama3_2LightweightChatWrapper to this trusted list — and when it IS that wrapper, reconstruct
it with todayDate: null, cuttingKnowledgeDate: null (to prevent double-system-block corruption)
then return. Do NOT fall through to the Jinja-first block.
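The generalized early return can be sketched as a small decision helper. This is a sketch only: the wrapper names come from the logs above, while the helper name and `action` strings are hypothetical placeholders for the existing control flow in `_applyChatWrapperOverride()`.

```javascript
// Hypothetical decision helper; wrapper names are from the logs above,
// the action strings stand in for the existing control flow.
const TRUSTED_WRAPPERS = new Set([
  'JinjaTemplateChatWrapper',
  'QwenChatWrapper',
  'Llama3_2LightweightChatWrapper',
]);

function resolveWrapperOverride(currentWrapperName) {
  if (!TRUSTED_WRAPPERS.has(currentWrapperName)) {
    // Unknown wrapper — fall through to the existing Jinja-first logic
    return { action: 'override-with-jinja' };
  }
  if (currentWrapperName === 'Llama3_2LightweightChatWrapper') {
    // Rebuild with the date preamble disabled to avoid
    // double-system-block corruption, then return
    return {
      action: 'rebuild',
      options: { todayDate: null, cuttingKnowledgeDate: null },
    };
  }
  return { action: 'keep' }; // trusted — never override with Jinja
}
```

The key property is that the trusted check happens before the Jinja-first block ever runs, so auto-detected wrappers are never clobbered.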
File: main/pathValidator.js
Root cause (confirmed from logs):
[MCPToolServer] Path traversal blocked: "C:\Usersrend\my-static-appddc" → escapes project
Model outputs C:\Users\brend\... in JSON. The \b is parsed as a backspace char (ASCII 8),
so the path becomes C:\Usersrend\.... The path validator reads this garbled path,
detects it doesn't match the project root, and rejects it as traversal.
Fix: In pathValidator.js, add a repairJsonEscapedPath(p) function that runs
BEFORE any security check. It must convert JSON escape sequences that appear inside
what looks like a Windows file path back to their literal characters:
- `\b` → keep as literal `b` (backspace has no meaning in a file path)
- `\t` → keep as literal `t` (tab is not valid in Windows paths)
- `\f` → keep as literal `f`
- `\n`, `\r` → remove (newlines cannot appear in a path)
Only apply this repair if the string looks like a Windows absolute path
(starts with a drive letter X:\ or X:/).
Apply this repair at the entry point of validatePath() (or whatever the main
exported function is — read the file first to find the correct function name).
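Under the mapping above, a minimal sketch of `repairJsonEscapedPath` might look like the following. One interpretation choice is made explicit here: the JSON escape consumed both the backslash and the letter, so this sketch restores both (backspace → the two literal characters `\b`), otherwise the path separator would stay lost. Validate this against real garbled paths before adopting it.

```javascript
// Sketch only — the function name comes from the plan above; restoring the
// consumed backslash along with the letter is an interpretation to verify.
function repairJsonEscapedPath(p) {
  if (typeof p !== 'string') return p;
  // Only repair strings that look like Windows absolute paths (X:\ or X:/)
  if (!/^[A-Za-z]:[\\/]/.test(p)) return p;
  return p
    .replace(/\x08/g, '\\b')  // backspace came from "\b" in the JSON
    .replace(/\t/g, '\\t')    // tab came from "\t"
    .replace(/\f/g, '\\f')    // form feed came from "\f"
    .replace(/[\n\r]/g, '');  // newlines cannot appear in a path — drop
}
```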
File: src/utils/chatContentParser.ts
Root cause:
During streaming, the raw ```json\n{"tool":"list_directory"...} ``` block is sent
character-by-character to the UI as streamingText. The UI renders it as plain text
while the block is building, so the user sees the raw JSON appear → flicker → disappear
when the tool card replaces it. This is not acceptable for production.
Fix: Add a function `stripToolCallBlocksFromStream(text: string): string` that:
- Removes complete ```` ```json\n{"tool":...}\n``` ```` blocks entirely (they'll show as tool cards)
- Removes INCOMPLETE/in-progress ```` ```json ```` blocks that haven't closed yet (use a regex that detects an opening ```` ```json ```` without a closing ```` ``` ````)
- Also strips `<tool_call>...</tool_call>` patterns (used by some models)
This function should be called in ChatPanel.tsx on streamingText before it's rendered —
do NOT modify the underlying buffer, only what's displayed.
Read src/utils/chatContentParser.ts and src/components/Chat/ChatPanel.tsx fully
before implementing — the streaming display pipeline needs to be understood before touching it.
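A display-only filter along those lines might be sketched like this, assuming the fence and tag shapes described above; the real implementation must be reconciled with whatever chatContentParser.ts already does:

```javascript
// Sketch — strips tool-call JSON from the *displayed* stream text only;
// the underlying buffer is never modified.
function stripToolCallBlocksFromStream(text) {
  return text
    // Complete ```json { ... } ``` blocks (rendered as tool cards instead)
    .replace(/```json\s*\{[\s\S]*?```/g, '')
    // An in-progress ```json block with no closing fence yet
    .replace(/```json[\s\S]*$/, '')
    // <tool_call> ... </tool_call> blocks, complete or still streaming
    .replace(/<tool_call>[\s\S]*?<\/tool_call>/g, '')
    .replace(/<tool_call>[\s\S]*$/, '');
}
```

The two-pass shape (complete blocks first, then any unclosed tail) is what prevents the flicker: an in-progress block disappears from the rendered text the moment its opening fence streams in.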
File: main/mcpToolServer.js (the processResponse / parser section)
Root cause (confirmed from logs + screenshot 11): Qwen3-4B-Function-Calling-Pro (and potentially other models) outputs:
`[{"name": "greet", "arguments": {"person": "Bob"}}, {"name": "say_hello", "arguments": {}}]` — this is OpenAI's native function-call format. Our parser only understands `{"tool": "tool_name", "params": {"key": "value"}}`, so the output is rendered as raw text and no tool is executed.
Fix: In the response parser (wherever processResponse or tool detection runs —
read the file to find the exact function), add a pre-processing translation step:
Before the existing parsing logic, check if the response contains a JSON array where
items have "name" and "arguments" keys (OpenAI format). If detected, translate each
item to our format: {"tool": item.name, "params": item.arguments} and substitute
into the response text so the existing parser handles it normally.
Detection regex: /\[\s*\{\s*"name"\s*:/ — only fires on clear OpenAI format,
no false positives on normal JSON.
This supports BOTH formats for any model without changing any existing logic.
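The translation step could be sketched as below. The function name and the one-block-per-call output format are hypothetical; the detection regex is the one given above.

```javascript
// Sketch of the pre-processing translation (hypothetical helper name).
function translateOpenAiFunctionCalls(text) {
  // Detection regex from above — only fires on clear OpenAI format
  if (!/\[\s*\{\s*"name"\s*:/.test(text)) return text;
  return text.replace(/\[\s*\{[\s\S]*?\}\s*\]/g, (match) => {
    try {
      const items = JSON.parse(match);
      if (!Array.isArray(items) ||
          !items.every(i => i && typeof i.name === 'string')) {
        return match; // some other JSON array — leave untouched
      }
      // One {"tool", "params"} block per call, for the existing parser
      return items
        .map(i => JSON.stringify({ tool: i.name, params: i.arguments ?? {} }))
        .join('\n');
    } catch {
      return match; // not valid JSON — let the normal parser handle it
    }
  });
}
```

Caveat: the lazy array regex is a simplification — an `arguments` object that itself contains an array would need a real bracket matcher.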
File: main/agenticChat.js
Root cause (confirmed from logs + screenshot 11):
[AI Chat] Seeded local chatHistory from renderer (4 of 4 turns, max=11 for 4352-token context)
When the user switches from Qwen3-0.6B to Qwen3-4B-Function-Calling-Pro, the renderer sends the previous model's conversation history. The 4B model receives turns that contain function definitions it generated in the 0.6B model's format. On the very first "hi" it fires function calls because the seeded history primes it for tool use.
Fix: In agenticChat.js, where chatHistory is seeded from the renderer, add a
model-change guard. If the model currently loading is different from the model that
generated the incoming history turns, do NOT seed — start fresh.
Read the seeding code in agenticChat.js first — search for Seeded local chatHistory from renderer
to find the exact location. The model name that generated the history is available on the
message objects or in a separate field — verify before implementing.
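The guard could be shaped like this. The per-turn `modelName` field is an assumption — as noted above, the real field carrying the generating model must be verified before implementing.

```javascript
// Sketch of the model-change guard; the `modelName` field is assumed.
function shouldSeedHistory(incomingTurns, currentModelName) {
  if (!Array.isArray(incomingTurns) || incomingTurns.length === 0) {
    return false; // nothing to seed
  }
  // Seed only when every turn came from the model now loading (or is untagged)
  return incomingTurns.every(
    t => !t.modelName || t.modelName === currentModelName
  );
}
```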
File: main/agenticChatHelpers.js — classifyResponseFailure() function
Root cause:
grammar=never for tiny models (0.6B). Grammar is disabled because it caused crashes
at that size. Without forced grammar, the model invents plausible text ("The files are
at C:\Users...") instead of calling list_directory. The agentic loop doesn't detect
this as a failure and returns the hallucinated text.
Fix: In classifyResponseFailure(), add a failure pattern check:
- If `hasToolCalls === false` AND `taskType !== 'chat'` AND the response contains file-path-like strings (`C:\`, `./`, `/`) without any tool call → classify as `{ type: 'hallucinated_file_info', severity: 'nudge' }` and return a recovery prompt: "Use the list_directory tool to actually list the files — do not describe them from memory."
This is a general hallucination detector for file-info responses, not just 0.6B.
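The detector might be sketched as follows. The return shape follows the classification above; the helper name and the exact path regex are assumptions to tune against real model output.

```javascript
// Sketch of the proposed check for classifyResponseFailure().
function detectHallucinatedFileInfo(response, hasToolCalls, taskType) {
  if (hasToolCalls || taskType === 'chat') return null;
  // File-path-like strings: C:\ style drives, ./ relative, or /abs paths
  const pathLike = /[A-Za-z]:\\|\.\/|(^|\s)\/[\w.-]/;
  if (!pathLike.test(response)) return null;
  return {
    type: 'hallucinated_file_info',
    severity: 'nudge',
    recoveryPrompt:
      'Use the list_directory tool to actually list the files — ' +
      'do not describe them from memory.',
  };
}
```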
GPT-OSS-20B with effectiveContextSize=32768 only got 4 GPU layers on 4GB VRAM.
It OOM'd mid-generation. Consider: when gpuLayers < 10 for an xlarge model,
emit a warning to the UI: "This model requires more VRAM than available. Responses
may be slow or fail." This is a nice-to-have, not blocking.
Files: src/components/Chat/ChatPanel.tsx
Root cause (confirmed from screenshots 2 and 3 + code review):
- When `summarizing-history` (or another phase) fires `start` AFTER a previous `done` for the same phase (e.g., context compacted twice in one response), the dedup only blocks duplicate `running` states — a `start` after a `done` adds a NEW entry. Result: 4 stacked "✓ Summarizing conversation history" entries.
- The phases appear OUTSIDE the current streaming message bubble in screenshot 3 — they render at the bottom of the page below the model chip, suggesting they persisted after `isGenerating` went false.
Fix:
- In the `start` handler (line ~321): instead of only checking `p.status === 'running'`, replace any EXISTING entry for that `phase` (regardless of status) with a fresh `running` entry. This way re-firing the same phase resets it in-place rather than stacking.
- In the `done` handler: same key — update in place.
- For persistence: add a `useEffect` watching `isGenerating` — when it becomes `false`, call `setAgenticPhases([])` with a 500ms delay (allows final render of done states before clearing).
Exact location: src/components/Chat/ChatPanel.tsx lines 315–335 (the onAgenticPhase listener callback).
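The in-place replacement could be expressed as a small pure helper usable from both handlers. The entry shape (`{ phase, status }`) is inferred from the description above and should be checked against the actual state shape in ChatPanel.tsx.

```javascript
// Sketch of the in-place phase replacement (entry shape assumed).
function upsertPhase(phases, phase, status) {
  const idx = phases.findIndex(p => p.phase === phase);
  if (idx === -1) return [...phases, { phase, status }];
  // Replace the existing entry regardless of its current status, so a
  // re-fired phase resets in place instead of stacking a new row
  const next = phases.slice();
  next[idx] = { phase, status };
  return next;
}
```

Returning a fresh array (rather than mutating) keeps it usable directly inside a `setAgenticPhases(prev => upsertPhase(prev, phase, status))` updater.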
Files: main/tools/mcpToolParser.js, main/agenticChat.js
Root cause (confirmed from screenshot 6 + 2026-02-22T01:44:55 log):
- `main/tools/mcpToolParser.js` does NOT have Method 3e (OpenAI array format: `[{"name":"tool","arguments":{...}}]`). It exists only in `_pipeline_backup_2026-02-21/tools/mcpToolParser.js`. It was never migrated to the live file.
- The agentic loop's chat-type hard gate skips `processResponse()` entirely when `taskType === 'chat'` — so even if Method 3e were added, it wouldn't fire for a "hi" message.
- The model fabricates non-existent tools ("greeting", "say_hello") because it always tries to emit function JSON regardless of input.
Fix (two parts):
- Part A — Add Method 3e to `main/tools/mcpToolParser.js`: before the fallback section, detect `[{"name":..., "arguments":...}]` top-level array format and translate to `{"tool": item.name, "params": item.arguments}` for the existing parser to handle. Read the LIVE file to find the exact insertion point.
- Part B — In `main/agenticChat.js`, after the chat-type hard gate, check if the response is ONLY OpenAI function JSON (regex: `/^\s*\[\s*\{\s*"name"\s*:/`). If so, and the tool name is not a real tool in the tool list → treat as fabricated output, do NOT display it. Return empty/retry OR return a nudge. This handles the fabrication case regardless of task type.
Read before editing: main/tools/mcpToolParser.js entirely (to find insertion point for Method 3e), main/agenticChat.js around the chat-type hard gate.
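The Part B check might be sketched like this. The known-tool lookup is assumed to be a Set of real tool names — verify the actual structure available at the hard gate before wiring it in.

```javascript
// Sketch of the Part B fabrication check (tool-name Set is assumed).
function isFabricatedFunctionJson(response, knownToolNames) {
  // Regex from the plan above — whole response must start as an OpenAI array
  if (!/^\s*\[\s*\{\s*"name"\s*:/.test(response)) return false;
  try {
    const calls = JSON.parse(response);
    return (
      Array.isArray(calls) &&
      calls.every(c => c && typeof c.name === 'string') &&
      calls.some(c => !knownToolNames.has(c.name))
    );
  } catch {
    return false; // malformed — let the normal parser deal with it
  }
}
```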
| Fix | File to read fully before touching |
|---|---|
| 1 | main/llmEngine.js lines 500–615 (_applyChatWrapperOverride) |
| 2 | main/pathValidator.js entirely |
| 3 | src/utils/chatContentParser.ts entirely + src/components/Chat/ChatPanel.tsx lines 100–300 |
| 4 | main/tools/mcpToolParser.js entirely (Method 3e is in backup, needs port to live file) |
| 5 | main/agenticChat.js — search for Seeded local chatHistory from renderer |
| 6 | main/agenticChatHelpers.js — classifyResponseFailure() function (~line 760) |
| 7 | src/components/Chat/ChatPanel.tsx lines 310–360 (agenticPhase listener) |
| 8 | main/tools/mcpToolParser.js entirely + main/agenticChat.js chat-type hard gate |
- NEVER build — say "Ready to build" when done
- Plan before code — present each fix, wait for approval, then implement exactly that
- Read before editing — never assume file contents, always read first
- Production software — ships to ALL users on ALL hardware, every fix must be general
- Never say done without proof — verify each change landed correctly after applying