Commit 7391b5a

Author: Brendan Gray (committed)
v1.8.38: Fix 10/10b VRAM partial offload, Fix 11 salvage continuation, Fix 12 enhanced logging
- Fix 10: partial GPU offload context limiter. min(ramCtx, vramCtx) prevents a 131K context on 4GB VRAM.
- Fix 10b: embedding table VRAM overhead. Accounts for ~25% of model size, since the embedding table always stays on the GPU.
- Fix 11: salvage continuation completeness detection. Detects when a salvaged file is complete and redirects the model to the remaining task.
- Fix 12: enhanced streaming logs with a token content preview (last 80 chars in the progress log).

Also includes Fixes 5-9 from earlier this session (fence stripping, streaming logs, session recovery, literal newline unescape, multi-file preamble).
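The Fix 10/10b idea can be sketched as a small function. This is an illustrative sketch only; the function and parameter names (`computeContextLimit`, `bytesPerCtxToken`, `embeddingBytes`) are assumptions for the example, not the commit's actual code:

```javascript
// Sketch of Fix 10 + 10b: clamp context size to the tighter of the
// RAM-derived and VRAM-derived limits during partial GPU offload.
function computeContextLimit({ freeRamBytes, freeVramBytes, bytesPerCtxToken, embeddingBytes }) {
  // Fix 10b: the embedding table always lives on the GPU, so reserve its
  // size (~25% of model size per this commit's estimate) out of VRAM first.
  const usableVram = Math.max(0, freeVramBytes - embeddingBytes);
  const ramCtx = Math.floor(freeRamBytes / bytesPerCtxToken);
  const vramCtx = Math.floor(usableVram / bytesPerCtxToken);
  // Fix 10: the binding resource wins — a 4GB card must not get a 131K context.
  return Math.min(ramCtx, vramCtx);
}
```

With a large RAM pool but a small card, the VRAM term dominates the result, which is exactly the regression the fix targets.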
Parent: b526eac

8 files changed: 292 additions & 46 deletions

.github/copilot-instructions.md

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ Read this list first. Every item has a full section below.
 - **Tests: three dimensions** — Coherence + tool correctness + response quality. Never count alone
 - **Never tailor changes to pass tests** — Tests reveal real behavior. Don't teach to the test
 - **Never revert without explaining why pre-fix state was working** — A revert is not a fix
-- **Hardware-agnostic always** — Never target a specific machine, GPU, or model size
+- **Hardware-agnostic always** — Never target a specific machine, GPU, or model size. Never hardcode a context size number (e.g. `contextSize: 8192`) — always compute from actual available RAM/VRAM at runtime. Hardcoded resource numbers are a hardware-specific violation even if they "work" on the dev machine. **Corollary — do NOT use `{min, max}` range-based context selection for `qwen35` SSM/Mamba hybrid architectures**: node-llama-cpp's binary-search estimator inflates KV cache requirements by 100x for these models (uses `trainContextSize=262144` as base), returning near-minimum context even on 32GB RAM machines. Always use explicit computed size + `ignoreMemorySafetyChecks: true` + `failedCreationRemedy: {retries:8, autoContextSizeShrink:0.5}` so actual hardware capacity drives the result.
 - **Production software** — Every fix must work for 4GB GPU users AND 128GB workstation users
 - **No cloud APIs as primary** — This is a local-first product. Cloud is not the answer
 - **Read CHANGES_LOG.md before proposing any fix** — Context resets. The log is the anchor
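The rule in the new bullet can be sketched as a small options builder. This is not the repo's code; it is a hedged illustration that assumes the node-llama-cpp `createContext` options the bullet itself names:

```javascript
// Sketch: context options per the hardware-agnostic rule above. `computedCtx`
// is assumed to come from a runtime RAM/VRAM measurement, never a hardcoded number.
function hybridContextOptions(computedCtx) {
  return {
    contextSize: computedCtx,        // explicit computed size, not a {min, max} range
    ignoreMemorySafetyChecks: true,  // bypass the inflated SSM/Mamba KV-cache estimate
    failedCreationRemedy: { retries: 8, autoContextSizeShrink: 0.5 }, // halve and retry on failure
  };
}
```

The `failedCreationRemedy` retry loop is what keeps this safe despite skipping the safety check: if the computed size is too optimistic, creation retries at half the size up to 8 times.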

main/agenticChat.js

Lines changed: 128 additions & 14 deletions
@@ -572,9 +572,10 @@ function register(ctx) {
 memoryStore.addConversation('assistant', fullCloudResponse);
 
 // Clean up display — strip inline JSON tool calls with proper brace matching
+// Only match objects with "tool" key — "name" is too common in generated code
 let cleanResponse = fullCloudResponse;
 {
-const toolPat = /\[?\s*\{\s*"(?:tool|name)"\s*:\s*"/g;
+const toolPat = /\[?\s*\{\s*"tool"\s*:\s*"/g;
 let tm;
 const ranges = [];
 while ((tm = toolPat.exec(cleanResponse)) !== null) {
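The effect of the tightened pattern can be checked in isolation. A minimal sketch (not the production helper) wrapping the new regex:

```javascript
// The commit's tightened detector: objects keyed "tool" still match,
// but objects keyed "name" (ubiquitous in generated package.json,
// API payloads, etc.) no longer do.
const toolPat = /\[?\s*\{\s*"tool"\s*:\s*"/g;

function looksLikeToolCall(text) {
  toolPat.lastIndex = 0; // reset: global regexes keep state across .test() calls
  return toolPat.test(text);
}
```

Before this change, a generated `package.json` snippet like `{"name": "my-app"}` was misclassified as a tool call and stripped from the display.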
@@ -887,8 +888,11 @@ function register(ctx) {
 const recovered = sessionStore.initialize(sessionId);
 if (!recovered) {
 // Check for crash recovery from recent session
+// Only recover if message key matches — prevents cross-task session contamination
 const recoverable = SessionStore.findRecoverableSession(sessionBasePath);
-if (recoverable?.hasRollingSummary) {
+const currentMsgKey = message.substring(0, 30).replace(/[^a-z0-9]/gi, '');
+const recoveredMsgKey = (recoverable?.sessionId || '').replace(/^\d+_/, '');
+if (recoverable?.hasRollingSummary && recoveredMsgKey === currentMsgKey) {
 const recoveredSummary = sessionStore.initialize(recoverable.sessionId)
 ? sessionStore.loadRollingSummary(RollingSummary)
 : null;
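The two keys compared in this hunk can be factored out as helpers. A sketch under one assumption: judging by the `/^\d+_/` strip, a stored session id appears to take the form `<timestamp>_<normalized message head>`; that format is inferred from this diff, not confirmed elsewhere:

```javascript
// Key for the current user message: first 30 chars, alphanumerics only.
function msgKeyFromMessage(message) {
  return message.substring(0, 30).replace(/[^a-z0-9]/gi, '');
}

// Key recovered from a stored session id, assuming "<timestamp>_<key>" form.
function msgKeyFromSessionId(sessionId) {
  return (sessionId || '').replace(/^\d+_/, '');
}
```

Recovery only proceeds when the two keys are equal, so a crash during "build a snake game" can no longer resume into a later "write a REST API" task.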
@@ -1078,13 +1082,23 @@ function register(ctx) {
 
 // Throttled context usage updates during streaming (every 500ms)
 let _streamingResponseLen = 0;
+let _streamingTailBuf = ''; // last ~100 chars for log preview
+let _lastStreamLogTime = Date.now();
 const promptLen = typeof currentPrompt === 'string' ? currentPrompt.length : ((currentPrompt.systemContext || '').length + (currentPrompt.userMessage || '').length);
 const _contextUsageInterval = mainWindow ? setInterval(() => {
 try {
 let used = 0;
 try { if (llmEngine.sequence?.nextTokenIndex) used = llmEngine.sequence.nextTokenIndex; } catch (_) {}
 if (!used) used = Math.ceil((promptLen + _streamingResponseLen) / 4);
 mainWindow.webContents.send('context-usage', { used, total: totalCtx });
+// Periodic streaming progress log (every 30s) for debugging
+const now = Date.now();
+if (now - _lastStreamLogTime >= 30000) {
+// Include content preview: last 80 chars of what was generated
+const preview = _streamingTailBuf.slice(-80).replace(/\n/g, '\\n');
+console.log(`[AI Chat] Streaming progress: ${_streamingResponseLen} chars, ~${used}/${totalCtx} ctx tokens (${Math.round(used/totalCtx*100)}%) | tail: "${preview}"`);
+_lastStreamLogTime = now;
+}
 } catch (_) {}
 }, 500) : null;
 
@@ -1094,7 +1108,7 @@ function register(ctx) {
 const nativeResult = await llmEngine.generateWithFunctions(
 currentPrompt, nativeFunctions,
 { ...(context?.params || {}), maxTokens: effectiveMaxTokens },
-(token) => { if (isStale()) { llmEngine.cancelGeneration('user'); return; } _streamingResponseLen += token.length; localTokenBatcher.push(token); },
+(token) => { if (isStale()) { llmEngine.cancelGeneration('user'); return; } _streamingResponseLen += token.length; _streamingTailBuf = (_streamingTailBuf + token).slice(-100); localTokenBatcher.push(token); },
 (thinkToken) => { if (isStale()) { llmEngine.cancelGeneration('user'); return; } localThinkingBatcher.push(thinkToken); },
 (funcCall) => {
 if (mainWindow && !mainWindow.isDestroyed()) {
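The tail-buffer bookkeeping threaded through these callbacks can be expressed as one small closure. This is an illustrative refactoring sketch; the commit itself uses the two plain variables shown above:

```javascript
// Fix 12 as a closure: keep only the last `cap` chars of the stream so the
// periodic progress log can show a short preview at O(1) memory cost.
function makeTailBuffer(cap = 100) {
  let buf = '';
  return {
    // Called per streamed token; slice(-cap) bounds the buffer size.
    push(token) { buf = (buf + token).slice(-cap); },
    // Preview for the log line: last n chars, newlines escaped for one-line output.
    preview(n = 80) { return buf.slice(-n).replace(/\n/g, '\\n'); },
  };
}
```

The `slice(-cap)` on every push is the key trade-off: a few bytes of copying per token buys a hard upper bound on memory, regardless of how long the generation runs.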
@@ -1118,6 +1132,7 @@ function register(ctx) {
 }, (token) => {
 if (isStale()) { llmEngine.cancelGeneration('user'); return; }
 _streamingResponseLen += token.length;
+_streamingTailBuf = (_streamingTailBuf + token).slice(-100);
 localTokenBatcher.push(token);
 
 // Live tool-call bubble
@@ -1401,13 +1416,24 @@ function register(ctx) {
 
 fullResponseText += newContent;
 
-// Strip tool fences and raw inline JSON tool calls from display copy
+// Strip tool fences from display copy — but preserve legitimate ```json code blocks
+// Only strip ```tool/```tool_call fences (always tool calls) and ```json fences
+// that actually contain tool call JSON ({"tool": "..."}). This prevents stripping
+// legitimate JSON code examples (package.json, API responses, configs) which caused
+// line count regressions (e.g. 210 → 160 lines when Express.js tutorials were stripped).
 let displayChunk = newContent
-.replace(/\n?```(?:json|tool_call|tool)\b[\s\S]*?```\n?/g, '')
-.replace(/\n?```(?:json|tool_call|tool)\b[\s\S]*$/g, '');
+.replace(/\n?```(?:tool_call|tool)\b[\s\S]*?```\n?/g, '')
+.replace(/\n?```(?:tool_call|tool)\b[\s\S]*$/g, '')
+.replace(/\n?```json\b([\s\S]*?)```\n?/g, (match, content) => {
+return /"\s*tool\s*"\s*:\s*"/.test(content) ? '' : match;
+})
+.replace(/\n?```json\b([\s\S]*)$/g, (match, content) => {
+return /"\s*tool\s*"\s*:\s*"/.test(content) ? '' : match;
+});
 // Strip inline JSON tool calls with proper brace matching (handles nested objects)
+// Only match objects with "tool" key — "name" is too common in generated code
 {
-const toolPattern = /\[?\s*\{\s*"(?:tool|name)"\s*:\s*"/g;
+const toolPattern = /\[?\s*\{\s*"tool"\s*:\s*"/g;
 let tm;
 const ranges = [];
 while ((tm = toolPattern.exec(displayChunk)) !== null) {
@@ -1737,9 +1763,90 @@ function register(ctx) {
 const maxTailChars = Math.max(500, Math.min(Math.floor(remainingTokens * 0.3 * 4), 3000));
 if (_hasUnclosedToolFence) {
 const partialFence = _stitchedForMcp.slice(_fenceIdx);
-_pendingPartialBlock = partialFence; // keep FULL text for stitching
-const tailForModel = partialFence.length > maxTailChars ? partialFence.slice(-maxTailChars) : partialFence;
-continuationMsg = `${taskHint}${fileManifest}[Continue the tool call JSON from exactly where it was cut. Output ONLY the JSON continuation. Do NOT restart the tool call. Continue from:\n${tailForModel}]`;
+
+// ── Salvage-and-Append: when a write_file call is truncated mid-content,
+// salvage the partial file, write it to disk, then switch the model to
+// append_to_file for the remaining content. This prevents the model from
+// re-generating the entire file (which causes 87%+ overlap → rotation). ──
+const _isWriteFile = partialFence.includes('"write_file"');
+const _hasFP = /"filePath"\s*:\s*"[^"]+"/.test(partialFence);
+const _hasLongContent = /"content"\s*:\s*"[\s\S]{200,}/.test(partialFence);
+
+let _didSalvageAppend = false;
+if (_isWriteFile && _hasFP && _hasLongContent) {
+const salvaged = salvagePartialToolCall(_stitchedForMcp, _fenceIdx);
+if (salvaged) {
+try {
+const salvageMatch = salvaged.match(/```json\n([\s\S]*?)\n```/);
+const salvageJson = salvageMatch ? JSON.parse(salvageMatch[1]) : null;
+const salvagePath = salvageJson?.params?.filePath;
+const salvageContent = salvageJson?.params?.content || '';
+
+if (salvagePath && salvageContent.length >= 100) {
+const writeResult = await mcpToolServer.executeTool('write_file', {
+filePath: salvagePath,
+content: salvageContent,
+});
+const lineCount = salvageContent.split('\n').length;
+console.log(`[AI Chat] Salvage-and-append: wrote ${lineCount} lines to "${salvagePath}"`);
+
+// Update writeFileHistory — blocks future write_file, forces append_to_file
+if (!writeFileHistory[salvagePath]) writeFileHistory[salvagePath] = { count: 0, maxLen: 0 };
+writeFileHistory[salvagePath].count++;
+if (salvageContent.length > writeFileHistory[salvagePath].maxLen) {
+writeFileHistory[salvagePath].maxLen = salvageContent.length;
+}
+
+// Send UI event for the artifact
+if (mainWindow && !mainWindow.isDestroyed()) {
+mainWindow.webContents.send('mcp-tool-results', [{
+tool: 'write_file',
+params: { filePath: salvagePath, content: '...(salvaged partial)' },
+result: writeResult,
+}]);
+}
+
+// Track in summarizers
+try {
+summarizer.recordToolCall('write_file', { filePath: salvagePath }, writeResult);
+rollingSummary.recordToolCall('write_file', { filePath: salvagePath }, writeResult, iteration);
+} catch (_) {}
+
+// Build continuation prompt with completeness detection
+const allLines = salvageContent.split('\n');
+const lastLines = allLines.slice(-10).join('\n');
+_pendingPartialBlock = null; // switch from JSON stitching to free-form
+
+// Heuristic: detect if salvaged file looks syntactically complete
+const trimmedEnd = salvageContent.trimEnd();
+const lastCodeLine = trimmedEnd.split('\n').pop().trim();
+const looksComplete = /^(module\.exports\s*=|export\s+(default\s+)?|\}\s*;?\s*$|\}\)\s*;?\s*$)/.test(lastCodeLine);
+
+if (looksComplete) {
+// File appears complete — redirect model to remaining task
+console.log(`[AI Chat] Salvaged file "${salvagePath}" looks complete (${lineCount} lines, ends: "${lastCodeLine.substring(0, 50)}")`);
+continuationMsg = `${taskHint}${fileManifest}[File "${salvagePath}" has been written successfully (${lineCount} lines). This file is COMPLETE — do NOT rewrite or append to it. Continue with your original task — create the remaining files that were requested. Use write_file for each new file.]`;
+} else {
+// File is mid-content — tell model to append
+continuationMsg = `${taskHint}${fileManifest}[File "${salvagePath}" has been written with ${lineCount} lines so far. The file is NOT complete — more content is needed. Use append_to_file (NOT write_file) with filePath="${salvagePath}" to continue adding content from where the file ends. Here are the last 10 lines of the file:\n\`\`\`\n${lastLines}\n\`\`\`\nCall append_to_file now to add the next section starting after line ${lineCount}.]`;
+}
+
+// Counteract iteration-- from above: salvage is a new agentic pass
+iteration++;
+_didSalvageAppend = true;
+}
+} catch (salvErr) {
+console.warn(`[AI Chat] Salvage-and-append failed: ${salvErr.message}`);
+}
+}
+}
+
+if (!_didSalvageAppend) {
+// Existing logic: continue the JSON from where it was cut
+_pendingPartialBlock = partialFence; // keep FULL text for stitching
+const tailForModel = partialFence.length > maxTailChars ? partialFence.slice(-maxTailChars) : partialFence;
+continuationMsg = `${taskHint}${fileManifest}[Continue the tool call JSON from exactly where it was cut. Output ONLY the JSON continuation. Do NOT restart the tool call. Continue from:\n${tailForModel}]`;
+}
 } else {
 _pendingPartialBlock = responseText; // enable overlap detection for ALL continuation types
 const tailForModel = responseText.length > maxTailChars ? responseText.slice(-maxTailChars) : responseText;
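The completeness heuristic at the heart of Fix 11, isolated for illustration. The regex is lifted verbatim from the hunk above; only the wrapper function is added here:

```javascript
// A salvaged JS file "looks complete" if its last non-blank code line is a
// module.exports assignment, an export statement, or a closing brace
// ("}", "};", "})", "});"). Anything else means the cut landed mid-content.
function looksComplete(content) {
  const lastCodeLine = content.trimEnd().split('\n').pop().trim();
  return /^(module\.exports\s*=|export\s+(default\s+)?|\}\s*;?\s*$|\}\)\s*;?\s*$)/.test(lastCodeLine);
}
```

This decides which continuation prompt the model gets: a complete file redirects it to the remaining task, while an incomplete file forces an `append_to_file` call anchored at the last 10 lines.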
@@ -2118,13 +2225,20 @@ function register(ctx) {
 let cleanResponse = displayResponseText;
 cleanResponse = cleanResponse.replace(/<think(?:ing)?>\s*[\s\S]*?<\/think(?:ing)?>/gi, '');
 cleanResponse = cleanResponse.replace(/<\/?think(?:ing)?>/gi, '');
-// Strip any tool call fence artifacts that leaked through continuation boundaries
-cleanResponse = cleanResponse.replace(/\n?```(?:json|tool_call|tool)\b[\s\S]*?```\n?/g, '');
-cleanResponse = cleanResponse.replace(/\n?```(?:json|tool_call|tool)\b[\s\S]*/g, '');
+// Strip tool call fence artifacts — but preserve legitimate ```json code blocks
+cleanResponse = cleanResponse.replace(/\n?```(?:tool_call|tool)\b[\s\S]*?```\n?/g, '');
+cleanResponse = cleanResponse.replace(/\n?```(?:tool_call|tool)\b[\s\S]*/g, '');
+cleanResponse = cleanResponse.replace(/\n?```json\b([\s\S]*?)```\n?/g, (match, content) => {
+return /"\s*tool\s*"\s*:\s*"/.test(content) ? '' : match;
+});
+cleanResponse = cleanResponse.replace(/\n?```json\b([\s\S]*)$/g, (match, content) => {
+return /"\s*tool\s*"\s*:\s*"/.test(content) ? '' : match;
+});
 cleanResponse = cleanResponse.replace(/^\s*```\s*$/gm, '');
 // Strip raw inline JSON tool calls with proper brace matching (handles nested objects)
+// Only match objects with "tool" key — "name" is too common in generated code
 {
-const toolPat = /\[?\s*\{\s*"(?:tool|name)"\s*:\s*"/g;
+const toolPat = /\[?\s*\{\s*"tool"\s*:\s*"/g;
 let tm;
 const ranges = [];
 while ((tm = toolPat.exec(cleanResponse)) !== null) {

main/agenticChatHelpers.js

Lines changed: 1 addition & 1 deletion
@@ -288,7 +288,7 @@ function evaluateResponse(responseText, functionCalls, taskType, iteration) {
 /\{\s*"(?:tool|name)"\s*:\s*"[^"]+"/.test(text);
 if (hasToolJson) return { verdict: 'COMMIT', reason: 'text_tool_call' };
 
-if (text.length < 15) return { verdict: 'ROLLBACK', reason: 'empty' };
+if (text.length === 0) return { verdict: 'ROLLBACK', reason: 'empty' };
 
 return { verdict: 'COMMIT', reason: 'default' };
 }

main/constants.js

Lines changed: 2 additions & 0 deletions
@@ -67,6 +67,8 @@ If your output is cut off mid-generation, the system will automatically continue
 - Use web_search when the answer may have changed since your training (current doc versions, real-time info, recent events, anything that varies over time). Do not use for static programming knowledge you can answer directly.
 - If a tool fails, analyze the error and retry once with corrected parameters — never give up on the first failure
 - Never claim a task is done before calling the tool that does it — writing a file requires write_file, searching requires web_search, running a command requires run_command
+- If the user asks for multiple files, create ALL of them. Call write_file for EACH file — do not stop after the first file. Do not claim a file was created unless you received a success result from write_file for that specific file. Do not summarize until every requested file exists.
+- Always use the exact filename the user specifies.
 - When read_file fails with ENOENT, call find_files to locate the file by name
 - **Never output full file content as code blocks in chat.** Always use the appropriate tool: write_file for new files, edit_file for modifications, append_to_file for additions, read_file before editing. Code blocks are only for brief snippets or explanations.
 - edit_file: call read_file first to get the exact current text, then supply precise oldText
