- main: Production-ready code (stable releases)
- devel: Development branch for ongoing work and testing
- feature branches: Optional, for complex features (branched off devel)
- Create feature branches from `devel` for significant changes
- Develop and test in feature branches or directly in `devel`
- Merge completed features into `devel`
- Thoroughly test in the `devel` environment
- Run the build process: `npm run build` to ensure the bundle is updated
- When ready for release, merge `devel` into `main`
- Tag releases and publish to npm from `main`
- Use semantic versioning (MAJOR.MINOR.PATCH)
- Auto-bump: When merging `devel` into `main`, the `post-merge` hook automatically runs `npm version patch` and commits the bump
- Manual version changes are only needed for minor/major bumps
- Document changes in commit messages
- Run unit tests: `npm test`
- Ensure coverage remains high (>90% statements)
- Manual testing of CLI functionality
- Integration testing with different providers
- Ensure all tests pass in `devel`
- Merge `devel` into `main`: `git checkout main && git merge devel`
- Version is automatically bumped (patch) by the `post-merge` hook
- For minor/major bumps: manually run `npm version minor` or `npm version major`
- Create git tag: `git tag -a vX.Y.Z -m "Release X.Y.Z"`
- Push to repository: `git push origin main --tags`
- Publish to npm: `npm publish`
- Return to development: `git checkout devel && git merge main`
The proactive improvement loop benchmarks nex-code against real-world tasks and auto-fixes regressions:
```bash
npm run improve              # full loop: benchmark → fix → validate → commit
npm run improve -- --dry-run # benchmark once, show failure clusters
npm run benchmark:reallife   # run 35-task benchmark only
```

How it works:
- `scripts/benchmark-reallife.js` runs 35 commit-sized tasks across 7 categories in temp directories using nex-code headless mode
- Results are scored with a weighted composite: taskCompletion (40%) + editPrecision (25%) + efficiency (20%) + quality (15%)
- `scripts/improve.js` clusters failures, picks the top pattern, and runs nex-code to implement ONE fix
- Rebuilds dist, re-benchmarks, commits if improved, reverts if regressed
- Stops after 3 plateaus, a score >= 95, or 8 passes
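The weighted composite and the stop rule can be sketched as two small functions. This is an illustrative sketch, not the real implementation: `compositeScore` and `shouldStop` are hypothetical names, and the real logic lives in the scripts named above.

```javascript
// Weighted composite from the doc: 40% / 25% / 20% / 15%.
function compositeScore({ taskCompletion, editPrecision, efficiency, quality }) {
  return taskCompletion * 0.40 +
         editPrecision * 0.25 +
         efficiency * 0.20 +
         quality * 0.15;
}

// Loop terminates on 3 plateaus, a score >= 95, or 8 passes.
function shouldStop({ plateaus, score, passes }) {
  return plateaus >= 3 || score >= 95 || passes >= 8;
}
```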
Real-life benchmark telemetry is validated before a run counts toward the score. Successful runs without a clean JSON done event are marked as harness-failure and excluded from aggregate scoring so broken measurement does not masquerade as agent regression.
The pre-push smoke gate (npm run benchmark:gate) compares the current run against a rolling median baseline from recent matching gate runs. A single soft regression is surfaced as a warning; clear outliers and repeated regressions still block the push.
Safety bounds are enforced before every commit:
- `SSH_STORM_WARN`: [6, 12], `SSH_STORM_ABORT`: [8, 18]
- `INVESTIGATION_CAP`: [10, 18], `POST_WIPE_BUDGET`: [10, 17]
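A minimal sketch of the bounds check, assuming each pair is an allowed [min, max] range for the corresponding config value. `validateBounds` is a hypothetical name; the real enforcement happens in the commit hook.

```javascript
// Allowed [min, max] ranges, copied from the doc.
const BOUNDS = {
  SSH_STORM_WARN: [6, 12],
  SSH_STORM_ABORT: [8, 18],
  INVESTIGATION_CAP: [10, 18],
  POST_WIPE_BUDGET: [10, 17],
};

// Returns a list of violations; an empty list means the commit may proceed.
function validateBounds(config) {
  const violations = [];
  for (const [key, [min, max]] of Object.entries(BOUNDS)) {
    const value = config[key];
    if (typeof value !== 'number' || value < min || value > max) {
      violations.push(`${key}=${value} outside [${min}, ${max}]`);
    }
  }
  return violations;
}
```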
Each task runs through plan → implement → verify, each with a different model optimized for that phase. Auto-enabled on Ollama Cloud; see docs/MODEL-SELECTION.md for details.
Key files: cli/task-router.js (routing config + phase resolution), cli/agent.js (phase state machine + transitions), cli/benchmark.js (phase-specific scoring tasks).
Disable: `NEX_PHASE_ROUTING=0`. Force-enable on non-Ollama: `NEX_PHASE_ROUTING=1`.
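Under the assumption that the flag resolves as described (explicit value wins, otherwise auto-enable on Ollama Cloud), the resolution could look like this; `phaseRoutingEnabled` is a hypothetical helper, the real logic lives in `cli/task-router.js`.

```javascript
// Resolve the phase-routing flag: explicit env value beats auto-detection.
function phaseRoutingEnabled(env, isOllamaCloud) {
  if (env.NEX_PHASE_ROUTING === '0') return false; // explicit disable
  if (env.NEX_PHASE_ROUTING === '1') return true;  // force-enable on non-Ollama
  return isOllamaCloud;                            // auto-enabled on Ollama Cloud
}
```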
cli/model-profiles.js provides per-model guard thresholds. Each model family gets tuned stale timeouts and investigation caps:
| Model | staleWarn | staleAbort | investigationCap | postEditCap |
|---|---|---|---|---|
| devstral-2 | 30s | 90s | 12 | 10 |
| devstral-small | 20s | 60s | 10 | 8 |
| qwen3-coder | 60s | 180s | 15 | 12 |
| kimi-k2 | 45s | 120s | 15 | 12 |
ENV overrides (`NEX_STALE_WARN_MS`, `NEX_STALE_ABORT_MS`) always take precedence.
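The precedence rule can be sketched as follows. `resolveStaleWarn` is a hypothetical helper, the profile subset and the fallback default are illustrative; the real thresholds come from `cli/model-profiles.js`.

```javascript
// Illustrative subset of per-model profiles (values from the table above).
const PROFILES = {
  'devstral-2': { staleWarnMs: 30000 },
  'qwen3-coder': { staleWarnMs: 60000 },
};

// ENV override always wins; otherwise fall back to the model profile.
function resolveStaleWarn(model, env) {
  if (env.NEX_STALE_WARN_MS) return Number(env.NEX_STALE_WARN_MS);
  const profile = PROFILES[model];
  return profile ? profile.staleWarnMs : 30000; // hypothetical default
}
```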
`cli/memory.js` provides typed persistent memory stored in `.nex/memory/{type}/`:
- Types: user, feedback, project, reference
- Storage: Individual .md files with YAML frontmatter
- Index: Auto-generated `MEMORY.md` (max 50 entries) injected into system prompt
- Tool: `save_memory(type, name, content)` registered in tools/index.js
- Dedup: Skips save if existing file has identical first 200 chars
- Context guard: Truncates at ~2000 tokens to prevent context pressure
The legacy `remember(key, value)` / `recall(key)` API remains for backward compatibility.
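The 200-char dedup rule above can be sketched as a single comparison. `isDuplicate` is a hypothetical helper; the real check lives in `cli/memory.js`.

```javascript
// Skip the save when the existing file's first 200 chars match the new content.
function isDuplicate(existingContent, newContent) {
  if (existingContent == null) return false; // no existing file: always save
  return existingContent.slice(0, 200) === newContent.slice(0, 200);
}
```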
.nex/skills/*.md files support YAML frontmatter with trigger patterns:
---
trigger:
- drums
- drumcomputer
- beat sequencer
---
Instructions are injected when any trigger matches the user's first message. `cli/skills.js` matches task descriptions against triggers at session start. Matched skills inject at most 3 lines of instructions into the system prompt.
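A minimal sketch of the trigger matching, assuming case-insensitive substring matching against the first message. `matchSkills` and the skill object shape are hypothetical; the real matching lives in `cli/skills.js`.

```javascript
// Return every skill whose trigger list matches the user's first message.
function matchSkills(skills, firstMessage) {
  const text = firstMessage.toLowerCase();
  return skills.filter(skill =>
    (skill.triggers || []).some(t => text.includes(t.toLowerCase()))
  );
}
```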
If merge conflicts occur during the merge from devel to main:
- Checkout main: `git checkout main`
- Attempt merge: `git merge devel`
- Resolve conflicts manually in the conflicted files
- Important: Stage resolved files with `git add <resolved-files>`; forgetting this leaves the merge incomplete
- Complete the merge: `git commit`
- Push changes: `git push origin main`
Startup detection: nex-code detects unresolved merge conflicts at startup and displays a red warning listing affected files. The LLM context also includes conflict info so the agent won't attempt edits on conflicted files.
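One way to sketch the startup detection, under the assumption that it scans file contents for the standard git conflict markers (the real check may instead query git for unmerged paths, e.g. `git diff --name-only --diff-filter=U`):

```javascript
// True when a file still contains unresolved git conflict markers.
function hasConflictMarkers(fileContent) {
  // Matches a line starting "<<<<<<< " or ">>>>>>> ", or an exact "=======" line.
  return /^(<{7} |>{7} |={7}$)/m.test(fileContent);
}
```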
Project Structure & Indexing
- `bin/nex-code.js`: CLI entrypoint (wrapper)
- `cli/`: All source code
- `cli/agent.js`: Core agentic loop (`processInput`)
- `cli/server-mode.js`: JSON-lines IPC server for the VS Code extension
- `cli/safety.js`: Confirmation logic (`confirm`, `setConfirmHook`)
- `cli/index-engine.js`: Fast file indexing (ripgrep/fallback)
- `cli/providers/`: Provider implementations
- `dist/`: Final bundled CLI script
- `tests/`: Jest test files
The VS Code extension at ~/Coding/nex-code-vscode/ spawns nex-code as nex-code --server. Communication uses newline-delimited JSON over stdin/stdout. stderr is forwarded to VS Code's Output channel.
Extension → nex-code (stdin):

```json
{ "type": "chat", "id": "msg-001", "text": "fix the bug" }
{ "type": "confirm", "id": "cfm-001", "answer": true }
{ "type": "cancel" }
{ "type": "clear" }
```

nex-code → Extension (stdout):

```json
{ "type": "ready" }
{ "type": "token", "id": "msg-001", "text": "Here is" }
{ "type": "tool_start", "id": "msg-001", "tool": "read_file", "args": {"path": "src/auth.js"} }
{ "type": "tool_end", "id": "msg-001", "tool": "read_file", "summary": "✓ 142 lines", "ok": true }
{ "type": "confirm_request", "id": "cfm-001", "question": "Run git push?", "tool": "bash", "critical": true }
{ "type": "done", "id": "msg-001" }
{ "type": "error", "id": "msg-001", "message": "Provider not configured" }
```

- `cli/server-mode.js` redirects `console.log/warn/info` to stderr to keep stdout clean for JSON
- `cli/agent.js`: `processInput(input, serverHooks)` accepts hooks for `onToken`, `onToolStart`, `onToolEnd`
- `cli/safety.js`: `setConfirmHook(fn)` overrides `confirm()` so critical tool confirmations are routed through the extension's Yes/No dialog instead of blocking on TTY input
- `setAutoConfirm(true)` is called before `startServerMode()` so non-critical tools run without interruption
- The `--server` branch in `bin/nex-code.js` uses `return` after `startServerMode()` to prevent the REPL from starting
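Newline-delimited JSON framing like the above can be parsed with a small buffering helper. This is a minimal sketch; `createLineParser` is a hypothetical name, and the real wire handling lives in `cli/server-mode.js`.

```javascript
// Accumulate stdin chunks and emit one parsed object per complete line.
function createLineParser(onMessage) {
  let buffer = '';
  return chunk => {
    buffer += chunk;
    let idx;
    while ((idx = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1); // keep any partial trailing line
      if (line) onMessage(JSON.parse(line));
    }
  };
}
```

Because messages can arrive split across chunks, the buffer is essential: a naive `JSON.parse` per chunk would fail on partial lines.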
The remote_agent tool (in cli/tools.js) delegates a coding task to a nex-code instance running on a remote server via SSH:
- Reads server config from `.nex/servers.json` (the same file used by `ssh_exec`)
- Base64-encodes the task to avoid shell-escaping issues
- SSH-executes `nex-code --prompt-file /tmp/nexcode-XXXX.txt --auto` on the remote
- Returns the last 5000 characters of stdout
- 5-minute timeout; the temp file is cleaned up on the remote after execution
Server-side requirement: nex-code must be installed globally on the target server (npm install -g nex-code) and the server must have an Ollama-compatible .env at ~/.nex-code/.env (or use ~/.nex-code/models.env).
Mac↔Server workflow: Use the global skill ~/.nex-code/skills/server-agent.md to teach the Mac agent which project names map to which server paths. The skill instructs the agent to use remote_agent for server-side project work automatically.
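The base64 hand-off described above could look roughly like this. `buildRemoteCommand` is a hypothetical helper sketched from the doc's description; the real logic is in `cli/tools.js`.

```javascript
// Build the remote shell command: the task travels as base64 so quotes,
// backticks, and $-expansion in the task text cannot break the shell.
function buildRemoteCommand(task, tmpPath) {
  const b64 = Buffer.from(task, 'utf8').toString('base64');
  return `echo ${b64} | base64 -d > ${tmpPath} && ` +
         `nex-code --prompt-file ${tmpPath} --auto; rm -f ${tmpPath}`;
}
```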
Install all hooks with:
```bash
npm install
npm run build          # Build the high-performance bundle
cp .env.example .env
npm run install-hooks
```

| Hook | Purpose |
|---|---|
| `pre-push` | Scans pushed commits for secrets (API keys, tokens, private keys) and blocks the push if found |
| `post-merge` | On devel→main merge: auto-bumps patch version and commits. On any merge with package.json changes: runs `npm install` |
Write-tool file locking uses `lockedFiles` (a module-level Map) guarded by `acquireLock`/`releaseLock`.
Two rules enforced since v0.3.26:
- No concurrent same-agent locks: `locksHeld` (a per-run Set) is checked before calling `acquireLock`. Even though `acquireLock` allows re-locking by the same `agentId`, parallel tool calls within one `Promise.all` batch would both pass; `locksHeld` prevents this.
- Lock released on tool completion: Each promise's `.then()`/`.catch()` calls `releaseLock` and removes the file from `locksHeld` immediately, rather than waiting until end-of-iteration.
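The two rules can be sketched with simplified stand-ins. `tryWrite` is a hypothetical wrapper; `lockedFiles`, `acquireLock`, and `releaseLock` mirror the names above but are stripped down.

```javascript
const lockedFiles = new Map(); // path -> agentId

function acquireLock(path, agentId) {
  const holder = lockedFiles.get(path);
  if (holder && holder !== agentId) return false; // held by another agent
  lockedFiles.set(path, agentId); // same-agent re-lock is allowed
  return true;
}

function releaseLock(path, locksHeld) {
  lockedFiles.delete(path);
  locksHeld.delete(path); // rule 2: free immediately on tool completion
}

function tryWrite(path, agentId, locksHeld) {
  // Rule 1: parallel calls in one Promise.all batch share an agentId, so
  // acquireLock alone would let both through; locksHeld blocks the second.
  if (locksHeld.has(path)) return false;
  if (!acquireLock(path, agentId)) return false;
  locksHeld.add(path);
  return true;
}
```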
`rateLimitRetries` and `networkRetries` are reset to 0 after every successful API response.
Without this reset, transient errors early in a session would eat into the retry budget (`MAX_RATE_LIMIT_RETRIES = 5`,
`MAX_NETWORK_RETRIES = 3`) for all subsequent calls, causing premature hard-stop errors.
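A minimal sketch of the reset rule for one of the two counters; `onApiResult` is a hypothetical helper and the surrounding call loop is elided.

```javascript
const MAX_RATE_LIMIT_RETRIES = 5;
let rateLimitRetries = 0;

function onApiResult(ok) {
  if (ok) {
    rateLimitRetries = 0; // success refills the budget for later calls
    return 'ok';
  }
  rateLimitRetries += 1;
  return rateLimitRetries > MAX_RATE_LIMIT_RETRIES ? 'hard-stop' : 'retry';
}
```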
In Phase 4 (message removal), `tokens` tracks message-only token counts.
`available = targetMax - toolTokens`, so the correct loop condition is `tokens > available`
(not `tokens + toolTokens > available`, which would over-remove messages by targeting `targetMax - 2*toolTokens`).
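The condition can be demonstrated with a stand-alone sketch; `messagesToRemove` is a hypothetical helper that returns the indices of the oldest messages to drop.

```javascript
// tokens counts messages only, so compare against targetMax - toolTokens.
function messagesToRemove(messageTokens, targetMax, toolTokens) {
  const available = targetMax - toolTokens;
  const removed = [];
  let tokens = messageTokens.reduce((a, b) => a + b, 0);
  let i = 0;
  while (tokens > available && i < messageTokens.length) {
    tokens -= messageTokens[i]; // drop oldest first
    removed.push(i++);
  }
  return removed;
}
```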
`continue outer` resets `i = 0`. Therefore `iterLimit` must be set to exactly 20 (not `+= 20`)
when auto-extending. Using `iterLimit += 20` causes each subsequent pass to run `iterLimit` iterations
from scratch (70, then 90, then 110…), producing up to 1,650 total iterations instead of the
intended 250 (50 initial + 10 × 20). Fixed in v0.3.26.
When the LLM returns an empty assistant message after using tools, a nudge is pushed into
`apiMessages` to prompt a summary. This nudge must also be appended to `conversationMessages`
to keep the two arrays in sync. Without this, subsequent turns have two consecutive assistant
messages in `conversationMessages`, violating the strict user/assistant alternation required by
Anthropic's API (→ 400 Bad Request) and causing context confusion on other providers.
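The sync rule reduces to pushing the same object into both arrays; `pushNudge` is a hypothetical helper illustrating the fix.

```javascript
// The nudge is a user message, preserving user/assistant alternation in
// BOTH arrays; omitting the second push is exactly the bug described above.
function pushNudge(apiMessages, conversationMessages, nudgeText) {
  const msg = { role: 'user', content: nudgeText };
  apiMessages.push(msg);
  conversationMessages.push(msg);
}
```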
The old two-cache implementation keyed messages by `role:content.length:tool_calls.length`.
Any two messages with the same role, same content length, and same number of tool calls would
collide, returning the serialized form of the first message for all subsequent ones and silently
corrupting API payloads. Replaced with a single `WeakMap` keyed by object identity, which is
collision-free by definition and GC-safe (entries are freed when the message object is released
from the conversation array).
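The identity-keyed cache reduces to a few lines; `serializeMessage` is a hypothetical name and `JSON.stringify` stands in for whatever serialization the real code performs.

```javascript
// Keyed by object identity: distinct message objects can never collide,
// and entries are garbage-collected with the message objects themselves.
const serializeCache = new WeakMap();

function serializeMessage(msg) {
  let cached = serializeCache.get(msg);
  if (cached === undefined) {
    cached = JSON.stringify(msg);
    serializeCache.set(msg, cached);
  }
  return cached;
}
```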
Building the tail window of a large log by calling `Array.unshift` in a loop is O(k) per call
and O(k²) overall. On a 100k-line log this caused a UI freeze. Fixed by using `push` (O(1))
followed by a single `reverse()` call (O(k)).
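The fixed pattern looks like this; `tailWindow` is a hypothetical name for the helper described above.

```javascript
// Collect the last k lines back-to-front with O(1) push, then reverse once,
// instead of O(k) unshift per line (O(k²) total).
function tailWindow(lines, k) {
  const out = [];
  for (let i = lines.length - 1; i >= 0 && out.length < k; i--) {
    out.push(lines[i]);
  }
  return out.reverse(); // single O(k) pass restores original order
}
```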