Local-first scaffold for a Claude-style advisor strategy across CLI sessions.
The executor model drives the task. The harness injects a starting prompt that tells the executor when to consult the advisor. When the executor reaches a hard decision, it emits an ADVISOR_CONSULT block. The harness records the event, calls the advisor model with reconstructed shared context, records the returned ADVISOR_GUIDANCE, and resumes the same executor session with that guidance.
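The control flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the harness's actual code: `run_advisor_loop`, `make_fake_executor`, and `fake_advisor` are hypothetical names, the executor is modeled as a Python callable that returns an already-parsed turn rather than a CLI session, and the status names other than `completed` and `executor_stopped_without_done` are made up for the sketch.

```python
from typing import Callable, Dict, List

# Hypothetical shape: an executor turn is {"kind": ..., "payload": ...}, where
# kind is "consult", "done", or "none". The real harness parses CLI output.
Turn = Dict[str, object]


def run_advisor_loop(
    task: str,
    executor: Callable[[str], Turn],
    advisor: Callable[[dict], dict],
    max_turns: int = 3,
    max_advisor_calls: int = 3,
) -> dict:
    events: List[dict] = []  # stands in for the durable event log
    message = task           # first turn gets the task, later turns get guidance
    advisor_calls = 0

    for turn in range(1, max_turns + 1):
        result = executor(message)
        events.append({"event": "executor_turn", "turn": turn, "kind": result["kind"]})

        if result["kind"] == "done":
            status = "completed"
            break
        if result["kind"] != "consult":
            status = "executor_stopped_without_done"
            break
        if advisor_calls >= max_advisor_calls:
            status = "advisor_budget_exhausted"  # hypothetical status name
            break

        # The harness, not the executor, performs the advisor call.
        events.append({"event": "advisor_consult", "request": result["payload"]})
        guidance = advisor(result["payload"])
        advisor_calls += 1
        events.append({"event": "advisor_guidance", "guidance": guidance})
        message = guidance["guidance"]  # resume the executor with the advice
    else:
        status = "max_turns_exhausted"  # hypothetical status name

    return {"status": status, "advisor_calls": advisor_calls, "events": events}


def make_fake_executor() -> Callable[[str], Turn]:
    """Deterministic stand-in: consult once, then finish."""
    state = {"calls": 0}

    def executor(message: str) -> Turn:
        state["calls"] += 1
        if state["calls"] == 1:
            return {"kind": "consult", "payload": {"question": "Which option?"}}
        return {"kind": "done", "payload": {"status": "completed", "summary": message}}

    return executor


def fake_advisor(request: dict) -> dict:
    return {"guidance": "Pick the simpler option.", "rationale": "Lower risk.", "stop_signal": False}


outcome = run_advisor_loop("fake smoke task", make_fake_executor(), fake_advisor)
print(outcome["status"], outcome["advisor_calls"])
```

The sketch ignores `stop_signal` and malformed blocks to keep the loop readable.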
Default local pairing:
- Executor: Kimi CLI
- Advisor: Codex CLI
- Test backend: deterministic fake adapter
See ROADMAP.md for the staged plan from the generic advisor harness to persistence, UI, vertical applications, and feedback-loop-driven prompt/schema improvement.
python3 -m venv .venv
. .venv/bin/activate
python3 -m pip install wheel
python3 -m pip install -e . --no-build-isolation
maa doctor
maa init
maa run "fake smoke task" --executor fake --advisor fake --max-turns 3 --max-advisor-calls 3
maa eval
maa ui
maa review --run <run_id> --advisor fake
maa release-readiness --sample --executor fake --advisor fake
Live local smoke:
maa run "For this smoke test, consult the advisor once before finalizing, then produce a short final answer." \
--executor kimi --advisor codex --timeout 240 --max-turns 3 --max-advisor-calls 2
maa review --run <run_id> --advisor codex --timeout 240
Post-run feedback loop:
maa review --run <run_id> --advisor fake
maa review writes the freeform post-run review, a routing policy patch proposal, and structured improvement_proposals.json / improvement_proposals.md artifacts. Improvement proposals target the memory schema, the executor prompt, and the advisor prompt. Each proposal is validated as advisory and as requiring human approval, and none is applied automatically.
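A check of this kind could look like the sketch below. The field names `target`, `advisory`, and `requires_human_approval`, the target identifiers, and the function `validate_proposals` are assumptions made for illustration; the actual layout of improvement_proposals.json is not shown here.

```python
# Hypothetical target identifiers for the three proposal areas named above.
ALLOWED_TARGETS = {"memory_schema", "executor_prompt", "advisor_prompt"}


def validate_proposals(proposals: list) -> list:
    """Return a list of error strings; an empty list means every proposal passes."""
    errors = []
    for index, proposal in enumerate(proposals):
        target = proposal.get("target")
        if target not in ALLOWED_TARGETS:
            errors.append(f"proposal {index}: unknown target {target!r}")
        # Both flags must be literally True; a missing flag counts as a failure.
        if proposal.get("advisory") is not True:
            errors.append(f"proposal {index}: must be marked advisory")
        if proposal.get("requires_human_approval") is not True:
            errors.append(f"proposal {index}: must require human approval")
    return errors


good = {"target": "executor_prompt", "advisory": True, "requires_human_approval": True}
bad = {"target": "routing", "advisory": True}
print(validate_proposals([good, bad]))
```

Rejecting a proposal whose flags are missing, rather than defaulting them, keeps the "never applied automatically" guarantee explicit in the data itself.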
Evaluation harness:
maa eval
maa eval --include-live --live-timeout 240
maa eval runs deterministic fake regression scenarios for autonomous consultation, no-consult completion, malformed blocks, advisor stop signals, max-turn exhaustion, and missing EXECUTOR_DONE. It writes evaluation_summary.json, evaluation_summary.md, and scenario_results.jsonl under runs/eval_*/.
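For a rough idea of how a per-scenario JSONL file can be rolled up into a summary, consider the sketch below. The record fields `scenario` and `passed`, the scenario identifiers, and the sample rows are invented for illustration; the real scenario_results.jsonl may use a different layout.

```python
import json

# Invented sample data; these are not real results from maa eval.
SAMPLE_JSONL = """\
{"scenario": "autonomous_consultation", "passed": true}
{"scenario": "no_consult_completion", "passed": true}
{"scenario": "missing_executor_done", "passed": false}
"""


def summarize(jsonl_text: str) -> dict:
    """Count passing scenarios and list failing ones from JSONL text."""
    results = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    failed = [record["scenario"] for record in results if not record["passed"]]
    return {"total": len(results), "passed": len(results) - len(failed), "failed": failed}


summary = summarize(SAMPLE_JSONL)
print(summary)
```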
Run timeline UI:
maa ui
maa ui --serve --port 8765
maa ui renders a local HTML dashboard to runs/ui/index.html from the SQLite run database. The dashboard filters persisted runs by status, backend, advisor call count, task text, and error mode, then shows the selected run timeline, prompts, raw CLI outputs, consult reasons, guidance, memory proposals, and outcome.
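The filtering step can be pictured as a parameterized SQLite query. The sketch below uses an in-memory database with a hypothetical `runs` table; the table name, columns, and sample rows are assumptions, not the schema of memory/advisor_runs.db.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE runs (run_id TEXT PRIMARY KEY, status TEXT, executor TEXT, "
    "advisor TEXT, advisor_calls INTEGER, task TEXT)"
)
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("run_a", "completed", "fake", "fake", 1, "fake smoke task"),
        ("run_b", "executor_stopped_without_done", "fake", "fake", 0, "fake smoke task"),
        ("run_c", "completed", "kimi", "codex", 2, "release evidence check"),
    ],
)


def filter_runs(conn, status=None, min_advisor_calls=0, task_contains=""):
    """Return run ids matching the filters, using bound parameters throughout."""
    query = "SELECT run_id FROM runs WHERE advisor_calls >= ? AND task LIKE ?"
    params = [min_advisor_calls, f"%{task_contains}%"]
    if status is not None:
        query += " AND status = ?"
        params.append(status)
    query += " ORDER BY run_id"
    return [row[0] for row in conn.execute(query, params)]


print(filter_runs(conn, status="completed"))
print(filter_runs(conn, min_advisor_calls=2))
```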
Release readiness vertical:
maa release-readiness --sample --executor fake --advisor fake
maa release-readiness --evidence path/to/release-evidence.md \
--executor kimi --advisor codex --timeout 240
maa release-readiness runs a focused vertical workflow on top of the same advisor loop, persistence database, and UI timeline. The workflow assesses supplied release evidence, requires an advisor consultation before the final verdict, emits a structured RELEASE_READINESS_REPORT, and writes release_readiness_evaluation.json plus release_readiness_evaluation.md into the run directory.
The executor decides when advice is needed. The advisor does not run tools, mutate files, or write memory. The harness performs the actual advisor call and owns all durable state.
Executor consultation request:
<ADVISOR_CONSULT>
{"question":"...","context":"...","options":["..."],"preferred_option":"...","urgency":"normal"}
</ADVISOR_CONSULT>
Advisor guidance:
<ADVISOR_GUIDANCE>
{"guidance":"...","rationale":"...","stop_signal":false}
</ADVISOR_GUIDANCE>
Executor completion:
<EXECUTOR_DONE>
{"status":"completed","summary":"..."}
</EXECUTOR_DONE>
A run is marked completed only when the executor emits EXECUTOR_DONE. If the executor exits without a consultation request and without EXECUTOR_DONE, the harness records executor_stopped_without_done.
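One way to recognize these blocks in raw executor output is sketched below. The function name `classify_executor_output` and the kind labels are hypothetical, and the precedence given to EXECUTOR_DONE when both blocks appear in one turn is an assumption. How the harness reacts to a malformed block is not specified here, so the sketch only reports it.

```python
import json
import re
from typing import Optional, Tuple

# Matches either block type; \1 forces the closing tag to match the opening one.
BLOCK_RE = re.compile(r"<(ADVISOR_CONSULT|EXECUTOR_DONE)>\s*(.*?)\s*</\1>", re.DOTALL)


def classify_executor_output(text: str) -> Tuple[str, Optional[dict]]:
    """Return (kind, payload) where kind is "done", "consult", "malformed", or "none"."""
    blocks = {}
    malformed = False
    for match in BLOCK_RE.finditer(text):
        tag, body = match.group(1), match.group(2)
        try:
            payload = json.loads(body)
        except json.JSONDecodeError:
            malformed = True
            continue
        if isinstance(payload, dict):
            blocks.setdefault(tag, payload)  # keep the first valid block per tag
        else:
            malformed = True  # valid JSON, but not an object
    if "EXECUTOR_DONE" in blocks:
        return "done", blocks["EXECUTOR_DONE"]
    if "ADVISOR_CONSULT" in blocks:
        return "consult", blocks["ADVISOR_CONSULT"]
    return ("malformed", None) if malformed else ("none", None)


sample = 'Working...\n<EXECUTOR_DONE>\n{"status":"completed","summary":"ok"}\n</EXECUTOR_DONE>'
print(classify_executor_output(sample))
```

Under the rule above, a final turn classified as "none" corresponds to executor_stopped_without_done, and only "done" corresponds to a completed run.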
Runtime state is local and gitignored:
- runs/<run_id>/mailbox/*.jsonl
- memory/*.jsonl
- memory/advisor_runs.db
Important run files:
- memory/advisor_runs.db: queryable source of truth for persisted runs, events, turns, consults, guidance, memory proposals, malformed blocks, and outcomes
- version_manifest.json: prompt, memory schema, and policy file version/hash metadata captured at run start
- session_events.jsonl: durable cross-session event log
- advisor_consults.jsonl: executor consultation requests
- advisor_guidance.jsonl: advisor guidance returned to the executor
- executor_turn_<n>.*: executor CLI outputs
- advisor_turn_<n>.*: advisor CLI outputs
- outcome.json: run status and counters
- evaluation_summary.json: evaluation metrics and scenario results for maa eval
- release_readiness_evaluation.json: vertical acceptance result and report metrics for maa release-readiness
- improvement_proposals.json: validated post-run proposals for memory schema and prompt improvements
- runs/ui/index.html: generated local run timeline dashboard
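To illustrate the kind of hash metadata a version manifest can capture, the sketch below hashes a file with SHA-256 at "run start". The function `build_manifest`, the manifest layout, and the file name executor_prompt.md are hypothetical; the real version_manifest.json may record different fields.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def build_manifest(paths) -> dict:
    """Map each file name to the SHA-256 digest of its current contents."""
    entries = {}
    for path in paths:
        path = Path(path)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries[path.name] = {"sha256": digest}
    return {"files": entries}


PROMPT_BYTES = b"Consult the advisor on hard decisions.\n"

with tempfile.TemporaryDirectory() as tmp:
    prompt = Path(tmp) / "executor_prompt.md"  # hypothetical prompt file
    prompt.write_bytes(PROMPT_BYTES)
    manifest = build_manifest([prompt])

print(json.dumps(manifest, indent=2))
```

Recording a content hash alongside each run makes it possible to tell later which prompt or schema version produced a given outcome.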
The outcome.json file includes the executor session id, executor turn count, advisor consultation count, guidance count, and completion status.
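As a rough picture of such a record, the sketch below models the listed fields as a dataclass and serializes it to JSON. The key names and the `Outcome` class are assumptions chosen to mirror the description; the actual keys in outcome.json may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Outcome:
    # Hypothetical key names mirroring the fields described above.
    executor_session_id: str
    executor_turns: int
    advisor_consults: int
    guidance_count: int
    status: str


outcome_record = Outcome(
    executor_session_id="session-123",  # placeholder id
    executor_turns=2,
    advisor_consults=1,
    guidance_count=1,
    status="completed",
)
outcome_text = json.dumps(asdict(outcome_record), indent=2, sort_keys=True)
print(outcome_text)
```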
python3 -m unittest discover -s tests