Open-source loop engineering for auditable business briefings. Formerly MABW — Multi-Agent Brief Workflow.
Current version: v0.9.0
Public framing: BriefLoop / MABW compatibility period
Current CLI: multi-agent-brief
Current Claude commands: /mabw ...
When someone asks where a number came from, BriefLoop does not ask the model to improvise an explanation. It opens the ledger.
BriefLoop turns AI-assisted business briefing into a governed loop: source packs, claim ledgers, quality gates, human decisions, structured findings, scoped repairs, regression cases, and release records. It is not a prompt that makes AI write faster; it is a process-accountability harness for recurring business briefings.
The v0.9.0 preview introduces the BriefLoop public name while keeping MABW as the implementation lineage and compatibility surface. The runtime commands, Python package, workspace format, artifact names, and MABW-080 experiment IDs are unchanged.
The core claim is deliberately narrow: traceability, not semantic proof yet. Important claims link to registered source entries with source, date, and gate metadata. That tells you where a claim entered the workflow; it does not yet prove the source semantically supports each sub-claim. We published a failure study where exactly that boundary was exposed by an external reviewer, because accountability applies to this project too.
Install From Source — macOS / Linux
git clone https://github.com/Stahl-G/briefloop.git
cd briefloop
bash scripts/setup.shInstall From Source — Windows PowerShell
Windows does not require WSL or Git Bash. PowerShell is the recommended Windows path.
winget install Python.Python.3.12
git clone https://github.com/Stahl-G/briefloop.git
cd briefloop
.\scripts\setup.ps1
.\.venv\Scripts\Activate.ps1
multi-agent-brief versionIf PowerShell blocks script execution:
powershell -ExecutionPolicy Bypass -File .\scripts\setup.ps1Create Your First Brief
multi-agent-brief onboard
multi-agent-brief init ~/mabw-workspace --from-onboarding onboarding.json
multi-agent-brief run --workspace ~/mabw-workspaceOptional: Inspect The Demo
bash scripts/demo.sh
bash scripts/demo-deep-dive.shThe demo is for reviewers and visitors who want to inspect the evidence chain on synthetic materials. Real use starts with onboarding and a workspace created for your own brief.
Advanced Windows installer: irm https://raw.githubusercontent.com/Stahl-G/briefloop/main/scripts/install.ps1 | iex exists, but is currently an Experimental CLI-only installer asset. The default path is source clone plus scripts/setup.ps1.
1,000+ deterministic tests run in CI with zero LLM calls.
I'm a management trainee at a manufacturer. Two months into the job, I was writing real weekly briefings for executives and doing what everyone does: orchestrating a carefully prompted role chain for finding, screening, analysis, editing, and audit.
The deterministic parts never broke. Everything entrusted to prompts eventually did. Facts mutated in handoffs. Weak sources became confident conclusions. The system needed so much human re-verification that it was no better than one long prompt.
The lesson MABW is built on: if a rule actually matters, it cannot live in a prompt. It has to become a schema, a validator, a gate, a transaction, an event log, or a test.
The smart parts have no authority. The authoritative parts are deterministic. Nothing takes effect without a human. Nothing passes a human without a record.
Concretely: Python owns state and ledgers and never calls an LLM. The LLM runtime owns drafts and executes content work through a contract handoff. Humans own approval. One writer per field: no module gets to "helpfully" update state it does not own, because shared writers are how audit trails and rollbacks die.
A delivered brief is just two files: output/delivery/brief.md and a .docx. Everything else a run produces exists so the brief can be questioned later. Synthetic excerpts below use fictional entities and show structure only.
output/delivery/brief.md
## 2. Market Updates
This week, the sample PV module spot price fell 1.8% week over week,
the third consecutive weekly decline. Company N announced Phase I of
its sample-state factory started production, with planned capacity of 2 GW...output/intermediate/claim_ledger.json: the registered entry behind that number:
{
"claim_id": "CL-0012",
"statement": "The sample module spot price fell 1.8% week over week.",
"source_id": "SRC-003",
"source_date": "2026-06-05",
"support": "supported"
}output/intermediate/gates/auditor_quality_gate_report.json and output/intermediate/gates/finalize_quality_gate_report.json: deterministic checks that can block audit completion or delivery. output/intermediate/quality_gate_report.json is a latest/legacy projection. There is no force flag.
{
"gate_id": "freshness",
"status": "pass",
"findings": []
}In a contract-following run, important numbers in the delivered brief link back to registered source entries. Stale sources and unsupported numbers are surfaced by gates and audit instead of silently entering the final document. The full execution trace lives in event_log.jsonl.
The writer-facing mental model is not "28 control surfaces." Each run keeps four practical questions answerable:
| Question | What it records | Where you look |
|---|---|---|
| Where is this run? | Current stage, missing artifacts, blockers, next safe action | /mabw status, workflow_state.json |
| Where did each number come from? | Claim Ledger entries, source dates, stage-scoped gate findings | claim_ledger.json, gates/*_quality_gate_report.json, source_appendix.md |
| What has it learned? | Human-approved reader preferences only; unapproved suggestions never take effect | improvement/ledger.jsonl (append-only, hash-chained, revertible) |
| What is guarding delivery? | Stage-completion transactions, reader-final gate, delivery checks | finalize_report.json, state finalize-complete |
It observes and it proposes. But only what you approve is remembered: in a ledger you can open, audit, and undo.
- Public solar integration run (v0.7.2): Improvement Memory materialization, gate execution, and control-plane closure on public materials. It is an integration reference, not a causal claim about output quality.
- Organoid-industry failure study (v0.7.4): a real external research task where an external reviewer caught semantic mismatches the gates passed by design. Includes a five-error taxonomy of how each mistake entered the pipeline. This is the honest current boundary of the system.
- BriefLoop-090 A-controlled auditable-brief pilot: one public-safe synthetic case with condition-blind, hash-bound
auditable_briefassessment. In this case, memory showed the approved guidance without obvious harm, while prompt-only over-applied the same guidance. It is not a general output-quality claim. - v0.9.0 preview notes: BriefLoop public naming, MABW compatibility rules, and the v0.9 support-sufficiency direction. The MABW-080 operator sequence remains documented in the MABW-080 experiment guide.
We can say precisely which ledger line each error entered through. That is what the system is for, and it is also why we publish the failure analysis.
BriefLoop is not an autonomous agent. It does not auto-edit brief content, does not auto-learn, has no long-term memory system, and is not an investment-advice tool or a replacement for human review.
The current first-class writer path is Claude Code. Hermes is supported as a delegated / scheduled runtime. OpenCode, Codex, and manual entrypoints exist, but they are not yet end-to-end validated as the primary writer path.
Coding agents improved fast because their loop has infrastructure: tests, CI, git history, code review. Business briefings have none of that. A junior analyst gets corrected verbally and the correction evaporates; the next hire repeats the mistake. A stale number slips into a brief and nobody can say at which step. The work is important, repetitive, and structurally unable to get better.
BriefLoop moves that same machinery: auditability, structured feedback, human gating, execution traces, into a domain with no clean reward signal, where the human is the reward channel and deterministic gates build the reward surface.
Install the writer entrypoint and use five verbs inside Claude Code:
source .venv/bin/activate
multi-agent-brief claude install --repo-workdir ./mabw new
/mabw run <workspace>
/mabw status <workspace> # strictly read-only
/mabw feedback <workspace> # recorded immediately; takes effect only after approval
/mabw deliver <workspace> # always human-triggered, gated, no force flag
Three on-ramps, one spine. There is no lite mode: entry cost drops, the accountability spine does not. Claim Ledger, gates, human delivery, event trace, and frozen snapshots stay present.
| Path | Time | What you do |
|---|---|---|
| Look once | ~5 min | Read the reference runs, run the demos |
| Run once | ~30 min | A few local text files, no search backend, new -> run -> status -> deliver |
| Live with it | weekly | Configured sources, feedback loop, approved preferences |
Full paths: Claude Code quickstart · golden path · weekly use · onboarding · search backends · MABW-080 experiment guide · docs index · roadmap · red lines and anti-patterns
I build and use BriefLoop as part of my actual weekly briefing work at a listed manufacturer, in a role that touches strategy and investor relations. Nothing from my employer enters this repository: no data, no documents, no non-public information. What crosses over is discipline, not data: patterns rewritten from memory in vocabulary that holds for any company. And where this project makes guarantees, they are written as mechanisms: schemas, gates, transactions, tests, not as promises.
The runtime orchestrator is named after the office in the Chinese ritual-music tradition responsible for keeping ensembles in time and in order. It does not write; it dispatches the specialist roles and holds them to their contracts. In the default topology, Scout also performs screening while keeping screened candidates as a separate artifact; strict topology keeps Screener independent. Not a strict historical reconstruction: a project term for the thing that maintains tempo, boundaries, and delivery discipline.
v0.8: measurement, fast-rerun, role topology, and evaluation — timing projection, same-evidence reruns, default/strict topology choices, and controlled experiment tooling without weakening accountable artifacts.
v0.9: support sufficiency and brief-loop engineering. Minimum path: Atomic Claim Graph -> Evidence Span Registry -> Claim-Support Matrix. Later v0.9.x candidates include semantic assessment proposals, human adjudication, coverage/omission gates, semantic regression, release eligibility, quality packs, and finding-to-repair workflows.
v1.0: frozen schemas and CLI surface, compatibility policy, and threat model.
This project needs real scenarios more than it needs features. If you write recurring briefings in strategy, equity research, IR, policy tracking, or similar work and want to run your real workflow through it, open an issue or discussion. If you research agent evaluation and want a dogfooded process-accountability system with run data, that is also welcome.
Start with a good first issue. Read red lines and anti-patterns first.
MIT