Build AI agents as folders, not code. A filesystem-native harness: an agent is a directory of markdown, the LLM does the work, a thin CLI coordinates. No engine, no database, no runtime lock-in. Runs the same locally and in the cloud, and gives you the outer loop (loop engineering) so an agent finds its own work, does it, verifies it, and remembers it, without you prompting each turn.

    npm i -g @agentpro/scroll
    scroll new researcher              # scaffold a compliant agent (L1-ready: hard-rules + 3 gold evals)
    scroll build researcher            # render it for Cowork / Codex / Gemini / voice
    scroll run researcher --task "…"   # inner loop: one harnessed run (capped, gated, checkpointed)
    scroll loop LOOP.md                # outer loop: schedule + find work + stop conditions

MIT · Node ≥ 20 · Spec v1.4 · @agentpro/scroll 0.8.0 · works with Claude · OpenAI/Codex · Gemini
GitHub topics: loop-engineering · agent-harness · agentic · agentic-loops · ai-agents · autonomous-agents · llm · mcp
Most agent frameworks make you import a heavy runtime (graphs, schedulers, vector DBs) and lock you to one model vendor. SCROLL bets the opposite way:
| | Heavy frameworks | SCROLL |
|---|---|---|
| An agent is… | an object in code | a folder of markdown |
| State lives in… | a database / session | the filesystem + git |
| Orchestration | the framework runtime | a declarative DAG you advance deterministically |
| Safety | hope the model behaves | risk-tiered gates + grounding, enforced in code |
| Model | locked to one vendor | vendor-neutral: one definition, any runtime |
| The whole framework | a dependency tree | a convention + one small CLI |
The thesis in one line: SCROLL doesn't try to make the model smarter; it makes the model's built-in weaknesses unable to cause harm. Every mechanism below compensates for a specific model failure mode.
| Mechanism | The model weakness it neutralizes |
|---|---|
| Risk-tiered permission matrix | the model doesn't understand consequences |
| Grounding pre-check | hallucination (inventing ids / amounts) |
| Tool-boundary enforcement | sycophancy: it can't be trusted to stop itself |
| External eval (gold cases) | biased self-assessment |
| Comprehension digest | the user's passive over-delegation |
Built & tested (`test/`: 50 checks green, 9 core + 14 runtime + 16 v1.4 + 11 installers):
- Inner loop `scroll run`: deterministic DAG advance from `WORK.md`, single controller, cost-gate, append-only `blackboard/`, hard caps + circuit breaker, stable-prefix caching, model routing, deterministic non-LLM steps, verify-before-done, per-tick checkpoint, `events.jsonl`.
- Risk-tiered permissions: five tiers (`read_only` · `reversible_write` · `external_comm` · `financial` · `destructive`), enforced at the action boundary in code (not the prompt). Risk is declared as data in `.mcp.json` / `IDENTITY.security`, never read from prose.
- Grounding pre-check: before a `financial`/`destructive` action, every critical parameter must trace to a real prior source; a fabricated id/amount never executes.
- Outer loop `scroll loop` + `LOOP.md`: schedule, work-source, and stop conditions for self-directed runs. A loop with no stop condition is refused.
- Comprehension digest: every run writes a human-readable `digest.md` from the event stream (no model call), language-aware.
- `scroll eval`: gold cases run N times, with machine checks + consistency (pass^k) + optional LLM-judge + hash-bound record. (Built. Earlier docs called this roadmap; it ships.)
- Per-language token budgeting: `scroll cost` and the cost gate scale by language (Vietnamese ≈ 1.8× English).
- Worktree-lite isolation: parallel tasks get isolated working dirs (real `git worktree` with `--worktree` on a repo) so two agents never corrupt shared state.
- Crash-resume: `scroll run --resume` restores completed tasks from the checkpoint and finishes the rest without re-billing.
- `scroll audit`: convention scan + hash-bound compliance report (CI; `--verify`).
- Extension installers: `scroll mcp add` (wire an MCP server/connector into `.mcp.json`, with `${vault:KEY}` credentials + per-tool risk tiers), `scroll skill add` (install or scaffold a `SKILL.md` skill, attach it to an agent), `scroll plugin add` (unpack an agent-pack bundle: agents + skills + merged `.mcp.json`).
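The consistency score that `scroll eval` reports is written as pass^k. The sketch below shows one common way to estimate such a score: the chance that k runs of the same gold case all pass, computed from N recorded runs. Whether SCROLL computes pass^k exactly this way is an assumption, and `pass_hat_k` is a hypothetical helper name, not part of the CLI.

```python
from math import comb


def pass_hat_k(passes: int, runs: int, k: int) -> float:
    """Estimate the chance that k runs of one gold case ALL pass.

    passes: runs that passed the machine checks
    runs:   total recorded runs of the case (N)
    k:      how many successes in a row we demand
    """
    if not 0 <= passes <= runs:
        raise ValueError("passes must be between 0 and runs")
    if not 1 <= k <= runs:
        raise ValueError("k must be between 1 and runs")
    # Fraction of k-sized subsets of the runs that contain only passing runs.
    return comb(passes, k) / comb(runs, k)


# A case that passes 4 of 5 runs looks fine at k=1 but weak at k=4.
print(pass_hat_k(4, 5, 1))  # 0.8
print(pass_hat_k(4, 5, 4))  # 0.2
```

The point of raising k is that an agent which only sometimes succeeds is penalized sharply, which a single-run pass rate hides.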
Providers: mock (offline) · codex · claude · openai · anthropic · gemini. On a case-gated live A/B eval the runtime cut billable tokens 50-88% and latency 50-75% vs an unstructured baseline, at equal-or-better quality.
Roadmap: AG-UI / OpenTelemetry event emission · storage adapters (S3/GCS/repo) · registry-backed discovery for `mcp add`.
1. Install: `npm i -g @agentpro/scroll`
2. Create an agent: `scroll new` scaffolds a correct, L1-ready folder:

       agents/atlas/
       ├── IDENTITY.md     # who it is + security (risk tiers), machine-readable frontmatter
       ├── SOUL.md         # how it thinks (you write this part)
       ├── TOOLS.md        # what it can use
       ├── hard-rules.md   # rules it must never break
       ├── memory/         # what it remembers
       └── evals/          # 3 gold cases, graded by `scroll eval`

3. Build, run, loop:

       scroll check atlas             # validate structure (pre-commit + CI)
       scroll build atlas             # → Cowork / Codex / Gemini / Claude-subagent / A2A
       scroll run atlas --task "…"    # inner loop: deterministic, capped, gated, checkpointed
       scroll loop LOOP.md            # outer loop: scheduled, self-directed, with stop conditions

The shift the field is calling loop engineering is a move from writing prompts to designing the system that prompts the agent. SCROLL gives you both halves explicitly:
- `scroll run`: the inner loop. One harnessed execution of a `WORK.md`: advance the DAG, act inside a step, verify, checkpoint.
- `LOOP.md` + `scroll loop`: the outer loop. Defines when to start an inner loop (`schedule`), where it finds work (`work_source`), and when to stop (`stop_conditions`, required). It rides your host scheduler (cron / scheduled-tasks); there's no daemon engine.

    # LOOP.md
    id: market-watch-daily
    controller: lead
    schedule: { cron: "0 7 * * *" }
    work_source: { type: work_file, query: WORK.md }
    stop_conditions: { max_runs_per_day: 4, budget_usd_per_day: 5, halt_on: [gate_denied, verify_fail] }
    digest: required

The outer loop never escalates privilege: every action inside still passes the permission matrix below, and a digest is written after every run.
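One way to read `stop_conditions` is as a guard evaluated before each inner run. The sketch below is a minimal illustration under that reading, not SCROLL's implementation: the field names come from the `LOOP.md` example, while `StopConditions`, `DayState`, `require_stop_conditions`, and `may_start_run` are hypothetical names, and the per-day counters are assumed to be tracked by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class StopConditions:
    max_runs_per_day: int
    budget_usd_per_day: float
    halt_on: list[str] = field(default_factory=list)


@dataclass
class DayState:
    runs_today: int = 0
    spent_usd_today: float = 0.0
    last_run_events: list[str] = field(default_factory=list)


def require_stop_conditions(loop: dict) -> StopConditions:
    """Refuse a loop definition that declares no stop conditions."""
    raw = loop.get("stop_conditions")
    if not raw:
        raise ValueError("loop has no stop_conditions; refusing to run")
    return StopConditions(**raw)


def may_start_run(stop: StopConditions, state: DayState) -> tuple[bool, str]:
    """Return (allowed, reason); meant to be checked before every inner run."""
    halted = [e for e in state.last_run_events if e in stop.halt_on]
    if halted:
        return False, f"halted on {halted[0]}"
    if state.runs_today >= stop.max_runs_per_day:
        return False, "max_runs_per_day reached"
    if state.spent_usd_today >= stop.budget_usd_per_day:
        return False, "budget_usd_per_day reached"
    return True, "ok"


loop = {"stop_conditions": {"max_runs_per_day": 4, "budget_usd_per_day": 5,
                            "halt_on": ["gate_denied", "verify_fail"]}}
stop = require_stop_conditions(loop)
print(may_start_run(stop, DayState(runs_today=1, spent_usd_today=1.2)))
print(may_start_run(stop, DayState(runs_today=4)))
print(may_start_run(stop, DayState(last_run_events=["verify_fail"])))
```

Checking the halt events first means a denied gate or failed verification stops the loop even when budget and run count are still available.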
A model is sycophantic and prompt-injectable, so it can't be trusted to stop before a dangerous action. SCROLL puts the gate in deterministic code, in front of the action.
Five tiers, lowest → highest consequence, each mapped to a policy:
| tier | default policy |
|---|---|
| `read_only` | auto |
| `reversible_write` | auto + log |
| `external_comm` | soft-hold |
| `financial` | must-approve + grounding |
| `destructive` | must-approve + grounding |
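The tier table can be pictured as a plain lookup that code consults before any action runs. This is a minimal sketch, not SCROLL's code: the tier names and policies are taken from the table, while `Tier`, `policy_for`, and `needs_human` are hypothetical names, and treating an unknown tier as the strictest one is an assumption.

```python
from enum import Enum


class Tier(str, Enum):
    READ_ONLY = "read_only"
    REVERSIBLE_WRITE = "reversible_write"
    EXTERNAL_COMM = "external_comm"
    FINANCIAL = "financial"
    DESTRUCTIVE = "destructive"


# Default policy per tier, copied from the tier table.
DEFAULT_POLICY = {
    Tier.READ_ONLY: "auto",
    Tier.REVERSIBLE_WRITE: "auto + log",
    Tier.EXTERNAL_COMM: "soft-hold",
    Tier.FINANCIAL: "must-approve + grounding",
    Tier.DESTRUCTIVE: "must-approve + grounding",
}


def policy_for(tier_name: str) -> str:
    """Look up the policy for a declared tier name."""
    try:
        return DEFAULT_POLICY[Tier(tier_name)]
    except ValueError:
        # Assumption: an undeclared or misspelled tier fails closed.
        return DEFAULT_POLICY[Tier.DESTRUCTIVE]


def needs_human(tier_name: str) -> bool:
    """True when the policy blocks until a person approves."""
    return policy_for(tier_name).startswith("must-approve")


print(policy_for("reversible_write"))  # auto + log
print(needs_human("financial"))        # True
print(policy_for("made_up_tier"))      # must-approve + grounding
```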
Risk is data, not prose. It is declared per-tool in `.mcp.json` or per-namespace in `IDENTITY.security` (a label a model can read is just another prompt it can be talked out of):

    # IDENTITY.md
    security:
      risk_defaults: { fs.read: read_only, shell.exec: destructive, mcp.*: external_comm }
      approval: { financial: must-approve, destructive: must-approve }

`must-approve` blocks until an approval file appears under `control/approvals/`. Grounding runs first for `financial`/`destructive`: every declared parameter (a product code, an amount, a customer id) must trace to a real source seen earlier in the run. Otherwise the action is denied and never runs, even if "approved." An invented invoice can't be issued.
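The ordering described here, grounding first and approval second, can be sketched as a small gate function. This is an illustration under stated assumptions rather than SCROLL's implementation: `ungrounded_params` and `gate` are hypothetical names, "traces to a real source" is simplified to a substring match against text seen earlier in the run, and the approval file is assumed to be named after the action id.

```python
import tempfile
from pathlib import Path


def ungrounded_params(params: dict[str, str], seen_sources: list[str]) -> list[str]:
    """Names of parameters whose value never appeared in an earlier source."""
    return [name for name, value in params.items()
            if not any(value in source for source in seen_sources)]


def gate(action_id: str, params: dict[str, str],
         seen_sources: list[str], approvals_dir: Path) -> str:
    """Decide a financial/destructive action: grounding first, then approval."""
    missing = ungrounded_params(params, seen_sources)
    if missing:
        # Denied even if an approval file exists.
        return "denied: ungrounded " + ", ".join(missing)
    if not (approvals_dir / action_id).exists():
        return "blocked: awaiting approval"
    return "allowed"


with tempfile.TemporaryDirectory() as tmp:
    approvals = Path(tmp) / "control" / "approvals"
    approvals.mkdir(parents=True)
    (approvals / "pay-1").touch()  # a human approved action pay-1
    sources = ["tool result: invoice INV-1042, customer C-77, amount 250.00"]

    print(gate("pay-1", {"invoice": "INV-1042", "amount": "250.00"}, sources, approvals))
    print(gate("pay-1", {"invoice": "INV-9999", "amount": "250.00"}, sources, approvals))
    print(gate("pay-2", {"customer": "C-77"}, sources, approvals))
```

The three calls print `allowed`, `denied: ungrounded invoice`, and `blocked: awaiting approval`. The second call shows the property the text stresses: an approval on file does not rescue an invented id.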
Agents coordinate through files, not a message bus:
- `WORK.md`: the task chain as a declarative DAG. One controller owns it (this prevents the #1 multi-agent failure: infinite handoff loops). A deterministic runner advances it; the LLM only acts inside a step.
- `blackboard/`: a shared space where agents post discoveries (one append-only file each).
- Worktree-lite isolation: parallel tasks write to isolated dirs, so two agents never collide.
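A deterministic runner over a task DAG can be sketched in a few lines. The sketch assumes the task chain has already been parsed into a mapping of task name to dependencies; the `WORK.md` file format itself is not shown here, and `ready_tasks`, `advance`, and the task names are hypothetical.

```python
from typing import Callable, Iterable


def ready_tasks(tasks: dict[str, list[str]], done: set[str]) -> list[str]:
    """Unfinished tasks whose dependencies are all finished, in a stable order."""
    return sorted(t for t, deps in tasks.items()
                  if t not in done and all(d in done for d in deps))


def advance(tasks: dict[str, list[str]], run_step: Callable[[str], None],
            done: Iterable[str] = ()) -> list[str]:
    """Run every remaining task in dependency order; return the execution order."""
    finished = set(done)
    order: list[str] = []
    while any(t not in finished for t in tasks):
        ready = ready_tasks(tasks, finished)
        if not ready:
            raise ValueError("cycle or missing dependency in task graph")
        for task in ready:
            run_step(task)  # the only place a model would act
            finished.add(task)
            order.append(task)
    return order


tasks = {
    "collect": [],
    "analyze": ["collect"],
    "chart": ["collect"],
    "report": ["analyze", "chart"],
}
print(advance(tasks, lambda t: None))
# ['collect', 'analyze', 'chart', 'report']
print(advance(tasks, lambda t: None, done={"collect", "analyze"}))
# ['chart', 'report']
```

The second call hints at how a resume could work: tasks recorded as done are skipped and only the remainder runs. The scheduling decision never involves the model, which is what keeps handoffs from looping.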
Default is single-agent. Multi-agent is opt-in and only when a task is genuinely parallel: it costs ~15× the tokens, so SCROLL estimates cost (language-aware) before it spawns.
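The arithmetic behind that estimate is simple enough to show directly. The sketch uses only the two figures quoted in this README (Vietnamese ≈ 1.8× English, multi-agent ~15× the tokens); `estimate_tokens` is a hypothetical function, the real `scroll cost` may weigh more factors, and pricing unknown languages like English is an assumption.

```python
LANGUAGE_FACTOR = {"en": 1.0, "vi": 1.8}  # vi ≈ 1.8× English, as quoted in this README
MULTI_AGENT_FACTOR = 15                   # "~15× the tokens", as quoted in this README


def estimate_tokens(base_english_tokens: int, language: str = "en",
                    multi_agent: bool = False) -> int:
    """Rough token estimate for a task, scaled by language and agent mode."""
    # Assumption: a language with no known factor is priced like English.
    tokens = base_english_tokens * LANGUAGE_FACTOR.get(language, 1.0)
    if multi_agent:
        tokens *= MULTI_AGENT_FACTOR
    return round(tokens)


print(estimate_tokens(10_000))                                # 10000
print(estimate_tokens(10_000, "vi"))                          # 18000
print(estimate_tokens(10_000, "vi", multi_agent=True))        # 270000
```

A task that costs 10,000 tokens in English single-agent mode reaches 270,000 in Vietnamese multi-agent mode, which is why the estimate runs before anything spawns.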

    runs/<id>/events.jsonl   # run_started · permission_decision · grounding_checked · cost_update · completed …
    runs/<id>/digest.md      # human-readable summary: what it did, which dangerous tiers it touched, cost

- Stream the events (tail locally, or SSE/WebSocket in the cloud; types map to AG-UI).
- Control a live run by writing files: `control/pause`, `control/stop`, `control/steer.md`, or approve a gate under `control/approvals/`.
- Read the digest, so the owner understands the run instead of just trusting it (an antidote to passive over-delegation).
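Because the digest is derived from the event stream without a model call, it can be pictured as a plain fold over `events.jsonl`. The sketch below is illustrative only: the event type names come from the listing above, but the JSON field names (`type`, `tier`, `usd`), the assumption that `cost_update` carries a running total, and the digest layout are all hypothetical.

```python
import json
from collections import Counter


def build_digest(jsonl_text: str) -> str:
    """Summarize a run from its event lines; no model call involved."""
    events = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    counts = Counter(e["type"] for e in events)
    risky = sorted({e["tier"] for e in events
                    if e["type"] == "permission_decision"
                    and e.get("tier") in {"financial", "destructive"}})
    # Assumption: cost_update reports a running total, so the largest value wins.
    cost = max((e["usd"] for e in events if e["type"] == "cost_update"), default=0.0)
    status = "completed" if counts["completed"] else "not completed"
    return "\n".join([
        f"Status: {status}",
        f"Permission decisions: {counts['permission_decision']}",
        f"Dangerous tiers touched: {', '.join(risky) or 'none'}",
        f"Cost (USD): {cost:.2f}",
    ])


SAMPLE_EVENTS = "\n".join([
    '{"type": "run_started"}',
    '{"type": "permission_decision", "tier": "read_only"}',
    '{"type": "grounding_checked"}',
    '{"type": "permission_decision", "tier": "financial"}',
    '{"type": "cost_update", "usd": 0.42}',
    '{"type": "completed"}',
])
print(build_digest(SAMPLE_EVENTS))
```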
SCROLL doesn't rely on you reading the docs. The correct path is the only one that works: a scaffold that's correct by default, a JSON schema, a linter (`scroll check`) in CI, a build that refuses invalid agents, an `AGENTS.md` that teaches AI coding tools the rules, and `scroll audit`, a hash-bound report so a "pass" is tied to the exact file state and can't be faked.
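The idea of a hash-bound report can be illustrated with a short sketch: the verdict is stored next to a hash of the folder contents, and verification recomputes the hash. This shows the general technique only; the report format that `scroll audit` actually writes is not documented here, and `folder_hash`, `make_report`, and `verify_report` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path


def folder_hash(root: Path) -> str:
    """SHA-256 over every file's relative path and bytes, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode())
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def make_report(root: Path, passed: bool) -> dict:
    """Bind a verdict to the exact file state it was computed on."""
    return {"passed": passed, "sha256": folder_hash(root)}


def verify_report(root: Path, report: dict) -> bool:
    """A pass only counts if the folder still matches the recorded hash."""
    return bool(report["passed"]) and report["sha256"] == folder_hash(root)


with tempfile.TemporaryDirectory() as tmp:
    agent = Path(tmp) / "atlas"
    agent.mkdir()
    (agent / "hard-rules.md").write_text("Never email customers.\n")

    report = make_report(agent, passed=True)
    print(verify_report(agent, report))  # True

    (agent / "hard-rules.md").write_text("Anything goes.\n")
    print(verify_report(agent, report))  # False
```

In this sketch the report is kept outside the hashed folder; storing it inside would change the hash it is supposed to certify.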
| Command | Does | Status |
|---|---|---|
| `scroll new <name>` | Scaffold an L1-ready agent folder | built |
| `scroll check <name>` | Validate structure (linter; pre-commit + CI) | built |
| `scroll build <name>` | Render one source → every runtime | built |
| `scroll run --work <f>` | Inner loop over a `WORK.md` (or `run <agent> --task`) | built |
| `scroll loop <LOOP.md>` | Outer loop: schedule + find work + stop conditions | built |
| `scroll eval <name>` | Gold cases N×: machine checks + consistency + judge | built |
| `scroll audit [name]` | Conventions + hash-bound report (CI; `--verify`) | built |
| `scroll registry` | Scan agents → a config/observability view | built |
| `scroll cost <task> [--language vi]` | Single vs multi token estimate (language-aware) | built |
| `scroll mcp add <name>` | Wire an MCP server / connector into `.mcp.json` (`${vault:KEY}` creds · per-tool risk) | built |
| `scroll skill add <ref>` | Install / scaffold a `SKILL.md` skill (`--agent` to attach) | built |
| `scroll plugin add <ref>` | Unpack an agent-pack bundle (agents + skills + merged `.mcp.json`) | built |
Useful run flags: --resume (crash-resume), --worktree (git-worktree isolation), --risk <tier>, --auto-approve, --language <code>, --max-usd / --max-iterations.
- `@agentpro/scroll`: the CLI + loader (npm).
- `@agentpro/scroll-schema`: the versioned spec + JSON Schema.
- Storage adapters: `@agentpro/scroll-s3`, `-gcs`, `-repo` (roadmap).

Your agents are never packaged. They're folders you own and version, like `AGENTS.md`.
- `SPEC.md`: the formal agent-folder + frontmatter spec (incl. §21 permissions, §22 grounding, §23 LOOP)
- `AGENTS.md`: how AI coding tools follow SCROLL
- `templates/`: a worked agent + a `WORK.md` + a `LOOP.md`

MIT © 2026 Agent Pro. Use it, fork it, ship it; just keep the copyright notice.
SCROLL: a harness for agentic loops. Turn every AI subscription your team pays for into coordinated, governed, portable agents. Built by Agent Pro.