LeHungViet/scroll

📜 SCROLL - a harness for agentic loops

Build AI agents as folders, not code. A filesystem-native harness: an agent is a directory of markdown, the LLM does the work, a thin CLI coordinates. No engine, no database, no runtime lock-in. Runs the same locally and in the cloud, and gives you the outer loop (loop engineering) so an agent finds its own work, does it, verifies it, and remembers it, without you prompting each turn.

npm i -g @agentpro/scroll
scroll new researcher                 # scaffold a compliant agent (L1-ready: hard-rules + 3 gold evals)
scroll build researcher               # render it for Cowork / Codex / Gemini / voice
scroll run  researcher --task "…"     # inner loop: one harnessed run (capped, gated, checkpointed)
scroll loop LOOP.md                   # outer loop: schedule + find work + stop conditions

MIT · Node ≥ 20 · Spec v1.4 · @agentpro/scroll 0.8.0 · works with Claude · OpenAI/Codex · Gemini

GitHub topics: loop-engineering · agent-harness · agentic · agentic-loops · ai-agents · autonomous-agents · llm · mcp


Why SCROLL

Most agent frameworks make you import a heavy runtime (graphs, schedulers, vector DBs) and lock you to one model vendor. SCROLL bets the opposite way:

| | Heavy frameworks | SCROLL |
|---|---|---|
| An agent is… | an object in code | a folder of markdown |
| State lives in… | a database / session | the filesystem + git |
| Orchestration | the framework runtime | a declarative DAG you advance deterministically |
| Safety | hope the model behaves | risk-tiered gates + grounding, enforced in code |
| Model | locked to one vendor | vendor-neutral: one definition, any runtime |
| The whole framework | a dependency tree | a convention + one small CLI |

The thesis in one line: SCROLL doesn't try to make the model smarter; it makes the model's built-in weaknesses unable to cause harm. Every mechanism below compensates for a specific model failure mode.

| Mechanism | The model weakness it neutralizes |
|---|---|
| Risk-tiered permission matrix | the model doesn't understand consequences |
| Grounding pre-check | hallucination (inventing ids / amounts) |
| Tool-boundary enforcement | sycophancy: it can't be trusted to stop itself |
| External eval (gold cases) | biased self-assessment |
| Comprehension digest | the user's passive over-delegation |

Implementation status (2026-06-21 · v1.4)

Built & tested (test/: 50 checks green, 9 core + 14 runtime + 16 v1.4 + 11 installers):

  • Inner loop scroll run: deterministic DAG advance from WORK.md, single controller, cost-gate, append-only blackboard/, hard caps + circuit breaker, stable-prefix caching, model routing, deterministic non-LLM steps, verify-before-done, per-tick checkpoint, events.jsonl.
  • Risk-tiered permissions: five tiers (read_only · reversible_write · external_comm · financial · destructive), enforced at the action boundary in code (not the prompt). Risk is declared as data in .mcp.json / IDENTITY.security, never read from prose.
  • Grounding pre-check: before a financial/destructive action, every critical parameter must trace to a real prior source; a fabricated id/amount never executes.
  • Outer loop scroll loop + LOOP.md: schedule, work-source, and stop conditions for self-directed runs. A loop with no stop condition is refused.
  • Comprehension digest: every run writes a human-readable digest.md from the event stream (no model call), language-aware.
  • scroll eval: gold cases run N times → machine checks + consistency (pass^k) + optional LLM-judge + hash-bound record. (Built; earlier docs listed this as roadmap, but it ships.)
  • Per-language token budgeting: scroll cost / the cost gate scale by language (Vietnamese ≈ 1.8× English).
  • Worktree-lite isolation: parallel tasks get isolated working dirs (real git worktree with --worktree on a repo) so two agents never corrupt shared state.
  • Crash-resume: scroll run --resume restores completed tasks from the checkpoint and finishes the rest without re-billing.
  • scroll audit: convention scan + hash-bound compliance report (CI; --verify).
  • Extension installers: scroll mcp add (wire an MCP server/connector into .mcp.json, with ${vault:KEY} credentials + per-tool risk tiers), scroll skill add (install or scaffold a SKILL.md skill, attach it to an agent), scroll plugin add (unpack an agent-pack bundle: agents + skills + merged .mcp.json).
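
The list above mentions a consistency score written pass^k but does not define it. As a rough illustration only, the Python sketch below assumes one common reading: a gold case counts as consistent when k runs drawn from the N recorded runs all pass, estimated as C(c, k) / C(N, k) for c passing runs. The function name and that reading are assumptions, not SCROLL's documented behavior.

```python
from math import comb


def pass_hat_k(n_runs: int, n_passed: int, k: int) -> float:
    """Estimate the chance that k runs drawn without replacement all pass."""
    if not 0 <= n_passed <= n_runs:
        raise ValueError("n_passed must be between 0 and n_runs")
    if not 1 <= k <= n_runs:
        raise ValueError("k must be between 1 and n_runs")
    # comb(c, k) is 0 when c < k, so fewer than k passing runs scores 0.0.
    return comb(n_passed, k) / comb(n_runs, k)
```

Under this reading a single flaky run lowers the score sharply as k grows, which is the point of measuring consistency rather than best-of-N.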

Providers: mock (offline) · codex · claude · openai · anthropic · gemini. On a case-gated live A/B eval the runtime cut billable tokens 50–88% and latency 50–75% vs an unstructured baseline, at equal-or-better quality.

Roadmap: AG-UI / OpenTelemetry event emission · storage adapters (S3/GCS/repo) · registry-backed discovery for mcp add.


Quickstart

1. Install: npm i -g @agentpro/scroll

2. Create an agent. scroll new scaffolds a correct, L1-ready folder:

agents/atlas/
├── IDENTITY.md      # who it is + security (risk tiers); machine-readable frontmatter
├── SOUL.md          # how it thinks (you write this part)
├── TOOLS.md         # what it can use
├── hard-rules.md    # rules it must never break
├── memory/          # what it remembers
└── evals/           # 3 gold cases, graded by `scroll eval`

3. Build, run, loop:

scroll check atlas             # validate structure (pre-commit + CI)
scroll build atlas             # → Cowork / Codex / Gemini / Claude-subagent / A2A
scroll run   atlas --task "…"  # inner loop: deterministic, capped, gated, checkpointed
scroll loop  LOOP.md           # outer loop: scheduled, self-directed, with stop conditions

Inner loop vs. outer loop (loop engineering)

The shift the field calls loop engineering is from writing prompts to designing the system that prompts the agent. SCROLL gives you both halves explicitly:

  • scroll run: the inner loop. One harnessed execution of a WORK.md: advance the DAG, act inside a step, verify, checkpoint.
  • LOOP.md + scroll loop: the outer loop. Defines when to start an inner loop (schedule), where it finds work (work_source), and when to stop (stop_conditions, which is required). It rides your host scheduler (cron / scheduled-tasks); there's no daemon engine.

# LOOP.md
id: market-watch-daily
controller: lead
schedule:        { cron: "0 7 * * *" }
work_source:     { type: work_file, query: WORK.md }
stop_conditions: { max_runs_per_day: 4, budget_usd_per_day: 5, halt_on: [gate_denied, verify_fail] }
digest: required

The outer loop never escalates privilege (every action inside still passes the permission matrix below) and writes a digest after every run.
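
To make the stop conditions concrete, here is a minimal Python sketch of how an outer loop might decide whether to start another run. The field names mirror the LOOP.md example; the `StopConditions` class, `may_start_run`, and the returned reason strings are hypothetical and do not describe SCROLL's actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StopConditions:
    max_runs_per_day: Optional[int] = None
    budget_usd_per_day: Optional[float] = None
    halt_on: List[str] = field(default_factory=list)

    def is_empty(self) -> bool:
        # True when the loop declares no way to stop at all.
        return (self.max_runs_per_day is None
                and self.budget_usd_per_day is None
                and not self.halt_on)


def may_start_run(stop: StopConditions, runs_today: int,
                  spent_usd_today: float,
                  last_run_events: List[str]) -> Tuple[bool, str]:
    """Return (allowed, reason) for starting one more inner-loop run."""
    if stop.is_empty():
        return False, "refused: no stop condition declared"
    if stop.max_runs_per_day is not None and runs_today >= stop.max_runs_per_day:
        return False, "stopped: max_runs_per_day reached"
    if (stop.budget_usd_per_day is not None
            and spent_usd_today >= stop.budget_usd_per_day):
        return False, "stopped: budget_usd_per_day reached"
    for event in last_run_events:
        if event in stop.halt_on:
            return False, f"halted: {event}"
    return True, "ok"
```

Because the check is a pure function of counters and recorded events, a host scheduler such as cron can call it on every tick without keeping a daemon alive.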


Safety: risk tiers + grounding, enforced in code

A model is sycophantic and prompt-injectable, so it can't be trusted to stop before a dangerous action. SCROLL puts the gate in deterministic code, in front of the action.

Five tiers, lowest → highest consequence, each mapped to a policy:

| tier | default policy |
|---|---|
| read_only | auto |
| reversible_write | auto + log |
| external_comm | soft-hold |
| financial | must-approve + grounding |
| destructive | must-approve + grounding |

Risk is data, not prose. It is declared per-tool in .mcp.json or per-namespace in IDENTITY.security (a label a model can read is just another prompt it can be talked out of):

# IDENTITY.md
security:
  risk_defaults: { fs.read: read_only, shell.exec: destructive, mcp.*: external_comm }
  approval:      { financial: must-approve, destructive: must-approve }
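
As an illustration of treating risk as data, the Python sketch below resolves a tool name to a tier using the `risk_defaults` entries shown above, then looks up the default policy from the tier table. The function name, the longest-pattern-first rule, and the fail-closed `destructive` fallback for unknown tools are assumptions made for this example, not documented SCROLL behavior.

```python
from fnmatch import fnmatchcase

# Default policy per tier, copied from the tier table in this README.
TIER_POLICY = {
    "read_only": "auto",
    "reversible_write": "auto + log",
    "external_comm": "soft-hold",
    "financial": "must-approve + grounding",
    "destructive": "must-approve + grounding",
}

# The risk_defaults mapping from the IDENTITY.md example.
RISK_DEFAULTS = {
    "fs.read": "read_only",
    "shell.exec": "destructive",
    "mcp.*": "external_comm",
}


def resolve_tier(tool: str, risk_defaults: dict, fallback: str = "destructive") -> str:
    """Map a tool name to its declared tier; exact names beat wildcards."""
    if tool in risk_defaults:
        return risk_defaults[tool]
    # Assumption: try longer (more specific) patterns before shorter ones.
    for pattern in sorted(risk_defaults, key=len, reverse=True):
        if fnmatchcase(tool, pattern):
            return risk_defaults[pattern]
    # Assumption: an undeclared tool is treated as the most dangerous tier.
    return fallback
```

For example, `TIER_POLICY[resolve_tier("mcp.slack.post", RISK_DEFAULTS)]` yields `soft-hold`, where `mcp.slack.post` is a made-up tool name.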

must-approve blocks until an approval file appears under control/approvals/. Grounding runs first for financial/destructive: every declared parameter (a product code, an amount, a customer id) must trace to a real source seen earlier in the run. Otherwise the action is denied and never runs, even if "approved." An invented invoice can't be issued.
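
The ordering described above (grounding first, then approval) can be sketched in a few lines of Python. This is a simplified illustration that covers only the two highest tiers: "traces to a real source" is reduced to exact value matching, and the approval file name (`action_id`) and all function names are hypothetical.

```python
from pathlib import Path


def ungrounded_params(params: dict, seen_values) -> list:
    """Return names of parameters whose value never appeared earlier in the run."""
    seen = {str(v) for v in seen_values}
    return sorted(name for name, value in params.items() if str(value) not in seen)


def gate(tier: str, params: dict, seen_values, approvals_dir, action_id: str) -> str:
    """Decide whether an action may run. Grounding is checked before approval."""
    if tier in ("financial", "destructive"):
        missing = ungrounded_params(params, seen_values)
        if missing:
            # Denied even if an approval file already exists.
            return "denied: ungrounded " + ", ".join(missing)
        if not (Path(approvals_dir) / action_id).exists():
            return "blocked: awaiting approval"
    return "allowed"
```

Checking grounding before approval means a human approval cannot rescue an action built on an invented id or amount.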


Multi-agent, the safe way

Agents coordinate through files, not a message bus:

  • WORK.md: the task chain as a declarative DAG. One controller owns it (this prevents the #1 multi-agent failure: infinite handoff loops). A deterministic runner advances it; the LLM only acts inside a step.
  • blackboard/: a shared space where agents post discoveries (one append-only file each).
  • Worktree-lite isolation: parallel tasks write to isolated dirs, so two agents never collide.
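
A deterministic DAG advance like the one described for WORK.md can be sketched as follows. The Python below assumes tasks are given as a mapping from task id to the ids it depends on; that shape, the function names, and the alphabetical tie-break are illustrative choices, not the WORK.md format itself.

```python
def ready_tasks(deps: dict, done: set) -> list:
    """Return the tasks whose dependencies are all done, in a stable order."""
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")
    return sorted(
        task for task, ds in deps.items()
        if task not in done and all(d in done for d in ds)
    )


def run_order(deps: dict) -> list:
    """Advance the DAG batch by batch; each batch could run in parallel."""
    done, order = set(), []
    while len(done) < len(deps):
        batch = ready_tasks(deps, done)
        if not batch:
            # Nothing is ready but work remains: the graph has a cycle.
            raise ValueError("cycle detected: no task is ready")
        order.append(batch)
        done.update(batch)
    return order
```

Because the runner, not the model, picks the next step, the same graph always advances in the same order, and a cyclic handoff is rejected instead of looping.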

Default is single-agent. Multi-agent is opt-in and only when a task is genuinely parallel. It costs ~15× the tokens, so SCROLL estimates cost (language-aware) before it spawns.
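
As a back-of-the-envelope illustration of a language-aware estimate, the Python sketch below combines the two figures this README quotes (Vietnamese ≈ 1.8× English, multi-agent ~15× the tokens). Treating them as simple multipliers, and the function and constant names, are assumptions; the real `scroll cost` estimator may work differently.

```python
# Multipliers quoted in this README; English is taken as the 1.0 baseline.
LANGUAGE_MULTIPLIER = {"en": 1.0, "vi": 1.8}
MULTI_AGENT_FACTOR = 15


def estimate_tokens(base_tokens_en: int, language: str = "en",
                    multi_agent: bool = False) -> int:
    """Scale an English single-agent token estimate by language and agent mode."""
    if language not in LANGUAGE_MULTIPLIER:
        raise KeyError(f"no multiplier known for language {language!r}")
    tokens = base_tokens_en * LANGUAGE_MULTIPLIER[language]
    if multi_agent:
        tokens *= MULTI_AGENT_FACTOR
    return round(tokens)
```

With these figures, a task estimated at 1,000 English tokens becomes 1,800 in Vietnamese and 27,000 if it is also run multi-agent, which is why the estimate runs before anything is spawned.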


Watch, control, and read back

runs/<id>/events.jsonl   # run_started · permission_decision · grounding_checked · cost_update · completed …
runs/<id>/digest.md      # human-readable summary: what it did, which dangerous tiers it touched, cost

  • Stream the events (tail locally, or SSE/WebSocket in the cloud; types map to AG-UI).
  • Control a live run by writing files: control/pause, control/stop, control/steer.md, or approve a gate under control/approvals/.
  • Read the digest, so the owner understands the run instead of just trusting it (an antidote to passive over-delegation).
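
Since the digest is derived from the event stream without a model call, it can be produced by plain code. The Python sketch below uses the event type names listed above, but the JSON field names (`type`, `tier`, `usd`), the assumption that `cost_update` carries a running total, and the output layout are all hypothetical.

```python
import json


def build_digest(jsonl_text: str) -> str:
    """Summarize an events.jsonl stream into a short human-readable digest."""
    tiers, cost, status = set(), 0.0, "incomplete"
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        event = json.loads(line)
        kind = event.get("type")
        if kind == "permission_decision" and event.get("tier") in ("financial", "destructive"):
            tiers.add(event["tier"])
        elif kind == "cost_update":
            # Assumption: each cost_update reports the cumulative cost so far.
            cost = float(event.get("usd", cost))
        elif kind == "completed":
            status = "completed"
    touched = ", ".join(sorted(tiers)) or "none"
    return f"Status: {status}\nDangerous tiers touched: {touched}\nCost: ${cost:.2f}"
```

The same function works on a partial log, so a paused or crashed run still yields a digest that says it did not complete.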

Compliance is structural, not trust-based

SCROLL doesn't rely on you reading the docs; the correct path is the only one that works: a scaffold that's correct by default, a JSON schema, a linter (scroll check) in CI, a build that refuses invalid agents, an AGENTS.md that teaches AI coding tools the rules, and scroll audit, a hash-bound report so a "pass" is tied to the exact file state and can't be faked.
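
One way to read "hash-bound" is that the report stores a digest of every file it checked, so any later edit invalidates the pass. The Python sketch below illustrates that idea with SHA-256 over sorted relative paths and contents; the report shape and function names are assumptions, not the format `scroll audit` actually writes.

```python
import hashlib
from pathlib import Path


def folder_hash(root) -> str:
    """Hash every file under root (relative path + bytes) in a stable order."""
    base = Path(root)
    digest = hashlib.sha256()
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def make_report(root, passed: bool) -> dict:
    """Bind an audit result to the exact file state it was computed on."""
    return {"passed": passed, "sha256": folder_hash(root)}


def verify_report(root, report: dict) -> bool:
    """A pass only counts if the folder still hashes to the recorded value."""
    return bool(report["passed"]) and report["sha256"] == folder_hash(root)
```

In CI, a check of this kind fails as soon as an agent folder changes after the report was generated, so a stale "pass" cannot be reused.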


CLI reference

| Command | Does | Status |
|---|---|---|
| scroll new <name> | Scaffold an L1-ready agent folder | ✅ built |
| scroll check <name> | Validate structure (linter; pre-commit + CI) | ✅ built |
| scroll build <name> | Render one source → every runtime | ✅ built |
| scroll run --work <f> | Inner loop over a WORK.md (or run <agent> --task) | ✅ built |
| scroll loop <LOOP.md> | Outer loop: schedule + find work + stop conditions | ✅ built |
| scroll eval <name> | Gold cases N× → machine checks + consistency + judge | ✅ built |
| scroll audit [name] | Conventions + hash-bound report (CI; --verify) | ✅ built |
| scroll registry | Scan agents → a config/observability view | ✅ built |
| scroll cost <task> [--language vi] | Single vs multi token estimate (language-aware) | ✅ built |
| scroll mcp add <name> | Wire an MCP server / connector into .mcp.json (${vault:KEY} creds · per-tool risk) | ✅ built |
| scroll skill add <ref> | Install / scaffold a SKILL.md skill (--agent to attach) | ✅ built |
| scroll plugin add <ref> | Unpack an agent-pack bundle (agents + skills + merged .mcp.json) | ✅ built |

Useful run flags: --resume (crash-resume), --worktree (git-worktree isolation), --risk <tier>, --auto-approve, --language <code>, --max-usd / --max-iterations.


Packaging

  • @agentpro/scroll: the CLI + loader (npm).
  • @agentpro/scroll-schema: the versioned spec + JSON Schema.
  • Storage adapters: @agentpro/scroll-s3, -gcs, -repo (roadmap).

Your agents are never packaged. They're folders you own and version, like AGENTS.md.

Learn more

  • SPEC.md: the formal agent-folder + frontmatter spec (incl. §21 permissions, §22 grounding, §23 LOOP)
  • AGENTS.md: how AI coding tools follow SCROLL
  • templates/: a worked agent + a WORK.md + a LOOP.md

License

MIT © 2026 Agent Pro. Use it, fork it, ship it; just keep the copyright notice.


SCROLL - a harness for agentic loops. Turn every AI subscription your team pays for into coordinated, governed, portable agents. Built by Agent Pro.
