An agent-first runtime where AI agents are native processes, capabilities replace permissions, and the system is designed for autonomy — not human interaction.
The long-term vision: An agent-native Linux distribution — think CoreOS for agents. Upstream kernel, curated userland, agentd as a first-class service, capability tokens enforced via Linux primitives (namespaces, seccomp-BPF, Landlock, cgroups v2). The AgentServices trait is a substrate-agnostic ABI: process-backed today, MicroVM-per-agent (Firecracker/Kata/gVisor) later for harder tenant isolation, microkernel (seL4/Redox) only if a customer demands formally verified boundaries. The programming model is the product; the substrate is replaceable.
What exists today: A working agent runtime on Linux. 7 Rust crates, ~13,000 lines, 220+ tests. Runs autonomously in Docker — a Bootstrap Agent receives a goal, spawns specialized child agents, and produces output with zero human intervention. The system has designed its own features and audited its own security — both for pennies.
Agent frameworks bolt orchestration onto existing runtimes. aaOS takes the opposite approach: build the runtime around agents from the ground up.
- Agents are processes. They have lifecycles, registries, schedulers, and supervisors — managed by the runtime, not by application code.
- Capabilities replace permissions. No ambient authority. Unforgeable tokens granted at spawn, validated on every operation, narrowable but never escalatable. An agent with `file_write: /data/output/*` cannot write to `/etc/` — not as a policy, but as a runtime guarantee.
- Communication is typed. MCP messages (JSON-RPC 2.0) with schema validation replace raw byte pipes. Everything is parseable, validatable, and auditable.
- Inference is a schedulable resource. LLM API calls are managed by the runtime — with concurrency limits, budgets, and provider abstraction.
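The capability model above can be sketched in a few lines of Rust. This is an illustrative toy, not the actual aaOS types: `Capability` and `narrow` are assumed names, and real narrowing would match globs more carefully than a prefix check.

```rust
// Hypothetical sketch of an unforgeable, narrowable capability token.
// Names (Capability, narrow) are illustrative, not the actual aaOS API.
#[derive(Clone, Debug, PartialEq)]
struct Capability {
    tool: String,       // e.g. "file_write"
    path_glob: String,  // e.g. "/data/output/*"
}

impl Capability {
    /// Narrowing succeeds only if the requested glob is a sub-scope of
    /// the parent's glob: "you can only give what you have."
    fn narrow(&self, child_glob: &str) -> Option<Capability> {
        let parent_prefix = self.path_glob.trim_end_matches('*');
        if child_glob.starts_with(parent_prefix) {
            Some(Capability { tool: self.tool.clone(), path_glob: child_glob.to_string() })
        } else {
            None // escalation attempt: refused, never widened
        }
    }
}

fn main() {
    let parent = Capability { tool: "file_write".into(), path_glob: "/data/output/*".into() };
    // Narrowing to a subdirectory is allowed...
    assert!(parent.narrow("/data/output/reports/*").is_some());
    // ...but escalating to /etc is refused.
    assert!(parent.narrow("/etc/*").is_none());
    println!("ok");
}
```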
aaOS runs as a daemon (agentd) on Linux, isolated in Docker. The AgentServices trait is a substrate-agnostic ABI — today implemented with Linux processes, tomorrow with MicroVMs, maybe someday with a microkernel if formally-verified isolation ever becomes the gating requirement.
What's implemented and tested:
- Self-bootstrapping agent swarms — A Bootstrap Agent (DeepSeek Reasoner) receives a goal, analyzes it, spawns specialized child agents (DeepSeek Chat) with narrowed capabilities, coordinates their work, and produces output. All autonomous. Runs in Docker with `agentd` as PID 1. Multi-provider: works with DeepSeek, Anthropic, or any OpenAI-compatible API.
- Persistent goal queue — The Bootstrap Agent runs as a persistent process, accepting goals via Unix socket. The container stays alive between tasks.
- Capability-based security — Unforgeable tokens, zero-permission default, two-level enforcement (tool access + resource path). Parent agents can only delegate capabilities they hold — "you can only give what you have." Path normalization prevents traversal attacks; child tokens inherit parent constraints. Tokens are revocable at runtime.
- Agent orchestration — Parent spawns children with narrowed capabilities. Spawn depth limit (5), agent count limit (100). Failed children are retried once automatically. Parallel spawning via the `spawn_agents` batch tool runs up to 3 independent children concurrently (tunable via `AAOS_SPAWN_AGENTS_BATCH_CAP`) — wall-clock time is the slowest child, not the sum.
- Persistent agents — Long-running agents with background message loops, request-response IPC via `send_and_wait()`, conversation persistence in JSONL.
- Managed context windows — The runtime transparently summarizes old messages via LLM when the context fills and archives the originals to disk. Agents see coherent conversations without hitting token limits.
- Episodic memory — Per-agent persistent memory via the `memory_store` / `memory_query` / `memory_delete` tools. Semantic search via cosine similarity over embeddings. SQLite-backed for persistence across container restarts (`AAOS_MEMORY_DB`); falls back to in-memory if unset.
- Workspace isolation — Each goal gets its own workspace directory. Child agents write intermediate files there.
- Inference scheduling — Semaphore-based concurrency limiter prevents API stampedes when multiple agents run simultaneously. Configurable max concurrent calls per provider.
- Per-agent token budgets — Agents declare token limits in their manifest. The runtime enforces budgets via atomic tracking in `report_usage()`. Agents that exceed their budget are stopped. Optional — no budget means no enforcement.
- Audit trail — 22 event kinds, streamed as JSON lines to stdout for container observability.
- Verbose agent logging — Full agent thoughts, tool calls with arguments, and tool results streamed to stdout. Live dashboard shows agent activity in real time.
- Structured IPC — MCP-native message routing with capability validation, request-response via pending-response map
- Self-designing capability — Agents can read the mounted aaOS source code at `/src/` and produce working Rust implementations. The OS has designed its own budget enforcement system.
- Self-auditing security — The system performed a security audit of itself (1.37M tokens, $0.05), found a real path traversal vulnerability in `glob_matches` that had been present since Phase A, and produced a hardening plan. The vulnerability was fixed based on the audit findings. A later self-reflection run (Run 9) extended that fix by finding a symlink bypass in the same code — `glob_matches` resolved `..` lexically but didn't canonicalize symlinks, so a link like `/data/project -> /etc` let a `/data/*` grant reach `/etc/passwd`. Closed by canonicalizing requested paths against the real filesystem before matching.
- Iterative self-improvement — Ten self-reflection runs to date. Runs 1-3 found real runtime bugs (path traversal → missing revocation → unenforced constraints). Run 4 produced a feature proposal (Meta-Cognitive Coordination Layer for cross-run learning) that shipped as a minimal version after review. Run 5 was the first to exercise the persistent-memory protocol end-to-end and produced three manifest-only tuning fixes. Run 6 surfaced two kernel-level gaps (soft rules aren't enforcement; no structured child-to-child data channel) and shipped as two kernel fixes: a stable-identity gate on private memory and a `prior_findings` handoff. Run 7 validated those fixes against real behavior with a 4-child peer-review chain and zero capability denials. Run 8 measured Phase 1 speed work (~14 min vs Run 7b's ~29 min, a ~50% reduction beating the 35-45% target) and exercised `file_read_many` in production. Run 9 used an adversarial bug-hunting prompt and found seven real bugs — registry spawn ordering, silent session-store failures, unbounded audit log, symlink capability bypass, and others — all peer-reviewed by a second model (Copilot/GPT-5.4) before implementation. Full run-by-run chronicle in `docs/reflection-log.md`; cross-cutting lessons in `docs/patterns.md`. Cumulative spend since the Anthropic→DeepSeek switch (per dashboard): $1.00 through Run 10 ($1.16 all-in with earlier Anthropic runs).
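The symlink fix described above boils down to one rule: resolve the requested path against the real filesystem before matching it to a grant. A minimal sketch of that rule, using a hypothetical `allowed_by_grant` helper rather than the real `glob_matches`:

```rust
// Illustrative sketch of the symlink-bypass fix: canonicalize the requested
// path against the real filesystem *before* matching, so a link like
// /data/project -> /etc cannot smuggle /etc/passwd under a /data/* grant.
// `allowed_by_grant` is a hypothetical helper, not the real glob_matches.
use std::fs;
use std::path::Path;

fn allowed_by_grant(requested: &Path, grant_prefix: &Path) -> bool {
    // Resolve symlinks and `..` via the real filesystem; deny if the
    // path can't be resolved at all.
    match fs::canonicalize(requested) {
        Ok(real) => real.starts_with(grant_prefix),
        Err(_) => false,
    }
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir().join("aaos_demo_grant");
    fs::create_dir_all(dir.join("data"))?;
    let grant = fs::canonicalize(dir.join("data"))?;

    // A real file inside the grant passes...
    fs::write(dir.join("data/ok.txt"), "fine")?;
    assert!(allowed_by_grant(&dir.join("data/ok.txt"), &grant));

    // ...but a symlink escaping the grant is caught after canonicalization.
    #[cfg(unix)]
    {
        let link = dir.join("data/escape");
        let _ = fs::remove_file(&link);
        std::os::unix::fs::symlink("/etc", &link)?;
        assert!(!allowed_by_grant(&link.join("passwd"), &grant));
    }
    println!("ok");
    Ok(())
}
```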
[Runtime Prototype] Agent lifecycle, capabilities, tools ✅ Phase A
|
[Persistent Agents] Long-running agents, request-response IPC ✅ Phase B
|
[Agent Memory] Managed context windows, episodic store ✅ Phase C
|
[Self-Bootstrapping] Autonomous agent swarms in Docker ✅ Phase D
|
[Multi-Provider LLM] DeepSeek, inference scheduling, budgets ✅ Phase E
|
[Security] Self-audit, path traversal fix, revocation ✅ Done
|
[AgentSkills] Open standard, 21 skills, progressive disclosure ✅ Done
|
[Self-Reflection] System reads own code, proposes features ✅ Done <-- you are here
|
[Agent-Native Linux] Distribution, systemd service, Landlock Next
|
[Isolation Ladder] MicroVM-per-agent via Firecracker/Kata Research
|
[Microkernel] seL4/Redox backend — only if demand exists Optional
The AgentServices trait is the bridge between runtime and kernel. The Tool trait defines tool integration. The manifest format defines agent bundles. When the kernel migration happens, everything above changes implementation — not interface. Agent manifests, tools, and orchestration logic work identically.
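The idea can be sketched as a trait with swappable backends. All names and signatures here are assumptions for illustration, not the real `AgentServices` definition:

```rust
// Illustrative sketch of a substrate-agnostic services boundary.
// Method names and types are assumptions, not the real AgentServices trait.
trait AgentServices {
    fn spawn(&self, manifest: &str, caps: Vec<String>) -> Result<u64, String>;
    fn send(&self, agent_id: u64, mcp_message: &str) -> Result<(), String>;
    fn stop(&self, agent_id: u64) -> Result<(), String>;
}

// A process-backed substrate today; a MicroVM-backed one later would
// implement the same trait, leaving manifests and tools untouched.
struct ProcessSubstrate;

impl AgentServices for ProcessSubstrate {
    fn spawn(&self, _manifest: &str, _caps: Vec<String>) -> Result<u64, String> {
        Ok(1) // would fork a supervised Linux process
    }
    fn send(&self, _agent_id: u64, _mcp_message: &str) -> Result<(), String> {
        Ok(()) // would route a validated MCP message
    }
    fn stop(&self, _agent_id: u64) -> Result<(), String> {
        Ok(()) // would tear down the process and revoke tokens
    }
}

fn main() {
    let substrate: Box<dyn AgentServices> = Box::new(ProcessSubstrate);
    let id = substrate.spawn("name: demo", vec!["tool: echo".into()]).unwrap();
    substrate.send(id, r#"{"jsonrpc":"2.0","method":"ping"}"#).unwrap();
    substrate.stop(id).unwrap();
    println!("spawned and stopped agent {id}");
}
```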
See Roadmap for details on each phase.
+---------------------------------------------+
| Human Supervision Layer | Approval queue, audit trail
+---------------------------------------------+
| Orchestration Layer | spawn_agent, capability narrowing
+---------------------------------------------+
| Tool & Service Layer | 12 tools, capability-checked, schema-validated
+---------------------------------------------+
| Agent Memory Layer | Context windows, episodic store, embeddings
+---------------------------------------------+
| Agent Runtime Core | Process model, registry, tokens, IPC router
+---------------------------------------------+
| Linux + Docker | Host OS today; target is a hardened Linux distribution (Phase F)
+---------------------------------------------+
7 Rust crates:
- aaos-core — Types, traits, capability model, audit events (22 kinds), budget tracking
- aaos-runtime — Agent process lifecycle, registry, scheduling, context window management
- aaos-ipc — MCP message router with capability validation, request-response IPC
- aaos-tools — Tool registry, invocation, 12 built-in tools (including memory + skill_read + file_list + file_read_many batch + spawn_agents parallel)
- aaos-llm — LLM clients (Anthropic + OpenAI-compat), execution loop, inference scheduler
- aaos-memory — Episodic memory store, embedding source, cosine similarity search
- agentd — Daemon binary, Unix socket API, approval queue
Agents are declared bundles:
name: research-agent
model: deepseek-chat
system_prompt: "You are a helpful research assistant with persistent memory."
lifecycle: persistent
capabilities:
- web_search
- "file_read: /data/project/*"
- "file_write: /data/output/*"
- "tool: web_fetch"
- "tool: file_write"
- "tool: memory_store"
- "tool: memory_query"
approval_required:
- file_write
memory:
context_window: "128k"
max_history_messages: 200
episodic_enabled: true
budget_config:
max_tokens: 1000000 # 1M tokens
  reset_period_seconds: 86400  # daily reset

Requires Docker and a DeepSeek API key (or Anthropic API key as fallback).
# Clone
git clone https://github.com/Joncik91/aaOS.git
cd aaOS
# Run with live dashboard (builds image automatically on first run)
DEEPSEEK_API_KEY="sk-..." ./run-aaos.sh "Fetch https://news.ycombinator.com and write a summary of the top 5 stories to /output/summary.txt"
# Check the output
cat output/summary.txt

The launcher starts the container and opens a live dashboard in a separate terminal showing agent activity in real time. Ctrl+C stops everything.
The Bootstrap Agent (DeepSeek Reasoner) analyzes the goal, spawns specialized child agents (DeepSeek Chat), coordinates their work, and writes the output. Total cost: ~$0.02. Falls back to Anthropic if ANTHROPIC_API_KEY is set instead.
The source code is mounted read-only at /src/ inside the container, so agents can read and understand the codebase when given code-related goals.
By default, every container starts with a fresh Bootstrap identity and empty memory — the same behavior the first four self-reflection runs used. To let the Bootstrap Agent accumulate lessons across restarts:
AAOS_PERSISTENT_MEMORY=1 DEEPSEEK_API_KEY="sk-..." ./run-aaos.sh "your goal"

This bind-mounts `./memory/` into the container. The Bootstrap ID is persisted at `/var/lib/aaos/bootstrap_id` (overridable via `AAOS_BOOTSTRAP_ID`). The manifest instructs Bootstrap to `memory_query` before decomposing a goal and `memory_store` a compact run summary after completing one. To wipe persistent state, launch once with `AAOS_RESET_MEMORY=1`.
Persistent memory carries real risk: prompt-injected content and bad strategies become durable. The feature is opt-in and reset is one env var away.
To send additional goals to the running container:
# The container stays alive and accepts goals via Unix socket
echo '{"jsonrpc":"2.0","id":1,"method":"agent.run","params":{
"agent_id":"<bootstrap-agent-id>",
"message":"Fetch https://lobste.rs and summarize the top 3 stories to /output/lobsters.txt"
}}' | python3 -c "import socket,sys; s=socket.socket(socket.AF_UNIX); s.connect('/tmp/aaos-sock/agentd.sock'); s.sendall((sys.stdin.read()+'\n').encode()); print(s.recv(4096).decode())"

JSON-RPC 2.0 over Unix socket.
| Method | Description |
|---|---|
| `agent.spawn` | Spawn an agent from a YAML manifest |
| `agent.stop` | Stop a running agent |
| `agent.list` | List all running agents |
| `agent.status` | Get status of a specific agent |
| `agent.run` | Run an existing agent with a message |
| `agent.spawn_and_run` | Spawn and run in one call |
| `tool.list` | List registered tools |
| `tool.invoke` | Invoke a tool on behalf of an agent |
| `approval.list` | List pending approval requests |
| `approval.respond` | Approve or deny a pending request |
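For scripting against the socket without Python, the envelope is easy to build by hand. A sketch in Rust (the socket path matches the quick-start example; `rpc_request` is a hypothetical helper, not part of aaOS):

```rust
// Build the agentd JSON-RPC 2.0 envelope by hand and, if the daemon is
// running, send it over the Unix socket. Sketch only; `rpc_request` is a
// hypothetical helper, and /tmp/aaos-sock/agentd.sock is the path used in
// the quick-start example.
fn rpc_request(id: u64, method: &str, params: &str) -> String {
    format!(r#"{{"jsonrpc":"2.0","id":{id},"method":"{method}","params":{params}}}"#)
}

fn main() -> std::io::Result<()> {
    let req = rpc_request(1, "agent.list", "{}");
    assert!(req.contains(r#""method":"agent.list""#));

    // Send over the Unix socket only if agentd is actually listening.
    #[cfg(unix)]
    {
        use std::io::Write;
        if let Ok(mut sock) = std::os::unix::net::UnixStream::connect("/tmp/aaos-sock/agentd.sock") {
            sock.write_all(req.as_bytes())?;
            sock.write_all(b"\n")?; // newline-delimited, as agentd expects
        }
    }
    Ok(())
}
```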
| Tool | Capability | Description |
|---|---|---|
| `echo` | `tool: echo` | Returns input (testing) |
| `web_fetch` | `WebSearch` | HTTP GET a URL |
| `file_read` | `FileRead { path_glob }` | Read file, path-checked |
| `file_list` | `FileRead { path_glob }` | List directory contents — use before guessing filenames |
| `file_read_many` | `FileRead { path_glob }` (per file) | Batch-read up to 16 files in parallel; partial failures OK |
| `file_write` | `FileWrite { path_glob }` | Write file, path-checked |
| `spawn_agent` | `SpawnChild { allowed_agents }` | Spawn child with narrowed capabilities |
| `spawn_agents` | `SpawnChild { allowed_agents }` (per child) | Spawn up to 3 independent children concurrently; best-effort per child, wall clock = slowest child |
| `memory_store` | `tool: memory_store` | Store a fact/observation/decision/preference |
| `memory_query` | `tool: memory_query` | Semantic search over stored memories |
| `memory_delete` | `tool: memory_delete` | Delete a stored memory by ID |
| `skill_read` | `tool: skill_read` | Load skill instructions or reference files |
aaOS supports the AgentSkills open standard by Anthropic. Skills are folders with a SKILL.md file that teach agents specialized workflows. The same skills that work in Claude Code, Copilot CLI, Gemini CLI, and OpenCode work in aaOS — but under capability-based security enforcement.
21 bundled skills from addyosmani/agent-skills:
spec-driven-development · test-driven-development · incremental-implementation · planning-and-task-breakdown · code-review-and-quality · security-and-hardening · debugging-and-error-recovery · api-and-interface-design · frontend-ui-engineering · performance-optimization · git-workflow-and-versioning · ci-cd-and-automation · shipping-and-launch · documentation-and-adrs · code-simplification · context-engineering · deprecation-and-migration · idea-refine · source-driven-development · browser-testing-with-devtools · using-agent-skills
Progressive disclosure: Agents see the skill catalog (~100 tokens each) in their system prompt at startup. When a task matches a skill, the agent calls skill_read to load full instructions on demand. Reference files load only when needed.
Add your own: Drop a folder with a SKILL.md into .agents/skills/ or set AAOS_SKILLS_DIR.
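A minimal skill folder might contain just a `SKILL.md` like the following (illustrative content; the frontmatter fields follow the AgentSkills convention of a `name` and `description`):

```markdown
---
name: summarize-changelog
description: Turn a raw git log into a short, user-facing changelog entry.
---

# Summarize Changelog

1. Read the raw log the agent was given.
2. Group commits by feature, fix, and chore.
3. Write 3-5 bullet points in plain language to the requested output path.
```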
- Agent-Native, Human-Optional — The runtime boots into an agent process. Humans provide goals, not instructions.
- Capability-Based Security — No ambient authority. Unforgeable tokens replace permissions.
- Structured Communication — Typed MCP messages, not raw byte pipes.
- Observable by Default — Every action logged as a runtime guarantee.
- Substrate-Agnostic Abstractions — `AgentServices` is an ABI, not a kernel API. Today: Linux processes with capability wrappers. Next: hardened Linux distribution. Later: MicroVM-per-agent if tenant isolation demands it. Microkernel only if a customer demands formally verified boundaries.
- Architecture — Layer details and design decisions
- Roadmap — Phase-by-phase path from runtime to real kernel
- Build Retrospective — Phase-by-phase build history (A through E)
- Self-Reflection Log — Runs where aaOS reads its own code and proposes changes
- Patterns — Cross-cutting lessons distilled from the retrospective and reflection log
- Ideas — Things we considered and deferred, with the signal that would prompt reconsideration
- Distribution Architecture — The agent-native Linux distribution target: components, capability enforcement via Linux primitives, packaging, migration from today's Docker-only deployment