LAS: langgraph-agentic-scaffold

Multi-model orchestration with structural safety — an explicit Mixture-of-Experts where the gating is semantic, the experts are heterogeneous, and the kernel enforces invariants they can't bypass.

Status: Active development. Expect breaking changes between alpha releases.

What This Is

LAS is a personal research project that orchestrates multi-model agentic workflows through LangGraph. It routes user intent through specialized agents — each backed by whichever LLM fits the task — while enforcing safety through structure, not trust.

Core thesis: LLMs are unreliable. Every specialist executes inside a safety wrapper. No specialist can unilaterally terminate, corrupt state, or bypass the control plane.

The Computing Metaphor

LAS applies computing primitives to LLM orchestration:

Concept	LAS Equivalent	Purpose
CPU / Scheduler	Router	Turn-by-turn semantic routing decisions
Stack	Context Window	Ephemeral per-invocation working memory
Heap	Artifacts	Persistent, cross-specialist shared state
System Calls	MCP	Synchronous service invocation bypassing Router
Signals	signals field	Asynchronous inter-node communication (replace reducer)
Process delegation	delegate()	Context-isolated subtask with own lifecycle

Specialist routing is a visible, auditable version of what happens implicitly inside a transformer's MoE layers. The Router is the gating function. Specialists are the experts. Every routing decision is observable and loggable. See Agentic Cognates for the full treatment.

How a Request Flows

Every request passes through a staged investment pipeline before reaching a specialist:

User Request → Triage (cost gate) → Systems Architect (plan) → Facilitator (context) → Router (dispatch)

Triage is a cheap ACCEPT/REJECT classifier. It fires before the system invests an expensive SA planning call. If the request is ambiguous, Triage bounces it back for clarification — zero wasted compute.

Systems Architect produces a structured task_plan with execution steps and acceptance criteria. If SA fails, the graph routes to END immediately (fail-fast).

Facilitator is the sole writer to gathered_context. Specialists don't fetch context — they receive it. This is a hard architectural constraint, not a convention. On retry, Facilitator curates prior work into accumulated_work so the specialist sees what it did last time without re-reading its own trace (which causes fabrication loops).

Router selects a specialist from an enum-constrained menu. Token-level enforcement prevents open-weight models from inventing specialist names. Menu filtering adapts as the workflow progresses — planning specialists disappear after context is gathered, forbidden specialists are removed on loop detection.

Autonomous Execution

The Project Director runs ReAct loops via prompt-prix MCP with access to filesystem, terminal, and browser tools. Live progress publishes to the web UI after every tool call.

Stagnation detection pattern-matches on tool call signatures — read(A) → move(A) → read(A) → move(A) triggers at period 2 repeated 3x. This catches real loops, not just "you've been running too long." Max iterations is the fallback, not the primary safety valve.

delegate() spawns a child LAS invocation with its own context window. The child does work, returns a concise result. Parent grows by result size, not work size. A conditioning frame makes honest failure reports as valued as task completion.

Signal-based completion: PD writes completion_signal on all four exit paths (success, error, stagnation, partial). The Exit Interview reads it and resolves in 0ms — no LLM call needed. The signals field uses a replace reducer, so each write is a complete snapshot with no stale flags leaking across retries.

Governed Termination

No specialist can self-declare "done." Termination flows through four stages:

Specialist completes and writes a completion signal
Signal Processor classifies the outcome via a 6-level priority chain (circuit breaker > user abort > max_iterations > stagnation > completion > continuation)
Exit Interview independently verifies by inspecting filesystem and artifacts — it can't see prior state, only final state
End/Archiver assembles the user response and produces an atomic archive (manifest, traces, final state)

Multi-Model

Different models for different roles, all runtime-configurable. Router might be Qwen on an RTX-3090, Project Director might be Gemini Pro, chat progenitors might be Gemini + Claude for adversarial validation. No code changes — just user_settings.yaml.

Adapters for local inference (LM Studio / llama-server) and Gemini. Named server references support distributed inference across multiple GPU machines.

Ecosystem

LAS is the composition layer for independently-developed AI services, each a self-contained project with its own UI and development cycle:

Service	Role
prompt-prix	LLM interface + eval platform. LAS never calls LLMs directly — prompt-prix owns the model boundary via `react_step` MCP.
semantic-chunker	Embedding infrastructure. embeddinggemma-300m (768-d) and NV-Embed-v2 (4096-d). Drift measurement, document classification, variant analysis.
surf-mcp	Browser automation with Fara visual grounding.
local-inference-pool	Multi-GPU compute substrate with JIT-swap guard and least-loaded balancing.
it-tools-mcp	119 IT utility tools for data transformation and analysis.

Repo boundaries are context firewalls. Each service lives in its own repository because AI coding assistants will grow, refactor, and occasionally damage any code within their context window. Separate repos make that structurally impossible — a specialist working in LAS literally cannot see prompt-prix source code. This is a deliberate architectural choice, not organizational preference.

Security Warning

You are granting LLMs direct control over powerful tools. The specialists you configure can execute code, access the filesystem, and make external API calls with your keys.

An agentic system can create feedback loops that amplify a simple misunderstanding over many iterations into irreversible actions.

Always run in a sandboxed environment (Docker container or dedicated VM).

Running It

Prerequisites

Docker and Docker Compose (recommended path)
At least one inference endpoint — a local model server (LM Studio or llama-server) or a Gemini API key
Python 3.12+ (local path only)

Docker (Recommended)

git clone https://github.com/shanevcantwell/langgraph-agentic-scaffold.git
cd langgraph-agentic-scaffold

cp .env.example .env                          # Add API keys — NEVER commit .env
cp proxy/squid.conf.example proxy/squid.conf
cp user_settings.yaml.example user_settings.yaml  # Bind models to specialists

docker compose up --build -d

For local model servers (LM Studio, llama-server), set in .env:

LOCAL_INFERENCE_BASE_URL="http://host.docker.internal:1234/v1"

Interface	URL
V.E.G.A.S. Terminal	`http://localhost:3000`
API	`http://localhost:8000/v1/chat/completions`
API Docs	`http://localhost:8000/docs`

The API is OpenAI-compatible — the full orchestration pipeline (multi-model routing, context curation, governed termination) is exposed behind a standard /v1/chat/completions endpoint. Any client that speaks the OpenAI REST protocol gets the entire pipeline without knowing it's there: Continue, AnythingLLM, the OpenAI SDK, curl, or any tool that targets the OpenAI API format.

Configuration changes: Python code changes are picked up by uvicorn --reload. Dockerfile or pyproject.toml changes require docker compose up --build -d.

Local (Alternative)

./scripts/install.sh
# Edit .env and user_settings.yaml, then:
./scripts/server.sh start

Observability

The observability layer is an independent module (app/src/observability/) that can be developed and tested separately from both task execution and the chat UI. It defines its own FastAPI router, event bus, and active runs registry. The dependency direction is one-way: execution pushes events into observability, never the reverse.

Every workflow produces a timestamped archive in ./logs/archive/:

run_YYYYMMDD_HHMMSS_<hash>.zip
├── manifest.json        # routing_history, timestamps, metadata
├── llm_traces.jsonl     # per-specialist execution with latency
├── final_state.json     # accumulated state at workflow end
└── artifacts/           # all produced artifacts

The V.E.G.A.S. Terminal provides real-time streaming: semantic thought entries, tool-by-tool progress during ReAct loops, delegate() breadcrumbs, and an inspector for prompt replay and scratchpad state. The terminal's frontend is split into observability.js (rendering, polling, headless mode) and chat.js (workflow input) — removing the chat script tag converts it to a pure observability dashboard.

Documentation

Document	Purpose
Architecture Reference	Core concepts, patterns, state model
Flow Examples	Prompt-to-specialist flows with Mermaid diagrams
Configuration Guide	Three-tiered configuration, distributed inference
Web UI Guide	V.E.G.A.S. Terminal layout and data flow
MCP Guide	Model Context Protocol integration
Specialist Briefings	Per-specialist technical deep dives
Developer Docs	Creating specialists, patterns, subgraphs, testing, troubleshooting
Research Directions	Prompt geometry, explicit MoE, context engineering
ADRs	Architectural decisions and design rationale
Changelog	Alpha release history

Research Directions

LAS doubles as a workbench for studying how orchestration-layer interventions shape model behavior without touching weights. See Research Directions for the full treatment: prompt geometry, explicit MoE as interpretability, context engineering as physics.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 911 Commits
.claude		.claude
.github/instructions		.github/instructions
app		app
config		config
docker		docker
docs		docs
external_agents		external_agents
proxy		proxy
scripts		scripts
tests		tests
.env.example		.env.example
.gitignore		.gitignore
.mcp.json		.mcp.json
CHANGELOG.md		CHANGELOG.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
QUICKSTART.md		QUICKSTART.md
README.md		README.md
config.yaml		config.yaml
docker-compose.yml		docker-compose.yml
log_config.yaml		log_config.yaml
pyproject.toml		pyproject.toml
start.sh		start.sh
tool_competence_tests_results.json		tool_competence_tests_results.json
user_settings.yaml.example		user_settings.yaml.example

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

LAS: langgraph-agentic-scaffold

What This Is

The Computing Metaphor

How a Request Flows

Autonomous Execution

Governed Termination

Multi-Model

Ecosystem

Security Warning

Running It

Prerequisites

Docker (Recommended)

Local (Alternative)

Observability

Documentation

Research Directions

License

About

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

LAS: langgraph-agentic-scaffold

What This Is

The Computing Metaphor

How a Request Flows

Autonomous Execution

Governed Termination

Multi-Model

Ecosystem

Security Warning

Running It

Prerequisites

Docker (Recommended)

Local (Alternative)

Observability

Documentation

Research Directions

License

About

Topics

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Contributors

Uh oh!

Languages