devsper-com
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 19 additions & 0 deletions
diff --git a/devsper/README.md b/devsper/README.md
Lines changed: 148 additions & 0 deletions
diff --git a/devsper/agents/agent.py b/devsper/agents/agent.py
Lines changed: 14 additions & 1 deletion
diff --git a/devsper/distributed/__init__.py b/devsper/distributed/__init__.py
Lines changed: 7 additions & 0 deletions
diff --git a/devsper/distributed/controller.py b/devsper/distributed/controller.py
Lines changed: 70 additions & 0 deletions
diff --git a/devsper/distributed/worker_runtime.py b/devsper/distributed/worker_runtime.py
Lines changed: 43 additions & 0 deletions
diff --git a/devsper/runtime/__init__.py b/devsper/runtime/__init__.py
Lines changed: 22 additions & 0 deletions
@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [2.4.0] — 2026-03-31
+
+### Added
+
+- **Distributed runtime extension layer** — Added the `devsper.distributed` package with `DistributedController` (worker registry, health tracking, load-aware task assignment) and `WorkerRuntime` (worker-local executor stack) wrappers.
+- **Agent pool** — Added `runtime/agent_pool.py` with reusable, worker-aware concurrent agent leasing (`acquire_agent`, `release_agent`, `run_agent`, `run_parallel`).
+- **Model router (runtime layer)** — Added `runtime/model_router.py` for planning/reasoning/validation model routing with fallback chains.
+- **Speculative planner (runtime layer)** — Added `runtime/speculative_planner.py` to predict and stage speculative successor tasks.
+- **Runtime HITL manager** — Added `runtime/hitl.py` for pause/resume and explicit human-input request lifecycle handling.
+- **Execution graph upgrades** — Extended runtime execution graph with worker assignment metadata and distributed lineage tracking.
+
+### Changed
+
+- **Executor composition** — Runtime executor now integrates AgentPool, ModelRouter, speculative planning hooks, and distributed assignment tracking while preserving default behavior.
+- **Task runner routing** — Task execution now supports pool-backed agent execution, scoped model routing, and fallback chain handling.
+- **Tool execution control** — Runtime ToolRunner now supports bounded concurrency, dependency-aware batching, per-call timeout, cancellation, and structured results.
+- **Event backpressure handling** — Runtime event stream now uses bounded queues with controlled overflow behavior.
+- **Runtime docs** — Added a new `devsper/README.md` documenting the runtime architecture, distributed flow, worker/controller lifecycle, and HITL/speculative execution.
+
 ## [2.3.0] — 2026-03-30
 
 ### Added
 
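The **Event backpressure handling** entry describes bounded queues with controlled overflow, and the README added in this commit specifies the policy as drop-oldest on overflow. A minimal sketch of that policy, assuming a single asyncio event loop; `DropOldestQueue`, `publish`, and `dropped` are hypothetical names, not the `RuntimeEventStream` API:

```python
import asyncio
from collections import deque
from typing import Any, Deque


class DropOldestQueue:
    """Bounded async buffer that evicts the oldest event instead of blocking producers.

    Hypothetical illustration of a drop-oldest policy; not devsper's RuntimeEventStream API.
    """

    def __init__(self, max_queue_size: int) -> None:
        if max_queue_size < 1:
            raise ValueError("max_queue_size must be >= 1")
        # A deque with maxlen discards from the left (oldest end) when appended past capacity.
        self._items: Deque[Any] = deque(maxlen=max_queue_size)
        self._not_empty = asyncio.Event()
        self.dropped = 0  # number of evicted events, worth exposing as telemetry

    def publish(self, event: Any) -> None:
        """Enqueue without ever blocking; evicts the oldest event when full."""
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(event)
        self._not_empty.set()

    async def get(self) -> Any:
        """Wait until an event is available, then return the oldest buffered one."""
        while not self._items:
            self._not_empty.clear()
            await self._not_empty.wait()
        return self._items.popleft()
```

The policy favors liveness over completeness: producers never stall, but a slow consumer misses events under sustained load, which is why the sketch counts drops.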
@@ -0,0 +1,148 @@
+# Devsper Runtime Core
+
+The Devsper runtime is built as a modular orchestration core with bounded concurrency, dynamic DAG mutation, and event-stream-driven observability.
+
+## Runtime Architecture Diagram
+
+```text
+User Task
+   |
+   v
+Swarm.run()
+   |
+   v
+Planner -> Scheduler (DAG)
+   |
+   v
+Runtime Executor ------------------------------+
+   |                                           |
+   +-> RuntimeStateManager                     |
+   +-> ExecutionGraph                          |
+   +-> RuntimeEventStream -> EventLog -> SSE   |
+   +-> TaskRunner                              |
+         +-> AgentRunner (optional stream-tool loop)
+               +-> Agent
+               +-> ToolRunner / tools.tool_runner
+```
+
+## Core Components
+
+- `swarm/swarm.py`
+  - Public entrypoint (`Swarm`), config/wiring, planning and execution bootstrap.
+- `runtime/executor.py`
+  - Event-driven scheduler loop, bounded parallel execution, cancellation propagation, dynamic task injection.
+- `runtime/state_manager.py`
+  - Concurrency-safe task state transitions and runtime DAG mutation entrypoint.
+- `runtime/execution_graph.py`
+  - Execution graph with lineage, edges, attempts, and status transitions.
+- `runtime/planner.py`
+  - Runtime wrapper for dynamic planner expansion and parent-child lineage.
+- `runtime/task_runner.py`
+  - Task lifecycle orchestration with scoped retries and fallback model handling.
+- `runtime/agent_runner.py`
+  - Async agent wrapper with optional streaming tool invocation loop.
+- `runtime/tool_runner.py`
+  - Parallel tool scheduler with bounded concurrency, dependency-aware batching, timeout, and cancellation.
+- `runtime/event_stream.py`
+  - In-process stream with queue backpressure policy (drop-oldest on overflow).
+- `runtime/retry.py`
+  - Retry scopes (`tool`, `agent`, `task`, `model_fallback`) and backoff policies.
+
+## Execution Lifecycle
+
+1. `Swarm.run()` creates the root task, plans subtasks, and builds the scheduler DAG.
+2. Executor emits `EXECUTOR_STARTED` and begins bounded task dispatch.
+3. Ready tasks transition to running through `RuntimeStateManager`.
+4. `TaskRunner` executes each task with scoped retries.
+5. `AgentRunner` may invoke tools (normal or streaming tool loop).
+6. Task completion or failure updates the scheduler and the execution graph.
+7. Adaptive mode can inject follow-up tasks at runtime.
+8. Executor emits `EXECUTOR_FINISHED` and `RUN_COMPLETED`.
+
+## Tool Calling Flow
+
+1. Agent determines tool intent from model output.
+2. Tool calls are parsed and scheduled.
+3. `ToolRunner.run_many(...)` executes calls in parallel with:
+   - `max_concurrency` semaphore
+   - dependency constraints (`depends_on`)
+   - per-call timeouts
+   - cancellation checks
+4. Results are isolated per call and fed back into the agent loop.
+
+## Planner Flow
+
+1. Initial decomposition by `swarm/planner.py`.
+2. Runtime execution completes a task.
+3. `RuntimePlanner.expand(...)` optionally creates follow-up tasks.
+4. `RuntimeStateManager.add_tasks(...)` mutates DAG safely.
+5. `ExecutionGraph` records lineage from parent task.
+
+## Concurrency Model
+
+- Bounded executor workers (`worker_count`).
+- Bounded tool concurrency (`ToolRunner` semaphore).
+- Queue backpressure in event streaming (`max_queue_size`, dropping the oldest events on overflow).
+- Cooperative pause/resume and cancellation propagation across runtime loops.
+- Lock-guarded scheduler mutations via `RuntimeStateManager`.
+
+## Distributed Runtime Architecture
+
+```text
+Controller
+   |
+   v
+Worker Pool
+   |
+   v
+Worker Runtime
+   |
+   v
+Runtime Executor
+   |
+   v
+Agent Pool
+   |
+   v
+Tool Runner
+```
+
+- `distributed/controller.py`
+  - Worker registry, health state, load-aware assignment, reassignment hooks.
+- `distributed/worker_runtime.py`
+  - Worker-local runtime composition: `Executor`, `AgentPool`, `ModelRouter`, `ToolRunner`.
+- Existing multi-node transport/control remains in `nodes/controller.py` and `nodes/worker.py`.
+
+## Worker Lifecycle
+
+1. Worker registers with controller.
+2. Controller assigns tasks based on health and load.
+3. Worker executes task locally through runtime executor stack.
+4. Worker returns completion/failure.
+5. Controller updates worker/task state and reassigns on failure.
+
+## Controller Lifecycle
+
+1. Track worker registration and health.
+2. Assign ready tasks using load-aware strategy.
+3. Handle failures and perform retry/reassignment.
+4. Maintain global execution progress via event/log channels.
+
+## Agent Pool
+
+- `AgentPool` manages reusable agent instances per worker.
+- Supports `acquire_agent`, `release_agent`, `run_agent`, and `run_parallel`.
+- Enables worker-local, concurrent, reuse-based execution.
+
+## Speculative Execution
+
+- `SpeculativePlanner` predicts likely successor tasks.
+- Executor marks/schedules speculative tasks early.
+- Unused speculative branches can be cancelled on dependency failure.
+
+## HITL Flow
+
+1. Agent indicates human input is required.
+2. Runtime emits HITL/clarification event.
+3. Task pauses until response or timeout.
+4. Execution resumes or fails based on response policy.
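The README's tool calling flow has `ToolRunner.run_many(...)` combine a `max_concurrency` semaphore, `depends_on` constraints, per-call timeouts, and cancellation. The following is a minimal asyncio sketch of that combination. `ToolCall`, its fields, and this `run_many` signature are hypothetical, and it assumes unique call IDs and an acyclic dependency graph:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    """Hypothetical call record; the real ToolRunner call type may differ."""

    call_id: str
    fn: Callable[[], Awaitable[Any]]
    depends_on: list[str] = field(default_factory=list)
    timeout: float = 5.0  # seconds, per call


async def run_many(calls: list[ToolCall], max_concurrency: int = 4) -> dict[str, dict]:
    """Run tool calls concurrently, honoring depends_on, a concurrency cap, and timeouts.

    Assumes the dependency graph is acyclic; a cycle would make this wait forever.
    """
    known = {c.call_id for c in calls}
    for c in calls:
        missing = [d for d in c.depends_on if d not in known]
        if missing:
            raise ValueError(f"{c.call_id} depends on unknown calls: {missing}")

    sem = asyncio.Semaphore(max_concurrency)
    loop = asyncio.get_running_loop()
    # One future per call, resolved with True (success) or False (failure or skip).
    finished = {c.call_id: loop.create_future() for c in calls}
    results: dict[str, dict] = {}

    async def run_one(call: ToolCall) -> None:
        # Wait on dependencies outside the semaphore so blocked calls hold no slot.
        for dep in call.depends_on:
            if not await finished[dep]:
                results[call.call_id] = {"ok": False, "error": f"dependency {dep} failed"}
                finished[call.call_id].set_result(False)
                return
        async with sem:
            try:
                value = await asyncio.wait_for(call.fn(), timeout=call.timeout)
                results[call.call_id] = {"ok": True, "value": value}
            except asyncio.TimeoutError:
                results[call.call_id] = {"ok": False, "error": "timeout"}
            except Exception as exc:  # isolate failures per call
                results[call.call_id] = {"ok": False, "error": repr(exc)}
        finished[call.call_id].set_result(results[call.call_id]["ok"])

    # Cancelling run_many cancels every pending call via gather.
    await asyncio.gather(*(run_one(c) for c in calls))
    return results
```

Waiting on dependencies before taking a semaphore slot matters: if blocked calls held slots, a chain longer than `max_concurrency` could deadlock.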
@@ -454,11 +454,24 @@ def _format_tools_section(tools: list | None = None) -> str:
         from devsper.tools.registry import list_tools
 
         tools = list_tools()
+    max_props = 8
     lines = []
     for t in tools:
         schema = getattr(t, "input_schema", None) or getattr(t, "schema", {}) or {}
+        schema_type = schema.get("type", "object")
+        required = schema.get("required", []) or []
+        props = schema.get("properties", {}) if isinstance(schema.get("properties", {}), dict) else {}
+        compact_props: dict[str, dict] = {}
+        for key in list(props.keys())[:max_props]:
+            p = props.get(key, {}) or {}
+            compact_props[key] = {"type": p.get("type", "string")}
+        compact_schema = {
+            "type": schema_type,
+            "required": required[:max_props],
+            "properties": compact_props,
+        }
         lines.append(f"- {t.name}: {t.description}")
-        lines.append(f"  input_schema: {json.dumps(schema)}")
+        lines.append(f"  input_schema: {json.dumps(compact_schema, separators=(',', ':'))}")
     return "\n".join(lines)
 
 
 
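The `_format_tools_section` change trims each tool's schema before it goes into the prompt. The sketch below isolates that logic as a hypothetical standalone helper, `compact_input_schema`, so its output can be checked directly. It differs from the hunk in one deliberate way. The hunk keeps `required[:max_props]` by position, so a required name can point at a property that was cut. The sketch keeps only required names whose property survives truncation:

```python
import json
from typing import Any

MAX_PROPS = 8  # mirrors max_props in _format_tools_section


def compact_input_schema(schema: dict[str, Any], max_props: int = MAX_PROPS) -> str:
    """Shrink a tool's JSON Schema to its type, required names, and top-level property types."""
    raw_props = schema.get("properties")
    props = raw_props if isinstance(raw_props, dict) else {}
    kept = list(props)[:max_props]  # first N properties in declaration order
    compact_props: dict[str, dict] = {}
    for key in kept:
        prop = props[key] if isinstance(props[key], dict) else {}
        compact_props[key] = {"type": prop.get("type", "string")}
    compact = {
        "type": schema.get("type", "object"),
        # Keep only required names whose property survived truncation.
        "required": [name for name in (schema.get("required") or []) if name in compact_props],
        "properties": compact_props,
    }
    # Compact separators drop the spaces json.dumps emits by default.
    return json.dumps(compact, separators=(",", ":"))
```

Either variant drops descriptions, enums, and nested structure, so the model sees argument names and top-level types only. This trades argument guidance for fewer prompt tokens.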
@@ -0,0 +1,7 @@
+"""Distributed runtime wrappers for controller/worker orchestration."""
+
+from devsper.distributed.controller import DistributedController
+from devsper.distributed.worker_runtime import WorkerRuntime
+
+__all__ = ["DistributedController", "WorkerRuntime"]
+
@@ -0,0 +1,70 @@
+from __future__ import annotations
+
+import asyncio
+from dataclasses import dataclass
+
+from devsper.cluster.router import TaskRouter
+from devsper.types.task import Task
+
+
+@dataclass
+class WorkerState:
+    worker_id: str
+    healthy: bool = True
+    active_tasks: int = 0
+    max_workers: int = 1
+
+
+class DistributedController:
+    """Controller-side worker orchestration and task assignment."""
+
+    def __init__(self, router: TaskRouter | None = None) -> None:
+        self._router = router or TaskRouter()
+        self._workers: dict[str, WorkerState] = {}
+        self._task_assignments: dict[str, str] = {}
+        self._lock = asyncio.Lock()
+
+    async def register_worker(self, worker_id: str, max_workers: int = 1) -> None:
+        async with self._lock:
+            self._workers[worker_id] = WorkerState(
+                worker_id=worker_id,
+                healthy=True,
+                active_tasks=0,
+                max_workers=max(1, int(max_workers)),
+            )
+
+    async def mark_worker_unhealthy(self, worker_id: str) -> None:
+        async with self._lock:
+            if worker_id in self._workers:
+                self._workers[worker_id].healthy = False
+
+    async def health_check(self) -> dict[str, bool]:
+        async with self._lock:
+            return {wid: ws.healthy for wid, ws in self._workers.items()}
+
+    async def assign_task(self, task: Task) -> str | None:
+        async with self._lock:
+            candidates = [
+                ws
+                for ws in self._workers.values()
+                if ws.healthy and ws.active_tasks < ws.max_workers
+            ]
+            if not candidates:
+                return None
+            # Lightweight balancing by active load.
+            candidates.sort(key=lambda ws: (ws.active_tasks / max(1, ws.max_workers), ws.worker_id))
+            chosen = candidates[0]
+            chosen.active_tasks += 1
+            self._task_assignments[task.id] = chosen.worker_id
+            return chosen.worker_id
+
+    async def complete_task(self, task_id: str) -> None:
+        async with self._lock:
+            wid = self._task_assignments.pop(task_id, None)
+            if wid and wid in self._workers:
+                self._workers[wid].active_tasks = max(0, self._workers[wid].active_tasks - 1)
+
+    async def reassign_on_failure(self, task: Task, failed_worker_id: str) -> str | None:
+        await self.mark_worker_unhealthy(failed_worker_id)
+        return await self.assign_task(task)
+
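`assign_task` picks the healthy worker with the lowest `active_tasks / max_workers` ratio and breaks ties by `worker_id`. One gap is worth noting. `reassign_on_failure` marks the failed worker unhealthy and calls `assign_task` again, but it never decrements the failed worker's `active_tasks`, which is only reset when `register_worker` runs again. The two steps also take the lock separately. The self-contained sketch below uses hypothetical names (`LeastLoadedController`, `capacity`) and plain string task IDs instead of `Task`. It applies the same selection rule but performs release and reassignment in one critical section:

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    worker_id: str
    capacity: int = 1  # max concurrent tasks (WorkerState.max_workers in the diff)
    active: int = 0
    healthy: bool = True


class LeastLoadedController:
    """Hypothetical, self-contained sketch; not devsper's DistributedController."""

    def __init__(self) -> None:
        self._workers: dict[str, Worker] = {}
        self._assignments: dict[str, str] = {}
        self._lock = asyncio.Lock()

    async def register(self, worker_id: str, capacity: int = 1) -> None:
        async with self._lock:
            self._workers[worker_id] = Worker(worker_id, max(1, int(capacity)))

    def _assign_locked(self, task_id: str) -> Optional[str]:
        # Caller must hold the lock. Lowest utilisation first; worker_id breaks ties.
        candidates = [w for w in self._workers.values() if w.healthy and w.active < w.capacity]
        chosen = min(candidates, key=lambda w: (w.active / w.capacity, w.worker_id), default=None)
        if chosen is None:
            return None
        chosen.active += 1
        self._assignments[task_id] = chosen.worker_id
        return chosen.worker_id

    async def assign(self, task_id: str) -> Optional[str]:
        async with self._lock:
            return self._assign_locked(task_id)

    async def complete(self, task_id: str) -> None:
        async with self._lock:
            wid = self._assignments.pop(task_id, None)
            if wid is not None and wid in self._workers:
                worker = self._workers[wid]
                worker.active = max(0, worker.active - 1)

    async def reassign_on_failure(self, task_id: str, failed_worker_id: str) -> Optional[str]:
        # One critical section: mark unhealthy, release the failed slot, pick a new worker.
        async with self._lock:
            failed = self._workers.get(failed_worker_id)
            if failed is not None:
                failed.healthy = False
                if self._assignments.get(task_id) == failed_worker_id:
                    failed.active = max(0, failed.active - 1)
            self._assignments.pop(task_id, None)
            return self._assign_locked(task_id)
```

Holding the lock across both steps means no concurrent `assign` can observe the task as unassigned while the failed worker's slot is still counted.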
@@ -0,0 +1,43 @@
+from __future__ import annotations
+
+from dataclasses import dataclass
+
+from devsper.agents.agent import Agent
+from devsper.runtime.agent_pool import AgentPool
+from devsper.runtime.executor import Executor
+from devsper.runtime.model_router import ModelRouter
+from devsper.runtime.tool_runner import ToolRunner
+from devsper.swarm.scheduler import Scheduler
+from devsper.utils.event_logger import EventLog
+
+
+@dataclass
+class WorkerRuntime:
+    """Worker-local runtime for agent/tool execution."""
+
+    scheduler: Scheduler
+    agent: Agent
+    event_log: EventLog
+    worker_id: str = "worker-local"
+    max_agents: int = 4
+    tool_concurrency: int = 4
+
+    def __post_init__(self) -> None:
+        self.agent_pool = AgentPool(lambda: self.agent, max_agents=self.max_agents)
+        self.model_router = ModelRouter(
+            planning_model=getattr(self.agent, "model_name", "mock"),
+            reasoning_model=getattr(self.agent, "model_name", "mock"),
+            validation_model=getattr(self.agent, "model_name", "mock"),
+        )
+        self.tool_runner = ToolRunner(parallelism=self.tool_concurrency)
+        self.executor = Executor(
+            scheduler=self.scheduler,
+            agent=self.agent,
+            event_log=self.event_log,
+            worker_count=self.max_agents,
+        )
+
+    async def run_task_queue(self) -> dict[str, str]:
+        await self.executor.run()
+        return self.scheduler.get_results()
+
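The README describes `AgentPool` as managing reusable agent instances per worker. In `WorkerRuntime.__post_init__`, however, the factory is `lambda: self.agent`, which returns the same object on every call. Leased agents would therefore share one instance unless `AgentPool` copies it. As far as this hunk shows, `agent_pool`, `model_router`, and `tool_runner` are also created but not passed to `Executor`. Below is a minimal leasing sketch whose factory builds distinct agents. `LeasePool` and its method signatures are hypothetical, not the real `AgentPool` API:

```python
import asyncio
from typing import Any, Awaitable, Callable, Generic, Iterable, TypeVar

AgentT = TypeVar("AgentT")


class LeasePool(Generic[AgentT]):
    """Hypothetical bounded agent pool: lazy creation, reuse, and waiting when exhausted."""

    def __init__(self, factory: Callable[[], AgentT], max_agents: int) -> None:
        self._factory = factory
        self._max_agents = max(1, max_agents)
        self._created = 0
        self._idle: asyncio.Queue = asyncio.Queue()

    async def acquire(self) -> AgentT:
        # Prefer an idle agent; otherwise create one if under the cap; otherwise wait.
        if not self._idle.empty():
            return self._idle.get_nowait()
        if self._created < self._max_agents:
            agent = self._factory()
            self._created += 1
            return agent
        return await self._idle.get()

    def release(self, agent: AgentT) -> None:
        self._idle.put_nowait(agent)

    async def run(self, job: Callable[[AgentT], Awaitable[Any]]) -> Any:
        agent = await self.acquire()
        try:
            return await job(agent)
        finally:
            self.release(agent)  # always return the lease, even if the job raises

    async def run_parallel(self, jobs: Iterable[Callable[[AgentT], Awaitable[Any]]]) -> list:
        return await asyncio.gather(*(self.run(job) for job in jobs))
```

Creating agents lazily keeps idle workers cheap, and routing every use through `run` guarantees that a lease is returned even when a job fails.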
@@ -3,8 +3,30 @@
 from devsper.runtime.replay import replay_execution
 from devsper.runtime.telemetry import collect_telemetry, print_telemetry_summary
 from devsper.runtime.visualize import visualize_scheduler_dag
+from devsper.runtime.executor import Executor
+from devsper.runtime.state_manager import RuntimeStateManager
+from devsper.runtime.task_runner import TaskRunner
+from devsper.runtime.agent_runner import AgentRunner
+from devsper.runtime.tool_runner import ToolRunner
+from devsper.runtime.execution_graph import ExecutionGraph
+from devsper.runtime.planner import RuntimePlanner
+from devsper.runtime.agent_pool import AgentPool
+from devsper.runtime.model_router import ModelRouter
+from devsper.runtime.speculative_planner import SpeculativePlanner
+from devsper.runtime.hitl import HITLManager
 
 __all__ = [
+    "Executor",
+    "RuntimeStateManager",
+    "TaskRunner",
+    "AgentRunner",
+    "ToolRunner",
+    "ExecutionGraph",
+    "RuntimePlanner",
+    "AgentPool",
+    "ModelRouter",
+    "SpeculativePlanner",
+    "HITLManager",
     "replay_execution",
     "collect_telemetry",
     "print_telemetry_summary",
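`WorkerRuntime` builds its `ModelRouter` from three role-specific models (`planning_model`, `reasoning_model`, `validation_model`), and the CHANGELOG describes routing with fallback chains. The following sketch illustrates that idea by mapping each role to an ordered list of models that are tried in turn. `RoleRouter`, `invoke`, and the model names are hypothetical and are not the `ModelRouter` API:

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class RoleRouter:
    """Hypothetical sketch of role-based model routing with fallback chains."""

    routes: dict[str, list[str]]  # role -> ordered model names, primary first
    default_role: str = "reasoning"

    def chain(self, role: str) -> list[str]:
        # Unknown or empty roles fall back to the default role's chain.
        return self.routes.get(role) or self.routes[self.default_role]

    async def call(
        self, role: str, prompt: str, invoke: Callable[[str, str], Awaitable[str]]
    ) -> tuple[str, str]:
        """Try each model in the role's chain; return (model, output) from the first success."""
        errors: list[str] = []
        for model in self.chain(role):
            try:
                return model, await invoke(model, prompt)
            except Exception as exc:  # record the failure and move to the next model
                errors.append(f"{model}: {exc!r}")
        raise RuntimeError("all models failed: " + "; ".join(errors))
```

Catching `Exception` rather than `BaseException` means task cancellation (`asyncio.CancelledError`) still propagates instead of being treated as a model failure.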