Commit 466613b

feat: release 2.4.0 distributed runtime architecture upgrade

Add distributed runtime building blocks (agent pool, model router, speculative planner, HITL manager, and distributed controller/worker runtime wrappers), integrate them into runtime executor/task flow, and bump package/changelog for v2.4.0.

1 parent 34d7e97 commit 466613b

22 files changed: 1418 additions & 5 deletions

CHANGELOG.md

Lines changed: 19 additions & 0 deletions

@@ -7,6 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

## [2.4.0] — 2026-03-31

### Added

- **Distributed runtime extension layer** — Added `devsper.distributed` package with controller and worker-runtime wrappers for clean orchestration composition.
- **Agent pool** — Added `runtime/agent_pool.py` with reusable, worker-aware concurrent agent leasing (`acquire_agent`, `release_agent`, `run_agent`, `run_parallel`).
- **Model router (runtime layer)** — Added `runtime/model_router.py` for planning/reasoning/validation model routing with fallback chains.
- **Speculative planner (runtime layer)** — Added `runtime/speculative_planner.py` to predict and stage speculative successor tasks.
- **Runtime HITL manager** — Added `runtime/hitl.py` for pause/resume and explicit human-input request lifecycle handling.
- **Execution graph upgrades** — Extended runtime execution graph with worker assignment metadata and distributed lineage tracking.

### Changed

- **Executor composition** — Runtime executor now integrates AgentPool, ModelRouter, speculative planning hooks, and distributed assignment tracking while preserving default behavior.
- **Task runner routing** — Task execution now supports pool-backed agent execution, scoped model routing, and fallback chain handling.
- **Tool execution control** — Runtime ToolRunner now supports bounded concurrency, dependency-aware batching, per-call timeout, cancellation, and structured results.
- **Event backpressure handling** — Runtime event stream now uses bounded queues with controlled overflow behavior.
- **Runtime docs** — Updated runtime architecture docs in `devsper/README.md` with distributed flow, worker/controller lifecycle, and HITL/speculative execution details.

## [2.3.0] — 2026-03-30

### Added

devsper/README.md

Lines changed: 148 additions & 0 deletions

@@ -0,0 +1,148 @@

# Devsper Runtime Core

Devsper runtime is built as a modular orchestration core with bounded concurrency, dynamic DAG mutation, and event-stream-driven observability.

## Runtime Architecture Diagram

```text
User Task
   |
   v
Swarm.run()
   |
   v
Planner -> Scheduler (DAG)
   |
   v
Runtime Executor ------------------------------+
   |                                           |
   +-> RuntimeStateManager                     |
   +-> ExecutionGraph                          |
   +-> RuntimeEventStream -> EventLog -> SSE   |
   +-> TaskRunner                              |
         +-> AgentRunner (optional stream-tool loop)
               +-> Agent
               +-> ToolRunner / tools.tool_runner
```
## Core Components

- `swarm/swarm.py`
  - Public entrypoint (`Swarm`), config/wiring, planning and execution bootstrap.
- `runtime/executor.py`
  - Event-driven scheduler loop, bounded parallel execution, cancellation propagation, dynamic task injection.
- `runtime/state_manager.py`
  - Concurrency-safe task state transitions and runtime DAG mutation entrypoint.
- `runtime/execution_graph.py`
  - Execution graph with lineage, edges, attempts, and status transitions.
- `runtime/planner.py`
  - Runtime wrapper for dynamic planner expansion and parent-child lineage.
- `runtime/task_runner.py`
  - Task lifecycle orchestration with scoped retries and fallback model handling.
- `runtime/agent_runner.py`
  - Async agent wrapper with optional streaming tool invocation loop.
- `runtime/tool_runner.py`
  - Parallel tool scheduler with bounded concurrency, dependency-aware batching, timeout, and cancellation.
- `runtime/event_stream.py`
  - In-process stream with queue backpressure policy (drop-oldest on overflow).
- `runtime/retry.py`
  - Retry scopes (`tool`, `agent`, `task`, `model_fallback`) and backoff policies.
## Execution Lifecycle

1. `Swarm.run()` creates the root task, plans subtasks, and builds the scheduler DAG.
2. Executor emits `EXECUTOR_STARTED` and begins bounded task dispatch.
3. Ready tasks transition to running through `RuntimeStateManager`.
4. `TaskRunner` executes with scoped retries.
5. `AgentRunner` may invoke tools (normal or streaming tool loop).
6. Task completion/failure updates scheduler + execution graph.
7. Adaptive mode can inject follow-up tasks at runtime.
8. Executor emits `EXECUTOR_FINISHED` and `RUN_COMPLETED`.
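The lifecycle above can be sketched as a toy executor loop. `EXECUTOR_STARTED`/`EXECUTOR_FINISHED`/`RUN_COMPLETED` are the event names from the docs; everything else (the `executor` function, task names, result shape) is an illustrative stand-in, not the real `Executor` API:

```python
# Toy rendition of the executor lifecycle: emit a start event, dispatch
# tasks under a bounded semaphore, then emit finish/completion events.
import asyncio

events: list[str] = []
results: dict[str, str] = {}

async def run_task(task_id: str) -> None:
    # Stand-in for TaskRunner executing a task with scoped retries.
    results[task_id] = "completed"

async def executor(tasks: list[str], worker_count: int = 2) -> None:
    events.append("EXECUTOR_STARTED")
    sem = asyncio.Semaphore(worker_count)  # bounded task dispatch

    async def dispatch(task_id: str) -> None:
        async with sem:
            await run_task(task_id)

    await asyncio.gather(*(dispatch(t) for t in tasks))
    events.append("EXECUTOR_FINISHED")
    events.append("RUN_COMPLETED")

asyncio.run(executor(["t1", "t2", "t3"]))
```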
## Tool Calling Flow

1. Agent determines tool intent from model output.
2. Tool calls are parsed and scheduled.
3. `ToolRunner.run_many(...)` executes calls in parallel with:
   - `max_concurrency` semaphore
   - dependency constraints (`depends_on`)
   - per-call timeouts
   - cancellation checks
4. Results are isolated per call and fed back into the agent loop.
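A minimal, self-contained sketch of the `run_many` behavior described above, assuming the documented knobs (`max_concurrency`, `depends_on`, per-call timeouts, per-call result isolation). The `ToolCall` dataclass and `run_many` function here are illustrative stand-ins, not the real `ToolRunner` API:

```python
# Bounded, dependency-aware parallel tool execution with isolated results.
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    call_id: str
    fn: object                               # async callable producing the result
    depends_on: list = field(default_factory=list)
    timeout: float = 5.0

async def run_many(calls, max_concurrency: int = 4) -> dict:
    sem = asyncio.Semaphore(max_concurrency)                       # bounded concurrency
    done = {c.call_id: asyncio.Event() for c in calls}
    results: dict[str, object] = {}

    async def run_one(call: ToolCall) -> None:
        for dep in call.depends_on:                                # dependency constraints
            await done[dep].wait()
        async with sem:
            try:
                results[call.call_id] = await asyncio.wait_for(call.fn(), call.timeout)
            except Exception as exc:                               # isolate failures per call
                results[call.call_id] = f"error: {exc!r}"
        done[call.call_id].set()

    await asyncio.gather(*(run_one(c) for c in calls))
    return results

async def fetch(value):
    await asyncio.sleep(0.01)
    return value

calls = [
    ToolCall("a", lambda: fetch(1)),
    ToolCall("b", lambda: fetch(2)),
    ToolCall("c", lambda: fetch(3), depends_on=["a", "b"]),
]
tool_results = asyncio.run(run_many(calls, max_concurrency=2))
```

Call `"c"` only starts once `"a"` and `"b"` have finished, while the semaphore caps how many calls run at once.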
## Planner Flow

1. Initial decomposition by `swarm/planner.py`.
2. Runtime execution completes a task.
3. `RuntimePlanner.expand(...)` optionally creates follow-up tasks.
4. `RuntimeStateManager.add_tasks(...)` mutates the DAG safely.
5. `ExecutionGraph` records lineage from the parent task.
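The expansion step reduces to: completed task in, optional follow-up tasks out, with parent lineage recorded. A tiny sketch under those assumptions (the `expand`/`add_tasks` functions and task dict shape here are hypothetical, not the runtime's actual signatures):

```python
# Dynamic DAG expansion: a finished task may yield follow-ups that are
# appended with parent lineage.
tasks = {"t1": {"status": "completed", "parent": None}}
lineage: list[tuple[str, str]] = []

def expand(task_id: str) -> list[str]:
    # Pretend the planner decided "t1" needs one follow-up task.
    return ["t2"] if task_id == "t1" else []

def add_tasks(parent_id: str, new_ids: list[str]) -> None:
    for tid in new_ids:
        tasks[tid] = {"status": "pending", "parent": parent_id}  # DAG mutation
        lineage.append((parent_id, tid))                         # lineage record

add_tasks("t1", expand("t1"))
```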
## Concurrency Model

- Bounded executor workers (`worker_count`).
- Bounded tool concurrency (`ToolRunner` semaphore).
- Queue backpressure in event streaming (`max_queue_size` with controlled dropping).
- Cooperative pause/resume and cancellation propagation across runtime loops.
- Lock-guarded scheduler mutations via `RuntimeStateManager`.
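The drop-oldest backpressure policy mentioned above (and in `runtime/event_stream.py`'s description) can be shown with a bounded queue; this is a sketch of the policy, not the real `RuntimeEventStream` class:

```python
# Bounded event queue: at capacity, the oldest event is dropped so the
# newest one can be admitted.
from collections import deque

class BoundedEventQueue:
    def __init__(self, max_queue_size: int = 3) -> None:
        self._q: deque = deque()
        self._max = max_queue_size
        self.dropped = 0

    def publish(self, event: str) -> None:
        if len(self._q) >= self._max:
            self._q.popleft()          # drop-oldest on overflow
            self.dropped += 1
        self._q.append(event)

    def drain(self) -> list[str]:
        events, self._q = list(self._q), deque()
        return events

q = BoundedEventQueue(max_queue_size=3)
for i in range(5):
    q.publish(f"event-{i}")
kept = q.drain()
```

Publishing five events into a three-slot queue keeps the newest three and counts two drops, which is the "controlled dropping" trade: slow consumers lose old events rather than blocking producers.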
## Distributed Runtime Architecture

```text
Controller
   |
   v
Worker Pool
   |
   v
Worker Runtime
   |
   v
Runtime Executor
   |
   v
Agent Pool
   |
   v
Tool Runner
```

- `distributed/controller.py`
  - Worker registry, health state, load-aware assignment, reassignment hooks.
- `distributed/worker_runtime.py`
  - Worker-local runtime composition: `Executor`, `AgentPool`, `ModelRouter`, `ToolRunner`.
- Existing multi-node transport/control remains in `nodes/controller.py` and `nodes/worker.py`.
## Worker Lifecycle

1. Worker registers with the controller.
2. Controller assigns tasks based on health and load.
3. Worker executes the task locally through the runtime executor stack.
4. Worker returns completion/failure.
5. Controller updates worker/task state and reassigns on failure.
## Controller Lifecycle

1. Track worker registration and health.
2. Assign ready tasks using a load-aware strategy.
3. Handle failures and perform retry/reassignment.
4. Maintain global execution progress via event/log channels.
## Agent Pool

- `AgentPool` manages reusable agent instances per worker.
- Supports `acquire_agent`, `release_agent`, `run_agent`, and `run_parallel`.
- Enables worker-local, concurrent, reuse-based execution.
## Speculative Execution

- `SpeculativePlanner` predicts likely successor tasks.
- Executor marks/schedules speculative tasks early.
- Unused speculative branches can be cancelled on dependency failure.
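The predict/stage/cancel flow reduces to a small state machine. Everything in this sketch (`predict_successors`, the `staged` dict, the status strings) is hypothetical, illustrating only the behavior the bullets describe:

```python
# Speculative staging: likely successors are staged early, then promoted
# or cancelled once their dependency resolves.
def predict_successors(task_id: str) -> list[str]:
    # Pretend the planner expects "build" to be followed by "test" and "package".
    return {"build": ["test", "package"]}.get(task_id, [])

staged = {tid: "speculative" for tid in predict_successors("build")}

def on_task_finished(task_id: str, ok: bool) -> None:
    for tid in staged:
        staged[tid] = "ready" if ok else "cancelled"   # cancel on dependency failure

on_task_finished("build", ok=False)
```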
## HITL Flow

1. Agent indicates human input is required.
2. Runtime emits a HITL/clarification event.
3. Task pauses until response or timeout.
4. Execution resumes or fails based on response policy.
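The pause-until-response-or-timeout step maps naturally onto an `asyncio.Event`. `HITLRequest` below is a hypothetical stand-in for the runtime's request object, and the fail-on-timeout policy is one possible response policy, not necessarily the default:

```python
# HITL pause/resume: the task blocks on an event until a human responds
# or the timeout fires.
import asyncio

class HITLRequest:
    def __init__(self, question: str, timeout: float = 0.1) -> None:
        self.question = question
        self.timeout = timeout
        self.response: str | None = None
        self._event = asyncio.Event()

    def respond(self, text: str) -> None:
        self.response = text
        self._event.set()                      # resume the waiting task

    async def wait(self) -> str:
        try:
            await asyncio.wait_for(self._event.wait(), self.timeout)
            return "resumed"
        except asyncio.TimeoutError:
            return "failed"                    # policy: fail when no response arrives

async def demo() -> tuple[str, str]:
    answered = HITLRequest("Deploy to prod?")
    ignored = HITLRequest("Delete old logs?", timeout=0.01)
    # Simulate a human answering the first request shortly after it pauses.
    asyncio.get_running_loop().call_later(0.02, answered.respond, "yes")
    return await answered.wait(), await ignored.wait()

resumed_status, failed_status = asyncio.run(demo())
```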

devsper/agents/agent.py

Lines changed: 14 additions & 1 deletion

```diff
@@ -454,11 +454,24 @@ def _format_tools_section(tools: list | None = None) -> str:
     from devsper.tools.registry import list_tools

     tools = list_tools()
+    max_props = 8
     lines = []
     for t in tools:
         schema = getattr(t, "input_schema", None) or getattr(t, "schema", {}) or {}
+        schema_type = schema.get("type", "object")
+        required = schema.get("required", []) or []
+        props = schema.get("properties", {}) if isinstance(schema.get("properties", {}), dict) else {}
+        compact_props: dict[str, dict] = {}
+        for key in list(props.keys())[:max_props]:
+            p = props.get(key, {}) or {}
+            compact_props[key] = {"type": p.get("type", "string")}
+        compact_schema = {
+            "type": schema_type,
+            "required": required[:max_props],
+            "properties": compact_props,
+        }
         lines.append(f"- {t.name}: {t.description}")
-        lines.append(f"  input_schema: {json.dumps(schema)}")
+        lines.append(f"  input_schema: {json.dumps(compact_schema, separators=(',', ':'))}")
     return "\n".join(lines)
```
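A worked example of the compaction this hunk introduces: the schema is trimmed to at most `max_props` properties, each property is reduced to its `type`, and the result is serialized compactly. The sample schema is invented for illustration; the compaction logic mirrors the diff above:

```python
# Schema compaction as in the agent.py hunk: keep type/required/properties,
# strip descriptions, enums, and other verbose fields from the prompt.
import json

max_props = 8
schema = {
    "type": "object",
    "required": ["path", "mode"],
    "properties": {
        "path": {"type": "string", "description": "File to read"},
        "mode": {"type": "string", "enum": ["r", "rb"]},
        "limit": {"type": "integer", "minimum": 0},
    },
}

schema_type = schema.get("type", "object")
required = schema.get("required", []) or []
props = schema.get("properties", {}) if isinstance(schema.get("properties", {}), dict) else {}
compact_props = {
    k: {"type": (props.get(k, {}) or {}).get("type", "string")}
    for k in list(props)[:max_props]
}
compact_schema = {"type": schema_type, "required": required[:max_props], "properties": compact_props}
compact = json.dumps(compact_schema, separators=(",", ":"))
```

This keeps tool prompts short and bounded even for tools with large, deeply annotated input schemas.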

devsper/distributed/__init__.py

Lines changed: 7 additions & 0 deletions

```python
"""Distributed runtime wrappers for controller/worker orchestration."""

from devsper.distributed.controller import DistributedController
from devsper.distributed.worker_runtime import WorkerRuntime

__all__ = ["DistributedController", "WorkerRuntime"]
```
devsper/distributed/controller.py

Lines changed: 70 additions & 0 deletions

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass

from devsper.cluster.router import TaskRouter
from devsper.types.task import Task


@dataclass
class WorkerState:
    worker_id: str
    healthy: bool = True
    active_tasks: int = 0
    max_workers: int = 1


class DistributedController:
    """Controller-side worker orchestration and task assignment."""

    def __init__(self, router: TaskRouter | None = None) -> None:
        self._router = router or TaskRouter()
        self._workers: dict[str, WorkerState] = {}
        self._task_assignments: dict[str, str] = {}
        self._lock = asyncio.Lock()

    async def register_worker(self, worker_id: str, max_workers: int = 1) -> None:
        async with self._lock:
            self._workers[worker_id] = WorkerState(
                worker_id=worker_id,
                healthy=True,
                active_tasks=0,
                max_workers=max(1, int(max_workers)),
            )

    async def mark_worker_unhealthy(self, worker_id: str) -> None:
        async with self._lock:
            if worker_id in self._workers:
                self._workers[worker_id].healthy = False

    async def health_check(self) -> dict[str, bool]:
        async with self._lock:
            return {wid: ws.healthy for wid, ws in self._workers.items()}

    async def assign_task(self, task: Task) -> str | None:
        async with self._lock:
            candidates = [
                ws
                for ws in self._workers.values()
                if ws.healthy and ws.active_tasks < ws.max_workers
            ]
            if not candidates:
                return None
            # Lightweight balancing by active load.
            candidates.sort(key=lambda ws: (ws.active_tasks / max(1, ws.max_workers), ws.worker_id))
            chosen = candidates[0]
            chosen.active_tasks += 1
            self._task_assignments[task.id] = chosen.worker_id
            return chosen.worker_id

    async def complete_task(self, task_id: str) -> None:
        async with self._lock:
            wid = self._task_assignments.pop(task_id, None)
            if wid and wid in self._workers:
                self._workers[wid].active_tasks = max(0, self._workers[wid].active_tasks - 1)

    async def reassign_on_failure(self, task: Task, failed_worker_id: str) -> str | None:
        await self.mark_worker_unhealthy(failed_worker_id)
        return await self.assign_task(task)
```
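The load-aware selection rule inside `assign_task` can be exercised in isolation. The sketch below re-creates `WorkerState` stand-alone (same fields as the dataclass above, so it runs without devsper installed) and applies the same filter-then-sort logic, with worker IDs and loads invented for the example:

```python
# Load-aware assignment: among healthy workers with spare capacity, pick
# the lowest active_tasks / max_workers ratio, breaking ties by worker_id.
from dataclasses import dataclass

@dataclass
class WorkerState:
    worker_id: str
    healthy: bool = True
    active_tasks: int = 0
    max_workers: int = 1

workers = [
    WorkerState("w1", healthy=True, active_tasks=2, max_workers=4),   # load 0.50
    WorkerState("w2", healthy=True, active_tasks=1, max_workers=4),   # load 0.25
    WorkerState("w3", healthy=False, active_tasks=0, max_workers=4),  # unhealthy, skipped
]

candidates = [ws for ws in workers if ws.healthy and ws.active_tasks < ws.max_workers]
candidates.sort(key=lambda ws: (ws.active_tasks / max(1, ws.max_workers), ws.worker_id))
chosen = candidates[0]
```

Normalizing by `max_workers` means a busy large worker can still beat an idle-looking small one only when its *relative* load is lower, which keeps heterogeneous workers balanced.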
devsper/distributed/worker_runtime.py

Lines changed: 43 additions & 0 deletions

```python
from __future__ import annotations

from dataclasses import dataclass

from devsper.agents.agent import Agent
from devsper.runtime.agent_pool import AgentPool
from devsper.runtime.executor import Executor
from devsper.runtime.model_router import ModelRouter
from devsper.runtime.tool_runner import ToolRunner
from devsper.swarm.scheduler import Scheduler
from devsper.utils.event_logger import EventLog


@dataclass
class WorkerRuntime:
    """Worker-local runtime for agent/tool execution."""

    scheduler: Scheduler
    agent: Agent
    event_log: EventLog
    worker_id: str = "worker-local"
    max_agents: int = 4
    tool_concurrency: int = 4

    def __post_init__(self) -> None:
        self.agent_pool = AgentPool(lambda: self.agent, max_agents=self.max_agents)
        self.model_router = ModelRouter(
            planning_model=getattr(self.agent, "model_name", "mock"),
            reasoning_model=getattr(self.agent, "model_name", "mock"),
            validation_model=getattr(self.agent, "model_name", "mock"),
        )
        self.tool_runner = ToolRunner(parallelism=self.tool_concurrency)
        self.executor = Executor(
            scheduler=self.scheduler,
            agent=self.agent,
            event_log=self.event_log,
            worker_count=self.max_agents,
        )

    async def run_task_queue(self) -> dict[str, str]:
        await self.executor.run()
        return self.scheduler.get_results()
```
devsper/runtime/__init__.py

Lines changed: 22 additions & 0 deletions

```diff
@@ -3,8 +3,30 @@
 from devsper.runtime.replay import replay_execution
 from devsper.runtime.telemetry import collect_telemetry, print_telemetry_summary
 from devsper.runtime.visualize import visualize_scheduler_dag
+from devsper.runtime.executor import Executor
+from devsper.runtime.state_manager import RuntimeStateManager
+from devsper.runtime.task_runner import TaskRunner
+from devsper.runtime.agent_runner import AgentRunner
+from devsper.runtime.tool_runner import ToolRunner
+from devsper.runtime.execution_graph import ExecutionGraph
+from devsper.runtime.planner import RuntimePlanner
+from devsper.runtime.agent_pool import AgentPool
+from devsper.runtime.model_router import ModelRouter
+from devsper.runtime.speculative_planner import SpeculativePlanner
+from devsper.runtime.hitl import HITLManager

 __all__ = [
+    "Executor",
+    "RuntimeStateManager",
+    "TaskRunner",
+    "AgentRunner",
+    "ToolRunner",
+    "ExecutionGraph",
+    "RuntimePlanner",
+    "AgentPool",
+    "ModelRouter",
+    "SpeculativePlanner",
+    "HITLManager",
     "replay_execution",
     "collect_telemetry",
     "print_telemetry_summary",
```