Implement MCP and SDK streaming messages #114
@@ -0,0 +1,82 @@

```yaml
name: Checks

on:
  pull_request:
    types: [opened, synchronize, reopened]
  push:
    branches: [main]

concurrency:
  group: checks-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Ruff check
        run: uv run ruff check .

      - name: Ruff format check
        run: uv run ruff format --check .

      - name: Pyright
        run: uv run basedpyright

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Run unit tests
        run: uv run pytest tests/ -v --tb=short

  mcp-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install MCP dependencies
        working-directory: everyrow-mcp
        run: uv sync

      - name: Run MCP server tests
        working-directory: everyrow-mcp
        run: uv run pytest tests/ -v --tb=short

  shell-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install jq
        run: sudo apt-get install -y jq

      - name: Run status line tests
        run: bash everyrow-mcp/tests/test_statusline.sh

      - name: Run stop guard tests
        run: bash everyrow-mcp/tests/test_hook_stop_guard.sh

      - name: Run results hook tests
        run: bash everyrow-mcp/tests/test_hook_results.sh
```
@@ -0,0 +1,166 @@

# Architecture

Technical reference for the everyrow SDK, MCP server, and Claude Code plugin.

## Overview

everyrow provides two interfaces:

1. **MCP server** — 9 tools exposed directly to Claude Code / Codex. Zero-code, conversational progress updates, results saved as CSV.
2. **Python SDK** — `from everyrow.ops import rank, agent_map, ...` for scripts, pipelines, and programmatic control.

Both share the same backend (the Cohort engine) and the same session/task model. The Claude Code plugin ships both in a single package, and its Skill adds guidance for generating SDK code.

## MCP Server

### Tools

The MCP server (`everyrow-mcp/src/everyrow_mcp/server.py`) exposes 9 tools via [FastMCP](https://github.com/jlowin/fastmcp):

**Blocking tools** (run to completion, return results inline):

- `everyrow_screen` — Filter rows by natural language criteria
- `everyrow_rank` — Score and sort rows
- `everyrow_dedupe` — Remove semantic duplicates
- `everyrow_merge` — Join two CSVs by intelligent entity matching
- `everyrow_agent` — Run web research agents on each row

**Submit/poll tools** (for long-running operations):

> **Review comment (Member):** why not re-use the `_async` naming pattern of the sdk? submit doesn't seem very descriptive

- `everyrow_agent_submit` — Submit `agent_map`, return immediately with `task_id` and `session_url`
- `everyrow_rank_submit` — Submit rank, return immediately
- `everyrow_progress` — Poll task status (blocks ~12s server-side, returns progress text)
- `everyrow_results` — Retrieve completed results, save to CSV

All tools use `@mcp.tool(structured_output=False)` to suppress FastMCP's `structuredContent` field. Without this, Claude Code displays raw JSON blobs instead of clean text (see [claude-code#9962](https://github.com/anthropics/claude-code/issues/9962)).

> **Review comment (Member):** does it matter much what claude code sees? I'd probably optimise MCP user experience for claude desktop/GUI clients. It's possible that they also display it weirdly...


### Submit/Poll Pattern

Long-running operations (`agent_map`, `rank`) use a submit/poll pattern because:

- Operations take 1–10+ minutes
- LLMs cannot tell time and will hallucinate if asked to wait ([arXiv:2601.13206](https://arxiv.org/abs/2601.13206))
- Client-side timeouts (60s in Codex CLI) kill blocking calls

> **Review comment (Member):** if we used a streaming api, it could start by returning the task id, then if the timeout kills it the client can fallback to fetching/polling with the additional endpoint? that way we smoothly transition from short blockable tasks, to those that just miss out on the timeout, without killing the task if the timeout is hit


The flow:

```
1. everyrow_*_submit → creates session, submits async task, returns task_id + session_url (0.6s)
2. everyrow_progress → server blocks 12s, polls engine, returns status text (12-15s per call)
3. (repeat step 2)   → progress text says "call everyrow_progress again immediately"
4. everyrow_results  → on completion, fetches data, saves CSV, cleans up
```

**Server-controlled pacing:** The `PROGRESS_POLL_DELAY` constant (12s) controls how long `everyrow_progress` blocks before returning. This prevents the agent from burning inference tokens on rapid polling. Combined with ~3s of inference overhead, users see updates roughly every 15s.

**Chaining instructions:** The progress tool's response text includes "Immediately call everyrow_progress again" to keep the agent in a tight poll loop. This is critical: without it, Claude tends to stop and ask the user whether to check again.
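
A sketch of how server-side pacing and the chaining instruction might combine in a single progress handler. Assumptions: `fetch_status` is a hypothetical async callable standing in for the engine poll, `Snapshot` is an illustrative type whose fields follow the task-state JSON, and the response wording is paraphrased from the text above.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass

# Value from the document; callers (and tests) may pass a smaller delay.
PROGRESS_POLL_DELAY = 12.0


@dataclass
class Snapshot:
    """Illustrative status shape; field names follow the task-state JSON."""

    completed: int
    failed: int
    running: int
    total: int

    @property
    def done(self) -> bool:
        return self.completed + self.failed >= self.total


async def everyrow_progress(
    fetch_status: Callable[[], Awaitable[Snapshot]],
    delay: float = PROGRESS_POLL_DELAY,
) -> str:
    # Block server-side first so the agent cannot poll faster than `delay`.
    await asyncio.sleep(delay)
    snap = await fetch_status()
    if snap.done:
        return (
            f"Completed: {snap.completed}/{snap.total} ({snap.failed} failed). "
            "Call everyrow_results to save the output."
        )
    # The chaining instruction keeps the agent in a tight poll loop.
    return (
        f"Running: {snap.completed}/{snap.total} done, {snap.running} running, "
        f"{snap.failed} failed. Immediately call everyrow_progress again."
    )
```

Sleeping before polling is what caps the agent at roughly one tool call per `PROGRESS_POLL_DELAY`. A variant could return early once the task finishes; this sketch does not.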

### Task State Tracking

The MCP server maintains two forms of state for in-flight tasks:

**In-process:** `_active_tasks` dictionary keyed by `task_id`. Stores `session`, `session_ctx`, `client`, `total`, `session_url`, `started_at`, `input_csv`, `prefix`. Cleaned up by `everyrow_results`.

**On-disk:** `/tmp/everyrow-task.json`, written by `_write_task_state()` on submit and on each progress poll. Contains:

```json
{
  "task_id": "abc123",
  "session_url": "https://everyrow.io/sessions/...",
  "total": 50,
  "completed": 23,
  "failed": 1,
  "running": 5,
  "status": "running",
  "started_at": 1707400000.0
}
```

This file is a singleton: only one task is tracked at a time. It is read by the status line script and the hook scripts (see below). The MCP server writes it directly rather than relying on hooks, which avoids the fragile double-escaped JSON parsing required to extract `tool_response` from plugin MCP tools.
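
One way to write the singleton file so that the status line and hooks never observe a half-written document is a temp file followed by a rename. This is a sketch under that assumption: the document does not say how `_write_task_state()` performs the write, and `write_task_state` here is a hypothetical stand-in.

```python
import json
import os
import tempfile
from pathlib import Path

TASK_STATE_PATH = Path("/tmp/everyrow-task.json")


def write_task_state(state: dict, path: Path = TASK_STATE_PATH) -> None:
    """Replace the singleton state file atomically (hypothetical helper)."""
    # Write to a temp file in the same directory, then rename over the target:
    # on POSIX, os.replace is atomic within one filesystem, so readers see
    # either the old JSON or the new JSON, never a partial write.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".everyrow-task.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

Each call fully replaces the previous contents, which is the singleton behavior described above: starting a second task overwrites the first task's state.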

## SDK Progress Output

The SDK's `await_task_completion()` function (`src/everyrow/task.py`) polls the engine every 2 seconds and provides progress through three channels:

**stderr (default):** Timestamped progress lines:

```
[11:16:55] Session: https://everyrow.io/sessions/abc123
[11:16:55] Starting (50 agents)...
[11:16:57] [5/50] 10% | 5 running, 0 failed
[11:17:03] [12/50] 24% | 8 running, 0 failed | ~15s remaining
...
[11:18:20] [50/50] 100% | Done (85.2s total)
[11:18:20] Results: 49 succeeded, 1 failed
```
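
The progress lines above follow a fixed layout. A sketch of a formatter that reproduces it; `format_progress_line` is hypothetical, and because the document does not describe how the SDK estimates remaining time, the estimate is simply passed in.

```python
from __future__ import annotations

from datetime import datetime


def format_progress_line(
    completed: int,
    total: int,
    running: int,
    failed: int,
    now: datetime,
    remaining_s: float | None = None,
) -> str:
    """Build one stderr progress line in the layout shown above (hypothetical helper)."""
    pct = completed * 100 // total if total else 0
    line = f"[{now:%H:%M:%S}] [{completed}/{total}] {pct}% | {running} running, {failed} failed"
    if remaining_s is not None:
        line += f" | ~{remaining_s:.0f}s remaining"
    return line
```

Printing the result with `print(line, file=sys.stderr)` keeps stdout free for the program's own output.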

**JSONL log** (`~/.everyrow/progress.jsonl`): Machine-readable log appended on each progress change. Useful for post-hoc analysis and verification scripts.
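
A JSONL log is one JSON object per line, written in append mode. A sketch; the record fields and the added `ts` key are assumptions rather than the SDK's actual schema, and `append_progress` / `read_progress` are hypothetical helpers.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def default_log_path() -> Path:
    """Location named in the document, resolved lazily."""
    return Path.home() / ".everyrow" / "progress.jsonl"


def append_progress(record: dict, path: Path | None = None) -> None:
    """Append one JSON object as a single line (hypothetical helper)."""
    path = path or default_log_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    entry = {"ts": time.time(), **record}  # `ts` key is an assumption
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def read_progress(path: Path | None = None) -> list[dict]:
    """Parse the log back into dicts for post-hoc analysis."""
    path = path or default_log_path()
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```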

**`on_progress` callback:** Optional parameter to `await_result()` or `await_task_completion()`. Receives a `ProgressInfo` dataclass with `pending`, `running`, `completed`, `failed`, `total` counts. Only fires when the snapshot changes (deduplication prevents redundant calls).

### Snapshot deduplication

Progress callbacks and output only trigger when the tuple `(pending, running, completed, failed)` changes from the last poll. This prevents flooding stderr or the callback with identical lines when the engine hasn't made progress between polls.
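
The deduplication rule reduces to comparing a key tuple with the previous one. A sketch; `ProgressInfo` mirrors the fields listed above (the SDK's own class may differ), and `DedupingReporter` is a hypothetical wrapper around the user's `on_progress` callback.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgressInfo:
    """Mirrors the fields listed above; the SDK's own class may differ."""

    pending: int
    running: int
    completed: int
    failed: int
    total: int


class DedupingReporter:
    """Forward a snapshot only when (pending, running, completed, failed) changes."""

    def __init__(self, on_progress: Callable[[ProgressInfo], None]) -> None:
        self._on_progress = on_progress
        self._last: tuple[int, int, int, int] | None = None

    def observe(self, info: ProgressInfo) -> bool:
        """Call once per poll; returns True if the callback fired."""
        key = (info.pending, info.running, info.completed, info.failed)
        if key == self._last:
            return False  # engine made no progress since the previous poll
        self._last = key
        self._on_progress(info)
        return True
```

`total` is left out of the key, matching the tuple described above, presumably because it stays fixed for the life of a task.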

## Plugin System

### What the plugin bundles

The Claude Code plugin (`.claude-plugin/plugin.json`) ships:

1. **MCP server** — All 9 tools, auto-started by Claude Code
2. **Hooks** — Stop guard, results tracking, session cleanup
3. **Skill** (`skills/everyrow-sdk/SKILL.md`) — SDK code-generation guidance for the Skills path

### Hooks

**PostToolUse** (matcher: `mcp__plugin_everyrow_everyrow__everyrow_results`):
Runs `everyrow-track-results.sh`, which sends a macOS desktop notification with a completion summary and then deletes `/tmp/everyrow-task.json`.
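
The results hook is a shell script; its logic can be sketched in Python for clarity. Assumptions: the summary wording, the `track_results` and `notification_argv` names, and the injectable `run` parameter (there so the sketch can be tested off macOS) are all illustrative. `osascript -e 'display notification "..." with title "..."'` is the standard macOS way to post a notification from a script.

```python
from __future__ import annotations

import json
import subprocess
import sys
from collections.abc import Callable
from pathlib import Path


def _applescript_str(s: str) -> str:
    # AppleScript string literals escape backslashes and double quotes.
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def notification_argv(title: str, message: str) -> list[str]:
    """argv for posting a macOS notification via osascript."""
    script = f"display notification {_applescript_str(message)} with title {_applescript_str(title)}"
    return ["osascript", "-e", script]


def track_results(state_path: Path, run: Callable[..., object] = subprocess.run) -> str | None:
    """Hypothetical Python rendering of the results hook: notify, then delete state."""
    if not state_path.exists():
        return None  # nothing tracked (e.g. already cleaned up)
    state = json.loads(state_path.read_text())
    summary = (
        f"{state.get('completed', 0)}/{state.get('total', 0)} rows done, "
        f"{state.get('failed', 0)} failed"
    )
    if sys.platform == "darwin":
        run(notification_argv("everyrow", summary), check=False)
    state_path.unlink(missing_ok=True)  # the task is finished; stop tracking it
    return summary
```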

**Stop**:
Runs `everyrow-stop-guard.sh`, which reads `/tmp/everyrow-task.json`. If a task is running, it outputs `{"decision": "block", "reason": "..."}`, which prevents Claude from ending its turn. The reason text instructs Claude to call `everyrow_progress` to check status.

> **Review comment (Member):** wait so we continually burn through CC tokens until the task finishes, trapping CC in a loop of needing to call everyrow_progress?


Note: Claude Code displays stop hook blocks as "Stop hook error: ...". This is a cosmetic UI bug ([claude-code#12667](https://github.com/anthropics/claude-code/issues/12667)), not an actual error; the hook is working correctly.
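
The stop guard's decision reduces to a few checks on the state file. A Python sketch of that logic (the real hook is `everyrow-stop-guard.sh`); `stop_guard` is hypothetical, and treating a missing or unparsable file as "allow the stop" is a design assumption.

```python
from __future__ import annotations

import json
from pathlib import Path


def stop_guard(state_path: Path) -> dict | None:
    """Return a block decision while a task is running; None means allow the stop."""
    try:
        state = json.loads(state_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None  # fail open: no tracked task, or unreadable state
    if state.get("status") != "running":
        return None
    done, total = state.get("completed", 0), state.get("total", 0)
    return {
        "decision": "block",
        "reason": (
            f"everyrow task {state.get('task_id')} is still running ({done}/{total}). "
            "Call everyrow_progress to check status."
        ),
    }
```

A hook wrapper would print `json.dumps(decision)` to stdout when the result is not `None`. Failing open matters here: a stale or corrupt state file should never leave the agent unable to end its turn.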

**SessionEnd**:
Runs `rm -f /tmp/everyrow-task.json` to clean up tracking state.

### Status Line

The status line script (`everyrow-mcp/scripts/everyrow-statusline.sh`) is not part of the plugin (the plugin format cannot write to `settings.json`). It must be manually configured:

```json
{
  "statusLine": {
    "type": "command",
    "command": "<path>/everyrow-mcp/scripts/everyrow-statusline.sh",
    "padding": 1
  }
}
```

The script reads `/tmp/everyrow-task.json` on each refresh and renders:

```
everyrow ████████░░░░░░░ 42/100 23s view
```

Features: model name, context usage from Claude Code env vars, progress bar, elapsed time, failure count (yellow if >0), OSC 8 clickable link to session URL (works in iTerm2, kitty, WezTerm, Windows Terminal; degrades gracefully elsewhere).
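
A sketch of a single status-line refresh covering the progress bar, elapsed time, failure count, and OSC 8 link (model name and context usage are omitted). `render_status` and its helpers are hypothetical, and the 15-cell bar width is an assumption.

```python
YELLOW, RESET = "\033[33m", "\033[0m"


def progress_bar(done: int, total: int, width: int = 15) -> str:
    """Block-character bar; the width is an assumption."""
    filled = min(width, round(width * done / total)) if total else 0
    return "█" * filled + "░" * (width - filled)


def osc8_link(url: str, text: str) -> str:
    # OSC 8 hyperlink: ESC ] 8 ;; URL ST, visible text, ESC ] 8 ;; ST (ST = ESC \).
    return f"\033]8;;{url}\033\\{text}\033]8;;\033\\"


def render_status(state: dict, now: float) -> str:
    """One refresh of the everyrow segment from the task-state dict (hypothetical)."""
    done, total = state.get("completed", 0), state.get("total", 0)
    elapsed = int(now - state.get("started_at", now))
    parts = ["everyrow", progress_bar(done, total), f"{done}/{total}", f"{elapsed}s"]
    failed = state.get("failed", 0)
    if failed:  # failure count is shown, in yellow, only when non-zero
        parts.append(f"{YELLOW}{failed} failed{RESET}")
    if url := state.get("session_url"):
        parts.append(osc8_link(url, "view"))
    return " ".join(parts)
```

Terminals that do not understand OSC 8 generally ignore the escape sequence and show only the visible text, which is the graceful degradation mentioned above.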

## Known Issues

| Issue | Impact | Workaround |
|-------|--------|------------|
| Plugin hooks don't fire with `isLocal: true` ([#14410](https://github.com/anthropics/claude-code/issues/14410)) | Hooks from `--plugin-dir` installs are silently ignored | Install from git URL or duplicate hooks into `.claude/settings.json` |
| Hook matchers don't support mid-pattern wildcards | `foo_*_bar` won't match | Use exact match or trailing `*` only |
| "Stop hook error:" cosmetic UI ([#12667](https://github.com/anthropics/claude-code/issues/12667)) | Misleading error display when stop guard blocks | No workaround; craft self-documenting reason text |
| Hooks stop after ~2.5h ([#16047](https://github.com/anthropics/claude-code/issues/16047)) | Stop guard and notifications stop in long sessions | Restart session for operations in sessions >2h |
| `type: "prompt"` hooks broken in plugins ([#13155](https://github.com/anthropics/claude-code/issues/13155)) | Must use `type: "command"` for all hooks | Already handled in plugin config |
| `/tmp/everyrow-task.json` is a singleton | Only one operation tracked at a time | Sequential operations only (matches current behavior) |
| MCP/plugins/hooks can't reload mid-session | Any config change requires restart | Restart Claude Code after changes |
| `jq` required | Status line and hook scripts need `jq` installed | `brew install jq` (macOS) or `apt install jq` (Linux) |

## Rejected Alternatives

**MCP Tasks (SEP-1686):** The official MCP long-running task protocol. Near-zero client adoption as of Feb 2026. Custom submit/poll tools work today across all MCP clients.

**MCP Notifications / Progress tokens:** Not displayed by Claude Code or any major client. Would be invisible to users even if implemented.

**Server-Sent Events (SSE):** Would replace polling with push notifications. Adds complexity (persistent connections, reconnection logic) for marginal gain. The 12s polling cadence already provides smooth UX. MCP's stdio transport doesn't support SSE natively.

**Hook-based state tracking:** The original design had PostToolUse hooks on the `_submit` and `_progress` tools parse the `tool_response` JSON and write the task state file. This was fragile because plugin MCP tool responses are double-escaped JSON strings (`{"result": "<escaped>"}`) that required careful parsing. Moving state writes into the MCP server itself (`_write_task_state()`) was simpler and more reliable.

> **Review comment (Member):** I still don't follow why we need state at all

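
The double-escaping in the rejected hook-based design means the payload has to be decoded twice. A sketch of the unwrapping that design needed; `extract_tool_payload` is hypothetical, and the `{"result": "<escaped>"}` shape is taken from the description of hook-based state tracking. In `jq`, the same step needs an extra `fromjson`.

```python
import json


def extract_tool_payload(tool_response: str) -> dict:
    """Unwrap {"result": "<escaped JSON>"} by decoding twice (hypothetical helper)."""
    outer = json.loads(tool_response)  # first pass: the wrapper object
    inner = outer["result"]
    # Second pass: "result" holds JSON text, not an object; tolerate the
    # already-decoded case so a format change doesn't crash the hook.
    return json.loads(inner) if isinstance(inner, str) else inner
```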