Merged
68 commits
3aa35b0
Add progress UX: SDK output, MCP submit/poll/results, hooks, status line
dschwarz26 Feb 8, 2026
c6cbe72
Fix structuredContent rendering and simplify hooks
dschwarz26 Feb 8, 2026
017780a
Update docs for streaming, add and cleanup tests
dschwarz26 Feb 8, 2026
796ebf1
Fixes
dschwarz26 Feb 9, 2026
69709e5
Change output messages, give example in docs
dschwarz26 Feb 9, 2026
e248787
Regenerate OpenAPI code to include task progress
rgambee Feb 10, 2026
462120a
Fix formatting and linter errors
rgambee Feb 11, 2026
7902c9b
Use wall time, don't overwrite start time
rgambee Feb 11, 2026
f829fe9
Catch exceptions from progress callback
rgambee Feb 11, 2026
edaf09b
Check task status before fetching results
rgambee Feb 11, 2026
2c298ff
Use ~/.everyrow/task.json for storing progress
rgambee Feb 11, 2026
49a6a72
Check that jq is available
rgambee Feb 11, 2026
a9ebbf6
Truncate start time
rgambee Feb 11, 2026
d91940b
Use notify-send to show notification on Linux
rgambee Feb 11, 2026
6964aef
Report progress for dedupe, merge and screen
rgambee Feb 11, 2026
e8aad1c
Properly convert from TaskProgressInfo to ProgressInfo
rgambee Feb 11, 2026
4b56bbd
Use TaskProgressInfo instead of ProgressInfo
rgambee Feb 11, 2026
a20a46e
Use dataclass to store active task info, not dict
rgambee Feb 11, 2026
70dbce2
Set client parameter type to AuthenticatedClient
rgambee Feb 11, 2026
8182f2b
Use logging instead of print statements
rgambee Feb 11, 2026
e80cfb3
Update documentation
rgambee Feb 11, 2026
f5a25b8
Bump SDK version within MCP
rgambee Feb 11, 2026
d0ca4cd
Remove misleading sentence from error message
rgambee Feb 11, 2026
c5d500d
Include more exception details in error message
rgambee Feb 11, 2026
9c07b2e
Remove global cache of active tasks
rgambee Feb 12, 2026
1acacb5
Remove sync tools from MCP server
rgambee Feb 12, 2026
41f3f37
Update MCP docstrings
rgambee Feb 12, 2026
92ccd1d
Rewrite integration tests
rgambee Feb 12, 2026
5542fc9
Update MCP README
rgambee Feb 12, 2026
57c4a03
Fix elapsed time calculation
rgambee Feb 12, 2026
a9468fe
Remove model and context usage from status bar
rgambee Feb 12, 2026
86d11b0
Delete task state file on startup and submission
rgambee Feb 12, 2026
7edc5ff
Use client as a context manager
rgambee Feb 12, 2026
c6c82ab
Change field_type to match rank_async() param
rgambee Feb 12, 2026
1e1a172
Remove JSONL logging, change stderr logging
rgambee Feb 12, 2026
57796a7
Fix rank input validation test
rgambee Feb 12, 2026
2f3e951
Delete files used for testing
rgambee Feb 12, 2026
d10557d
Delete redundant CI workflow
rgambee Feb 12, 2026
3afcd78
Run bash tests in CI
rgambee Feb 12, 2026
56af756
Remove _submit suffix from MCP tools
rgambee Feb 12, 2026
c80ff21
Remove plugin hint
rgambee Feb 12, 2026
0ff1684
Deduplicate session URL construction
rgambee Feb 12, 2026
ca4e95f
Delete planning doc for progress reporting
rgambee Feb 12, 2026
a46118e
Update documentation
rgambee Feb 12, 2026
07853d9
Merge main
rgambee Feb 12, 2026
56806f4
Merge branch 'main' into streaming-sdk
rgambee Feb 12, 2026
225eacb
Rename input classes
rgambee Feb 12, 2026
ddf10cd
Update status line tests
rgambee Feb 12, 2026
36bee1d
Delete test scripts
rgambee Feb 12, 2026
a0eb377
Remove API URL from plugin env
rgambee Feb 12, 2026
d6baaba
Use helper function to check API return type
rgambee Feb 12, 2026
3c6ee36
Don't mention poll sleep time in docs
rgambee Feb 12, 2026
106393b
Remove mentions of async operations
rgambee Feb 12, 2026
37aa8c7
Point to progress bar setup from installation
rgambee Feb 12, 2026
652f6d3
Allow links to progress bar section
rgambee Feb 12, 2026
aa46a62
Move MCP server from API reference to overview
rgambee Feb 12, 2026
c3e2bca
Remove extra slash from session URLs
rgambee Feb 12, 2026
afdef7b
Adjust SDK progress formatting
rgambee Feb 12, 2026
4ce1dd8
Update documentation
rgambee Feb 12, 2026
9533645
Log state write errors at debug level
rgambee Feb 12, 2026
daf864c
Fix input class names in test_server.py
rgambee Feb 12, 2026
7730358
Add external URLs to list of ones to skip
rgambee Feb 12, 2026
f438791
Fix link to installation guide
rgambee Feb 12, 2026
1a54404
Bump SDK and MCP to version 0.3.0
rgambee Feb 12, 2026
666d5ca
Merge branch main
rgambee Feb 12, 2026
f5fc4ac
Bump version number in more places
rgambee Feb 12, 2026
46bd98f
Make SDK progress info opt-in
rgambee Feb 12, 2026
eb5ba19
Update documentation
rgambee Feb 12, 2026
39 changes: 36 additions & 3 deletions .claude-plugin/plugin.json
@@ -8,11 +8,44 @@
   "repository": "https://github.com/futuresearch/everyrow-sdk",
   "mcpServers": {
     "everyrow": {
-      "command": "uvx",
-      "args": ["everyrow-mcp"],
+      "command": "${CLAUDE_PLUGIN_ROOT}/everyrow-mcp/.venv/bin/everyrow-mcp",
       "env": {
-        "EVERYROW_API_KEY": "${EVERYROW_API_KEY}"
+        "EVERYROW_API_KEY": "${EVERYROW_API_KEY}",
+        "EVERYROW_API_URL": "${EVERYROW_API_URL}"
       }
     }
+  },
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "mcp__plugin_everyrow_everyrow__everyrow_results",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"${CLAUDE_PLUGIN_ROOT}/everyrow-mcp/scripts/everyrow-track-results.sh\""
+          }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"${CLAUDE_PLUGIN_ROOT}/everyrow-mcp/scripts/everyrow-stop-guard.sh\""
+          }
+        ]
+      }
+    ],
+    "SessionEnd": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "rm -f /tmp/everyrow-task.json"
+          }
+        ]
+      }
+    ]
   }
 }
82 changes: 82 additions & 0 deletions .github/workflows/checks.yml
@@ -0,0 +1,82 @@
name: Checks

on:
  pull_request:
    types: [opened, synchronize, reopened]
  push:
    branches: [main]

concurrency:
  group: checks-${{ github.ref }}
  cancel-in-progress: true

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Ruff check
        run: uv run ruff check .

      - name: Ruff format check
        run: uv run ruff format --check .

      - name: Pyright
        run: uv run basedpyright

  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install dependencies
        run: uv sync

      - name: Run unit tests
        run: uv run pytest tests/ -v --tb=short

  mcp-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - uses: astral-sh/setup-uv@v4

      - name: Install MCP dependencies
        working-directory: everyrow-mcp
        run: uv sync

      - name: Run MCP server tests
        working-directory: everyrow-mcp
        run: uv run pytest tests/ -v --tb=short

  shell-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install jq
        run: sudo apt-get install -y jq

      - name: Run status line tests
        run: bash everyrow-mcp/tests/test_statusline.sh

      - name: Run stop guard tests
        run: bash everyrow-mcp/tests/test_hook_stop_guard.sh

      - name: Run results hook tests
        run: bash everyrow-mcp/tests/test_hook_results.sh
166 changes: 166 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,166 @@
# Architecture

Technical reference for the everyrow SDK, MCP server, and Claude Code plugin.

## Overview

everyrow provides two interfaces:

1. **MCP server** — 9 tools exposed directly to Claude Code / Codex. Zero-code, conversational progress updates, results saved as CSV.
2. **Python SDK** — `from everyrow.ops import rank, agent_map, ...` for scripts, pipelines, and programmatic control.

Both share the same backend (the Cohort engine) and the same session/task model. The Claude Code plugin ships both in a single package, together with a Skill that provides SDK code-generation guidance.

## MCP Server

### Tools

The MCP server (`everyrow-mcp/src/everyrow_mcp/server.py`) exposes 9 tools via [FastMCP](https://github.com/jlowin/fastmcp):

Blocking tools (run to completion, return results inline):
- `everyrow_screen` — Filter rows by natural language criteria
- `everyrow_rank` — Score and sort rows
- `everyrow_dedupe` — Remove semantic duplicates
- `everyrow_merge` — Join two CSVs by intelligent entity matching
- `everyrow_agent` — Run web research agents on each row

Submit/poll tools (for long-running operations):
Review comment (Member): why not re-use the _async naming pattern of the sdk? submit doesn't seem very descriptive

- `everyrow_agent_submit` — Submit agent_map, return immediately with `task_id` and `session_url`
- `everyrow_rank_submit` — Submit rank, return immediately
- `everyrow_progress` — Poll task status (blocks ~12s server-side, returns progress text)
- `everyrow_results` — Retrieve completed results, save to CSV

All tools use `@mcp.tool(structured_output=False)` to suppress FastMCP's `structuredContent` field. Without this, Claude Code displays raw JSON blobs instead of clean text (see [claude-code#9962](https://github.com/anthropics/claude-code/issues/9962)).
Review comment (Member): does it matter much what claude code sees? I'd probably optimise MCP user experience for claude desktop/GUI clients. It's possible that they also display it weirdly...


### Submit/Poll Pattern

Long-running operations (agent_map, rank) use a submit/poll pattern because:
- Operations take 1–10+ minutes
- LLMs cannot tell time and will hallucinate if asked to wait ([arXiv:2601.13206](https://arxiv.org/abs/2601.13206))
- Client-side timeouts (60s in Codex CLI) kill blocking calls
Review comment (Member): if we used a streaming api, it could start by returning the task id, then if the timeout kills it the client can fallback to fetching/polling with the additional endpoint? that way we smoothly transition from short blockable tasks, to those that just miss out on the timeout, without killing the task if the timeout is hit


The flow:

```
1. everyrow_*_submit → creates session, submits async task, returns task_id + session_url (0.6s)
2. everyrow_progress → server blocks 12s, polls engine, returns status text (12-15s per call)
3. (repeat step 2) → progress text says "call everyrow_progress again immediately"
4. everyrow_results → on completion, fetches data, saves CSV, cleans up
```

Server-controlled pacing: The `PROGRESS_POLL_DELAY` constant (12s) controls how long `everyrow_progress` blocks before returning. This prevents the agent from burning inference tokens on rapid polling. Combined with ~3s inference overhead, users see updates every ~15s.

Chaining instructions: The progress tool's response text includes "Immediately call everyrow_progress again" to keep the agent in a tight poll loop. This is critical — without it, Claude tends to stop and ask the user if they want to check again.
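
A minimal sketch of the pacing and chaining behavior described above, not the server's actual code. `blocking_progress`, `fetch_status`, and the `"completed"` status value are hypothetical names chosen for illustration; only `PROGRESS_POLL_DELAY` (12s) and the chaining phrase come from this document.

```python
import time
from typing import Callable

PROGRESS_POLL_DELAY = 12.0  # seconds; value taken from the text above


def blocking_progress(
    fetch_status: Callable[[], dict],
    delay: float = PROGRESS_POLL_DELAY,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Block for `delay` seconds, then report status once (server-side pacing)."""
    sleep(delay)  # the agent cannot poll faster than the server allows
    status = fetch_status()
    done, total = status["completed"], status["total"]
    if status["status"] == "completed":  # assumed terminal status value
        return f"Completed: {done}/{total}. Call everyrow_results to fetch the output."
    # The chaining instruction keeps the agent in the poll loop.
    return f"Running: {done}/{total} complete. Immediately call everyrow_progress again."
```

Passing `sleep` as a parameter is a design choice for this sketch only: it lets the delay be replaced with a no-op in tests.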

### Task State Tracking

The MCP server maintains two forms of state for in-flight tasks:

In-process: `_active_tasks` dictionary keyed by `task_id`. Stores `session`, `session_ctx`, `client`, `total`, `session_url`, `started_at`, `input_csv`, `prefix`. Cleaned up by `everyrow_results`.

On-disk: `/tmp/everyrow-task.json` written by `_write_task_state()` on submit and each progress poll. Contains:

```json
{
"task_id": "abc123",
"session_url": "https://everyrow.io/sessions/...",
"total": 50,
"completed": 23,
"failed": 1,
"running": 5,
"status": "running",
"started_at": 1707400000.0
}
```

This file is a singleton. Only one task is tracked at a time. It is read by the status line script and hook scripts (see below). The MCP server writes it directly rather than relying on hooks, which avoids the fragile double-escaped JSON parsing required to extract `tool_response` from plugin MCP tools.

## SDK Progress Output

The SDK's `await_task_completion()` function (`src/everyrow/task.py`) polls the engine every 2 seconds and provides progress through three channels:

stderr (default): Timestamped progress lines:
```
[11:16:55] Session: https://everyrow.io/sessions/abc123
[11:16:55] Starting (50 agents)...
[11:16:57] [5/50] 10% | 5 running, 0 failed
[11:17:03] [12/50] 24% | 8 running, 0 failed | ~15s remaining
...
[11:18:20] [50/50] 100% | Done (85.2s total)
[11:18:20] Results: 49 succeeded, 1 failed
```

JSONL log (`~/.everyrow/progress.jsonl`): Machine-readable log appended on each progress change. Useful for post-hoc analysis and verification scripts.

`on_progress` callback: Optional parameter to `await_result()` or `await_task_completion()`. Receives a `ProgressInfo` dataclass with `pending`, `running`, `completed`, `failed`, `total` counts. Only fires when the snapshot changes (deduplication prevents redundant calls).

### Snapshot deduplication

Progress callbacks and output only trigger when the tuple `(pending, running, completed, failed)` changes from the last poll. This prevents flooding stderr or the callback with identical lines when the engine hasn't made progress between polls.
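
The deduplication rule can be sketched as follows. `ProgressSnapshot` is a hypothetical stand-in for the SDK's `ProgressInfo`, and `report_changes` is an illustrative helper, not an SDK function; the real polling loop lives in `await_task_completion()`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ProgressSnapshot:  # hypothetical stand-in for ProgressInfo
    pending: int
    running: int
    completed: int
    failed: int
    total: int


def report_changes(
    polls: Iterable[ProgressSnapshot],
    on_progress: Callable[[ProgressSnapshot], None],
) -> None:
    """Invoke on_progress only when the counts differ from the previous poll."""
    last = None
    for snap in polls:
        key = (snap.pending, snap.running, snap.completed, snap.failed)
        if key != last:  # identical snapshots are skipped
            on_progress(snap)
            last = key
```

For example, five polls in which the counts change only twice after the first reading produce three callback invocations.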

## Plugin System

### What the plugin bundles

The Claude Code plugin (`.claude-plugin/plugin.json`) ships:

1. **MCP server** — All 9 tools, auto-started by Claude Code
2. **Hooks** — Stop guard, results tracking, session cleanup
3. **Skill** (`skills/everyrow-sdk/SKILL.md`) — SDK code-generation guidance for the Skills path

### Hooks

**PostToolUse** (matcher: `mcp__plugin_everyrow_everyrow__everyrow_results`):
Runs `everyrow-track-results.sh`, which sends a macOS desktop notification with a completion summary and then deletes `/tmp/everyrow-task.json`.

**Stop**:
Runs `everyrow-stop-guard.sh`, which reads `/tmp/everyrow-task.json`. If a task is running, the script outputs `{"decision": "block", "reason": "..."}`, which prevents Claude from ending its turn. The reason text instructs Claude to call `everyrow_progress` to check status.
Review comment (Member): wait so we continually burn through CC tokens until the task finishes, trapping CC in a loop of needing to call everyrow_progress?


Note: Claude Code displays stop hook blocks as "Stop hook error: ..." This is a cosmetic UI bug ([claude-code#12667](https://github.com/anthropics/claude-code/issues/12667)), not an actual error. The hook is working correctly.
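
The stop guard's decision logic can be sketched in Python as below. The actual hook is a shell script that uses `jq`; `stop_guard` and the wording of the reason text are assumptions made for illustration, while the `{"decision": "block", "reason": ...}` shape comes from the description above.

```python
import json
from pathlib import Path


def stop_guard(state_path: Path) -> str:
    """Return the hook's stdout: a block decision if a task is running, else an empty string."""
    try:
        state = json.loads(state_path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return ""  # no tracked task: let Claude end its turn
    if not isinstance(state, dict) or state.get("status") != "running":
        return ""
    reason = (
        f"everyrow task {state.get('task_id')} is still running "
        f"({state.get('completed', 0)}/{state.get('total', 0)}). "
        "Call everyrow_progress to check status."
    )
    return json.dumps({"decision": "block", "reason": reason})
```

An empty result means the hook does not intervene; any other outcome depends on the state file still being present, which is why the PostToolUse and SessionEnd hooks delete it.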

**SessionEnd**:
Runs `rm -f /tmp/everyrow-task.json` to clean up tracking state.

### Status Line

The status line script (`everyrow-mcp/scripts/everyrow-statusline.sh`) is not part of the plugin (the plugin format cannot write to `settings.json`). It must be manually configured:

```json
{
"statusLine": {
"type": "command",
"command": "<path>/everyrow-mcp/scripts/everyrow-statusline.sh",
"padding": 1
}
}
```

The script reads `/tmp/everyrow-task.json` on each refresh and renders:
```
everyrow ████████░░░░░░░ 42/100 23s view
```

Features: model name, context usage from Claude Code env vars, progress bar, elapsed time, failure count (yellow if >0), OSC 8 clickable link to session URL (works in iTerm2, kitty, WezTerm, Windows Terminal; degrades gracefully elsewhere).
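
A rough Python sketch of how the progress bar and the OSC 8 link could be rendered. The real status line is a shell script; `render_status`, the 15-cell bar width, and the floor rounding are assumptions, so the output may not match the script's exactly.

```python
def render_status(completed: int, total: int, elapsed_s: int, url: str, width: int = 15) -> str:
    """Build a one-line status: name, progress bar, counts, elapsed time, clickable link."""
    filled = min(width, width * completed // total) if total > 0 else 0
    bar = "█" * filled + "░" * (width - filled)
    # OSC 8 hyperlink: ESC ] 8 ; ; URL ST  text  ESC ] 8 ; ; ST, where ST is ESC \
    link = f"\033]8;;{url}\033\\view\033]8;;\033\\"
    return f"everyrow {bar} {completed}/{total} {elapsed_s}s {link}"
```

Terminals that do not understand OSC 8 generally ignore the escape sequences and show only the word `view`, which is the graceful degradation mentioned above.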

## Known Issues

| Issue | Impact | Workaround |
|-------|--------|------------|
| Plugin hooks don't fire with `isLocal: true` ([#14410](https://github.com/anthropics/claude-code/issues/14410)) | Hooks from `--plugin-dir` installs are silently ignored | Install from git URL or duplicate hooks into `.claude/settings.json` |
| Hook matchers don't support mid-pattern wildcards | `foo_*_bar` won't match | Use exact match or trailing `*` only |
| "Stop hook error:" cosmetic UI ([#12667](https://github.com/anthropics/claude-code/issues/12667)) | Misleading error display when stop guard blocks | No workaround — craft self-documenting reason text |
| Hooks stop after ~2.5h ([#16047](https://github.com/anthropics/claude-code/issues/16047)) | Stop guard and notifications stop in long sessions | Restart the session before running operations if it has been open for more than ~2h |
| `type: "prompt"` hooks broken in plugins ([#13155](https://github.com/anthropics/claude-code/issues/13155)) | Must use `type: "command"` for all hooks | Already handled in plugin config |
| `/tmp/everyrow-task.json` is singleton | Only one operation tracked at a time | Sequential operations only (matches current behavior) |
| MCP/plugins/hooks can't reload mid-session | Any config change requires restart | Restart Claude Code after changes |
| `jq` required | Status line and hook scripts need `jq` installed | `brew install jq` (macOS) or `apt install jq` (Linux) |

## Rejected Alternatives

MCP Tasks (SEP-1686): The official MCP long-running task protocol. Near-zero client adoption as of Feb 2026. Custom submit/poll tools work today across all MCP clients.

MCP Notifications / Progress tokens: Not displayed by Claude Code or any major client. Would be invisible to users even if implemented.

Server-Sent Events (SSE): Would replace polling with push notifications. Adds complexity (persistent connections, reconnection logic) for marginal gain. The 12s polling cadence already provides smooth UX. MCP's stdio transport doesn't support SSE natively.

Hook-based state tracking: The original design had PostToolUse hooks on `_submit` and `_progress` tools parse `tool_response` JSON and write the task state file. This was fragile because plugin MCP tool responses are double-escaped JSON strings (`{"result": "<escaped>"}`) that required careful parsing. Moving state writes into the MCP server itself (`_write_task_state()`) was simpler and more reliable.
Review comment (Member): I still don't follow why we need state at all
