4 changes: 2 additions & 2 deletions .gitignore
@@ -39,8 +39,8 @@ sdk-ts/dist/
# Secrets and env files
.env
.env.*
-ocw.toml
-.ocw/
+tj.toml
+.tj/

# Claude Code
.claude/
4 changes: 2 additions & 2 deletions CHANGELOG.md
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).
## [0.1.7] - 2026-04-13

### Added
-- **MCP server (`tj mcp`)** — stdio-based Model Context Protocol server giving Claude Code direct access to OCW observability data. 13 tool handlers: status, traces, alerts, budget headroom, cost summary, drift report, tool stats, trace detail, acknowledge alerts, setup project, list sessions, open dashboard. Dual-mode operation: routes queries through REST API when `tj serve` is running, falls back to read-only DuckDB otherwise. Auto-starts `tj serve` on demand.
+- **MCP server (`tj mcp`)** — stdio-based Model Context Protocol server giving Claude Code direct access to TokenJam observability data. 13 tool handlers: status, traces, alerts, budget headroom, cost summary, drift report, tool stats, trace detail, acknowledge alerts, setup project, list sessions, open dashboard. Dual-mode operation: routes queries through REST API when `tj serve` is running, falls back to read-only DuckDB otherwise. Auto-starts `tj serve` on demand.
- **Claude Code integration (`tj onboard --claude-code`)** — one-command setup for Claude Code telemetry. Configures OTLP log exporter in `~/.claude/settings.json`, sets project-level `OTEL_RESOURCE_ATTRIBUTES`, adds Docker-compatible endpoint to shell env, and optionally installs background daemon. Re-runs resync the auth header to fix 401s without manual setup.
- **Logs ingestion (`POST /v1/logs`)** — new OTLP log endpoint that converts Claude Code log events (`api_request`, `tool_result`, `api_error`, `user_prompt`, `tool_decision`) into NormalizedSpans with deterministic trace/span IDs. Spans flow through the standard ingest pipeline for cost, alerts, and drift.
- **`tj drift` CLI** — behavioral drift report with Rich table output showing baseline vs latest session Z-scores per dimension (input tokens, output tokens, duration, tool call count, tool sequence similarity). Color-coded thresholds, `--json` support, exit code 1 if drift detected.
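
As a rough illustration of the Z-score comparison that `tj drift` reports, the sketch below scores one dimension of the latest session against a set of baseline sessions. The function name, the sample numbers, and the choice of sample standard deviation are assumptions made for illustration; the changelog does not say how TokenJam builds its baselines.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def z_score(baseline: Sequence[float], latest: float) -> float:
    """Return how many standard deviations `latest` sits from the baseline mean.

    Hypothetical helper: needs at least two baseline values, because the
    sample standard deviation is undefined for a single observation.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A flat baseline: any deviation at all is infinitely surprising.
        return 0.0 if latest == mu else math.copysign(math.inf, latest - mu)
    return (latest - mu) / sigma


# Example: input tokens per session (made-up numbers).
baseline_input_tokens = [1000, 1200, 1100]
print(z_score(baseline_input_tokens, 1500))
```

A caller would compute one such score per dimension (input tokens, output tokens, duration, tool call count) and compare each against a threshold of its choosing.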
@@ -106,7 +106,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/).

### Added
- `tj stop` command — graceful shutdown of daemon or background process
-- `tj uninstall` command — clean removal of all OCW data, config, and daemon
+- `tj uninstall` command — clean removal of all TokenJam data, config, and daemon
- 16 runnable example agents across 4 tiers: single provider, single framework, multi-agent, and alerts/drift demos
- API fallback backend (`ApiBackend`) so CLI works while `tj serve` holds the DB lock
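
The API fallback idea in the list above (use the REST API while `tj serve` holds the DB lock, otherwise read DuckDB directly) can be sketched as a simple reachability check. This is an illustration under stated assumptions: `pick_backend` is a hypothetical name, the TCP probe stands in for whatever check TokenJam actually performs, and the default address `127.0.0.1:7391` is the one the README gives for `tj serve`.

```python
import socket


def pick_backend(host: str = "127.0.0.1", port: int = 7391,
                 timeout: float = 0.2) -> str:
    """Return "api" if something accepts TCP connections on host:port,
    otherwise "duckdb" (hypothetical helper, not TokenJam's real logic)."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return "api"
    except OSError:
        return "duckdb"


if __name__ == "__main__":
    print(pick_backend())
```

A real implementation would likely confirm that the listener is actually `tj serve` (for example with a health request) rather than trusting an open port.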

2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -127,7 +127,7 @@ Post-ingest hooks run synchronously after each span is written to DB:
| `tj drift` | `cmd_drift.py` | Show drift baselines and Z-scores for recent sessions |
| `tj demo [scenario]` | `cmd_demo.py` | Run Agent Incident Library scenarios (zero-config, no API keys). `tj demo` lists all; `tj demo retry-loop` runs one |
| `tj mcp` | `cmd_mcp.py` | Start the stdio MCP server for Claude Code integration |
-| `tj uninstall` | `cmd_uninstall.py` | Remove all OCW data, config, and daemon |
+| `tj uninstall` | `cmd_uninstall.py` | Remove all TokenJam data, config, and daemon |
| `tj doctor` | `cmd_doctor.py` | Health checks (config, DB, secrets, webhooks, drift readiness, schema-vs-capture consistency). Exit 0 = ok, 1 = warnings, 2 = errors |

All commands support `--json` for machine-readable output. Commands that query alerts use exit code 1 if active (unacknowledged, unsuppressed) alerts exist.
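
The exit codes described above lend themselves to scripting. A minimal sketch, assuming `tj` is installed and on `PATH`: the helper names are hypothetical, and only the 0/1/2 mapping for `tj doctor` and the `--json` flag come from the table and text above.

```python
import subprocess
from typing import Sequence

# Mapping stated in the command table: 0 = ok, 1 = warnings, 2 = errors.
DOCTOR_EXIT_STATUS = {0: "ok", 1: "warnings", 2: "errors"}


def doctor_status(exit_code: int) -> str:
    """Translate a `tj doctor` exit code into a label (hypothetical helper)."""
    return DOCTOR_EXIT_STATUS.get(exit_code, "unknown")


def run_doctor(argv: Sequence[str] = ("tj", "doctor", "--json")) -> str:
    """Run the health check and return its status label.

    Assumes `tj` is on PATH; subprocess raises FileNotFoundError otherwise.
    """
    completed = subprocess.run(list(argv), capture_output=True, text=True)
    return doctor_status(completed.returncode)
```

In a CI job, a caller might fail the build only when `run_doctor()` returns `"errors"` and merely log `"warnings"`.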
4 changes: 2 additions & 2 deletions Makefile
@@ -10,9 +10,9 @@ test-e2e:
pytest tests/e2e/

lint:
-ruff check ocw/
+ruff check tokenjam/

typecheck:
-mypy ocw/
+mypy tokenjam/

all: lint typecheck test
56 changes: 28 additions & 28 deletions README.md
@@ -66,7 +66,7 @@ For any Python agent — Anthropic, OpenAI, Gemini, Bedrock, LangChain, CrewAI,
```bash
pip install tokenjam
tj onboard # creates config, generates ingest secret
-ocw doctor # verify your setup
+tj doctor # verify your setup
```

```python
@@ -155,12 +155,12 @@ https://github.com/user-attachments/assets/b94d13f6-1432-40d4-b093-6958d74f0e65

```bash
tj status # current state, cost, active alerts
-ocw traces # full span history with waterfall view
-ocw cost --since 7d # cost breakdown by agent, model, day
-ocw alerts # everything that fired while you were away
-ocw budget # view and set daily/session cost limits
-ocw drift # behavioral drift Z-scores vs baseline
-ocw tools # tool call history with error rates
+tj traces # full span history with waterfall view
+tj cost --since 7d # cost breakdown by agent, model, day
+tj alerts # everything that fired while you were away
+tj budget # view and set daily/session cost limits
+tj drift # behavioral drift Z-scores vs baseline
+tj tools # tool call history with error rates
tj serve # start the web UI + REST API
```

@@ -183,7 +183,7 @@ No signup, no cloud — runs entirely on your machine.

---

-## ocw vs LangSmith vs Langfuse
+## tj vs LangSmith vs Langfuse

LangSmith and Langfuse are excellent for tracing LLM API calls and running evals on chat outputs. `tj` solves a different problem: **autonomous agents running unsupervised with real-world consequences**.

@@ -251,14 +251,14 @@ The MCP server gives Claude Code direct access to your observability data inside
| `get_tool_stats` | Tool call counts and average duration |
| `get_drift_report` | Drift baseline vs latest session |
| `acknowledge_alert` | Mark an alert as acknowledged |
-| `setup_project` | Configure a project for OCW telemetry |
+| `setup_project` | Configure a project for TokenJam telemetry |
| `open_dashboard` | Open the web UI (starts `tj serve` if needed) |

The MCP server opens DuckDB read-only — no lock conflicts with `tj serve`.

**Per-project tagging** — after installing globally, ask Claude Code:

-> "Set up OCW for this project"
+> "Set up TokenJam for this project"

Claude calls `setup_project`, which writes `.claude/settings.json` with the right `OTEL_RESOURCE_ATTRIBUTES` for this project.

@@ -274,8 +274,8 @@ tj onboard --codex
`tj onboard --codex` is project-agnostic. It writes to `~/.codex/config.toml` (Codex's single global config), so you only run it once — not once per project. Codex hardcodes `service.name="codex_exec"` in its binary, so all sessions appear under the same agent ID regardless of which repo you're working in.

`tj onboard --codex`:
-- Writes an `[otel]` block and `[mcp_servers.ocw]` to `~/.codex/config.toml`
-- Registers the MCP server so Codex can call OCW tools directly
+- Writes an `[otel]` block and `[mcp_servers.tj]` to `~/.codex/config.toml`
+- Registers the MCP server so Codex can call TokenJam tools directly
- Installs the background daemon (launchd / systemd)

**Codex must be restarted** after running `tj onboard --codex`.
@@ -289,14 +289,14 @@ The same 13 MCP tools available to Claude Code are available to Codex after rest
### Uninstalling

```bash
-# Remove all OCW data, config, daemon, MCP registration, and env vars:
-ocw uninstall --yes
+# Remove all TokenJam data, config, daemon, MCP registration, and env vars:
+tj uninstall --yes

# Then remove the package:
pip uninstall tokenjam -y
```

-`tj uninstall` cleans up everything set by `tj onboard --claude-code`: daemon, MCP server, `~/.ocw/`, `~/.config/ocw/`, OTLP env vars in `~/.claude/settings.json`, `OTEL_RESOURCE_ATTRIBUTES` in every onboarded project's `.claude/settings.json`, and the harness env block in `~/.zshrc`.
+`tj uninstall` cleans up everything set by `tj onboard --claude-code`: daemon, MCP server, `~/.tj/`, `~/.config/tj/`, OTLP env vars in `~/.claude/settings.json`, `OTEL_RESOURCE_ATTRIBUTES` in every onboarded project's `.claude/settings.json`, and the harness env block in `~/.zshrc`.

---

@@ -341,7 +341,7 @@ Full framework support guide: [docs/framework-support.md](docs/framework-support
Configure where alerts go. Multiple channels work simultaneously.

```toml
-# .ocw/config.toml
+# .tj/config.toml

[[alerts.channels]]
type = "ntfy"
@@ -382,10 +382,10 @@ Full event table and configuration: [docs/nemoclaw-integration.md](docs/nemoclaw
## Export and integrate

```bash
-ocw export --format otlp # forward to Grafana, Datadog, any OTel backend
-ocw export --format openevals # openevals / agentevals trajectory evaluation
-ocw export --format json # NDJSON
-ocw export --format csv
+tj export --format otlp # forward to Grafana, Datadog, any OTel backend
+tj export --format openevals # openevals / agentevals trajectory evaluation
+tj export --format json # NDJSON
+tj export --format csv
```

Prometheus metrics at `http://127.0.0.1:7391/metrics` when `tj serve` is running.
@@ -422,7 +422,7 @@ flowchart TD
Alerts --> DB
Schema --> DB

-DB --> CLI["ocw CLI"]
+DB --> CLI["tj CLI"]
DB --> API["REST API + Web UI\n:7391"]
DB --> MCP["MCP Server\n13 tools"]
DB --> Prom["Prometheus\n:7391/metrics"]
@@ -435,7 +435,7 @@ Full architecture deep-dive — design principles, SDK internals, alert system,
## Configuration

```toml
-# .ocw/config.toml — generated by tj onboard
+# .tj/config.toml — generated by tj onboard

[defaults.budget]
daily_usd = 10.00
@@ -462,7 +462,7 @@ completions = false
tool_outputs = false

[storage]
-path = "~/.ocw/telemetry.duckdb"
+path = "~/.tj/telemetry.duckdb"
retention_days = 90
```

@@ -503,18 +503,18 @@ See [`examples/README.md`](examples/README.md) for the full list.
Reproducible AI agent failures you can run in 30 seconds. No API keys, no config, no setup.

```bash
-ocw demo # list all scenarios
-ocw demo retry-loop # run one
-ocw demo retry-loop --json # machine-readable output
+tj demo # list all scenarios
+tj demo retry-loop # run one
+tj demo retry-loop --json # machine-readable output
```

-| Scenario | What goes wrong | What OCW catches |
+| Scenario | What goes wrong | What TokenJam catches |
|---|---|---|
| [`retry-loop`](incidents/retry-loop/README.md) | Agent retries a failing tool in a loop, burning time and tokens | `retry_loop` + `failure_rate` alerts fire automatically |
| [`surprise-cost`](incidents/surprise-cost/README.md) | Model silently escalates from Haiku to Opus mid-chain | Per-model cost breakdown shows the $3+ you didn't expect |
| [`hallucination-drift`](incidents/hallucination-drift/README.md) | Agent behavior shifts — different tokens, different tools | `drift_detected` alert fires with Z-scores at session end |

-Each scenario runs against an in-memory backend and produces a side-by-side comparison: what `print()` shows vs. what OCW reveals.
+Each scenario runs against an in-memory backend and produces a side-by-side comparison: what `print()` shows vs. what TokenJam reveals.

---

2 changes: 1 addition & 1 deletion SECURITY.md
@@ -26,4 +26,4 @@ Security issues in these areas are especially important to report.
## Not in Scope

- Vulnerabilities in upstream dependencies (report to the upstream project)
-- Issues that require physical access to the machine running `ocw`
+- Issues that require physical access to the machine running `tj`
8 changes: 4 additions & 4 deletions docs/alerts.md
@@ -1,6 +1,6 @@
# Alerts

-`ocw` fires alerts the moment something happens — sensitive tool calls, budget breaches, behavioral drift, sandbox violations. Alerts are evaluated after every span ingest and dispatched to your configured channels in real time.
+`tj` fires alerts the moment something happens — sensitive tool calls, budget breaches, behavioral drift, sandbox violations. Alerts are evaluated after every span ingest and dispatched to your configured channels in real time.

## Alert types

@@ -22,7 +22,7 @@

## Channels

-Configure where alerts go in `.ocw/config.toml`. Multiple channels work simultaneously — you can get push notifications on your phone and a Discord message at the same time.
+Configure where alerts go in `.tj/config.toml`. Multiple channels work simultaneously — you can get push notifications on your phone and a Discord message at the same time.

```toml
# Push notification (free, no account required)
@@ -49,7 +49,7 @@ url = "https://your-endpoint.com/alerts"
# Local file log
[[alerts.channels]]
type = "file"
-path = "~/.ocw/alerts.log"
+path = "~/.tj/alerts.log"

# Stdout (always enabled by default)
[[alerts.channels]]
@@ -77,7 +77,7 @@ Define which tool calls should trigger immediate alerts:

## Cooldown

-To prevent alert storms, `ocw` tracks a cooldown per agent + alert type. Repeat alerts within the cooldown window are suppressed — still persisted to the database, but not dispatched to channels.
+To prevent alert storms, `tj` tracks a cooldown per agent + alert type. Repeat alerts within the cooldown window are suppressed — still persisted to the database, but not dispatched to channels.

```toml
[alerts]