feat(scouts): scout fleet control pane in the agents area#2586

Open
andrewm4894 wants to merge 38 commits into main from feat/scouts-ui

@andrewm4894 andrewm4894 commented Jun 10, 2026

Member

Problem

The signals-scout fleet (scheduled agents that sweep a project and emit findings to the Signals inbox) had no UI in PostHog Code. Configs could only be inspected and tuned via the MCP tools or the REST API directly: no way to see what scouts exist, whether they are healthy, what they found, or to toggle/tune them without leaving the app.

Changes

Adds a scouts control pane to the agents area (/code/agents), plus a per-scout detail screen.

Fleet section (expandable, on the agents config page):

  • Collapsed: one-line pulse (enabled count, last dispatch time).
  • Expanded: fleet summary (running now, success rate, signals emitted over a true last-24h window), one ScoutRowCard per scout with origin/dry-run badges (with explainer tooltips), cadence, per-run outcome boxes, enable switch, and a gear that expands an inline config form (mode + cadence).
  • Run outcome boxes: quiet runs recede to gray so color only means something happened — iris = emitted, red/amber = error/timeout, blue pulse = running, hollow = queued. Boxes deep-link to the backing task run in PostHog cloud.
  • Whole row navigates to the scout detail screen via stretched-link; links to the cloud skill and the two official helper skills (authoring/exploring signals scouts).

Chat CTAs (one-click agent tasks, mirroring the inbox discuss flow):

  • Two templated question chips above the fleet list — "How is my scout fleet performing?" and "What signals were emitted recently?" — each creates an auto-mode cloud task driven by the exploring-signals-scouts skill and navigates straight to it.
  • A sparkle icon on each scout row ("Chat with PostHog about this scout") fires the same flow scoped to that scout: config, run history, emissions, related memory, tuning suggestions.
  • Enabled by making the report context optional on useInboxCloudTaskRunner (report-scoped runs keep their signal_report analytics; these track as manual).

Scout detail screen (/code/agents/scouts/$skillName), signals-first:

  • Shared ScoutRowCard header, 24h rollup line, a Signals section showing emissions from emitted runs, then the run list with All/Emitted/Quiet/Failed filter chips.
  • Emission cards and run rows collapse/expand with the same grammar: two-line clamped markdown preview, expanded shows full markdown plus footer (mono id, external task-run link).
  • Drill-down beyond the summary is the cloud task run; there is deliberately no in-app run detail page.

Plumbing (per the repo's layering rules):

  • @posthog/api-client: scout config/run/emission methods.
  • @posthog/core: scouts/scoutPresentation.ts (rollups, outcome derivation, filters) and scoutRunsWindow.ts (paginated 24h fleet window assembly), both unit-tested.
  • @posthog/ui: features/scouts/ components and hooks, nested TanStack routes.

Known backend gaps (per-scout run filter, failure info on runs, aggregate stats) are patched client-side for now and tracked for scout_harness fixes in the posthog repo.

How did you test this?

  • Unit tests for the core presentation and window logic (scoutPresentation.test.ts, scoutRunsWindow.test.ts); full ui/core test suites, Biome, and typecheck pass (also enforced per-commit by the pre-commit hook).
  • Iterated live against a dogfooding project: fleet list, config edits (enable toggle, cadence, dry-run), detail screen, run boxes, task-run deep links, empty states.
  • Not yet verified: config writes with a token whose signal_scout:write scope is unconfirmed (reads confirmed working); the chat CTAs end-to-end (task creation path is the same shared runner the inbox discuss/create-PR flows use).

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

Adds a Scouts surface under /code/agents/scouts backed by the PostHog Cloud
scout endpoints (/api/projects/{id}/signals/scout/):

- api-client: ScoutConfig/ScoutRun/ScoutEmission/ScoutScratchpadEntry types
  and list/get/patch methods for configs, runs, emissions, scratchpad
- core/scouts: pure presentation helpers (status normalization, timeout
  inference, stuck-run detection, run filters, fleet/per-scout rollups,
  interval formatting) with unit tests
- ui/features/scouts: fleet view with inline config controls (on/off,
  live vs dry-run, cadence), per-scout run history with emitted/quiet/failed
  filter chips and summary previews, run detail with emissions, memory
  written, and an external task-log link
- agents config page gains a Scouts entry card (above Responders);
  /code/agents becomes a layout route with nested scout routes

Known API gaps are patched client-side and marked with comments: no
skill_name filter on runs (client-side filter over the recent window),
no failure kind (inferred from ~30m duration), no aggregate stats
(windowed client rollups), canonical-vs-custom via hardcoded skill list,
per-run memory limited to created entries.
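
Two of the client-side patches listed above (timeout inference and the local skill filter) could look roughly like the sketch below. The `RunLike` shape, the `TIMEOUT_THRESHOLD_MS` value, and both function names are hypothetical and chosen for illustration; only the ~30-minute heuristic and the missing `skill_name` filter come from the commit message.

```typescript
// Hypothetical run shape; the real ScoutRun type in the api-client may differ.
interface RunLike {
  skill_name: string;
  status: string; // assumed values such as "failed" or "completed"
  started_at: string; // ISO timestamp
  completed_at: string | null;
}

// Assumption: a failed run that lasted roughly 30 minutes hit the harness timeout.
// 29 minutes leaves a little slack under the "~30m" figure.
const TIMEOUT_THRESHOLD_MS = 29 * 60 * 1000;

function inferTimedOut(run: RunLike): boolean {
  if (run.status.toLowerCase() !== "failed" || run.completed_at === null) {
    return false;
  }
  const durationMs = Date.parse(run.completed_at) - Date.parse(run.started_at);
  return durationMs >= TIMEOUT_THRESHOLD_MS;
}

// Stand-in for the missing skill_name filter: narrow the recent window locally.
function runsForSkill<T extends RunLike>(runs: T[], skillName: string): T[] {
  return runs.filter((run) => run.skill_name === skillName);
}
```

Because the failure kind is inferred rather than reported, a run that fails for another reason right at the threshold would be mislabeled, which is presumably why the gap is tracked for a backend fix.
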
SettingsOptionSelect hardcodes w-full on its trigger; as a direct flex
child in the scout row the interval select stretched against the row and
pushed the enable toggle past the card edge. Fixed trigger widths keep
the controls in a tidy column (w-36 fits the longest cadence label).
- Replace the standalone /code/agents/scouts fleet screen and summary
  card with ScoutsFleetSection, an expandable section on the agents
  page (collapsed pulse line; expanded list capped at ~10 rows with
  inner scroll). Detail routes survive; back link now goes to Agents.
- Scout run stats now describe a true last-24h window: core
  fetchScoutRunsWindow walks the backend's 100-row pages on a
  started_at cursor with run_id dedupe (created_at isn't serialized –
  api gap 7), flagging truncation honestly. All window copy updated.
- Status dots get tooltips + aria-labels explaining each state.
- Runs polling now only happens while the section is expanded.
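
The window assembly described in this commit (walk 100-row pages on a started_at cursor, dedupe by run_id, flag truncation) might be sketched as follows. This is not the actual `fetchScoutRunsWindow`; the `FetchPage` callback, the `maxPages` budget, the newest-first ordering, and all type names are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real fetchScoutRunsWindow signature is not shown on this page.
interface WindowRun {
  run_id: string;
  started_at: string; // ISO timestamp; pages are assumed to arrive newest first
}

type FetchPage = (startedBefore: string | null) => Promise<WindowRun[]>;

interface RunsWindow {
  runs: WindowRun[];
  truncated: boolean; // true when paging stopped before reaching the window start
}

const PAGE_SIZE = 100; // backend page size named in the commit message

async function assembleRunsWindow(
  fetchPage: FetchPage,
  windowStart: Date,
  maxPages: number,
): Promise<RunsWindow> {
  const seen = new Set<string>();
  const runs: WindowRun[] = [];
  let cursor: string | null = null;

  for (let page = 0; page < maxPages; page++) {
    const rows: WindowRun[] = await fetchPage(cursor);
    for (const row of rows) {
      if (Date.parse(row.started_at) < windowStart.getTime()) {
        return { runs, truncated: false }; // reached the edge of the window
      }
      // Rows sharing the cursor timestamp can reappear on the next page.
      if (!seen.has(row.run_id)) {
        seen.add(row.run_id);
        runs.push(row);
      }
    }
    if (rows.length < PAGE_SIZE) {
      return { runs, truncated: false }; // backend has no more rows
    }
    cursor = rows[rows.length - 1].started_at;
  }
  return { runs, truncated: true }; // page budget exhausted inside the window
}
```

Returning `truncated` alongside the runs lets the UI say that its stats cover only part of the window instead of silently under-counting.
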
…og cloud

- All signal counts now read "N signal(s) emitted" (was "0 signals (quiet)" /
  bare "N signals") across fleet rows and run rows.
- Scout detail header links to the skill in PostHog cloud (/skills/<name>)
  via new skillUrl helper in posthogLinks.
Each fleet row now shows a strip of small boxes, one per run in the
24h window (oldest left), colored by outcome: iris emitted, green
quiet, red failed, amber timed out, blue pulsing running, red pulsing
stuck. Hovering shows outcome + duration + relative time; clicking a
box opens the run detail (summary, emissions, task log link).

- core: ScoutRollup now carries the per-scout runs in timeline order;
  new deriveRunOutcome/scoutRunOutcomeLabel classifiers with tests.
- ui: new ScoutRunBoxes component wired into ScoutRow.
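
As a rough illustration of classifying each run into a single outcome and mapping it to a box color, consider the sketch below. It is not the PR's `deriveRunOutcome`; the input shape, the precedence of the checks, and the names `classifyRun` and `OUTCOME_COLOR` are assumptions. The colors are the ones listed in this commit message.

```typescript
type Outcome = "emitted" | "quiet" | "failed" | "timed_out" | "running" | "stuck";

// Hypothetical, already-normalized input; the real ScoutRun carries more fields.
interface OutcomeInput {
  status: "running" | "completed" | "failed";
  emittedCount: number;
  timedOut: boolean; // assumed to be inferred upstream
  stuck: boolean; // assumed to be detected upstream
}

// Each run maps to exactly one outcome, decided by its status first.
function classifyRun(run: OutcomeInput): Outcome {
  switch (run.status) {
    case "running":
      return run.stuck ? "stuck" : "running";
    case "failed":
      return run.timedOut ? "timed_out" : "failed";
    case "completed":
      return run.emittedCount > 0 ? "emitted" : "quiet";
  }
}

// Colors as described in the commit message for this change.
const OUTCOME_COLOR: Record<Outcome, string> = {
  emitted: "iris",
  quiet: "green",
  failed: "red",
  timed_out: "amber",
  running: "blue (pulsing)",
  stuck: "red (pulsing)",
};
```

Keeping the classifier a pure function of the run makes it straightforward to cover with table-style unit tests.
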
Fleet rows now show only the enable switch plus a gear toggle; mode
(live/dry-run) and cadence move into a labeled form that expands
under the row. The detail header keeps the full horizontal controls
strip. New ScoutEnabledSwitch + ScoutConfigForm exports alongside
ScoutConfigControls.
Drop the per-row stats subtitle (last ran / run counts / emitted) –
the status dot, run boxes, and their tooltips already carry it. A row
is now: dot, name, badges, cadence, run boxes, switch, gear.
Small external-link icon next to the name opening the skill in
PostHog cloud (/skills/<name>). Revealed on row hover to keep the
single-line rows quiet; sits outside the router Link to avoid a
nested anchor.
"N signals emitted" after the cadence chip, only when the scout
emitted at least one signal in the 24h window – quiet scouts stay
clean.
Extract the fleet row into ScoutRowCard and reuse it as the header
card on the scout detail screen (linkToDetail=false there), replacing
the old title block + Configuration card so the two surfaces stay
identical. Also drop the quiet-run sentence from the Scouts
subsection description.
Boxes with a task_url now link straight to the backing task run in
the browser, with the tooltip calling it out ('open task run in
PostHog'). Runs without a task link keep the in-app run detail
fallback.
"41 signals emitted (20%) · last 24h" – emitRate is the share of
window runs that emitted at least one signal; window label loses the
space ("24 h" → "24h") and its parentheses.
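
A minimal sketch of how such a rollup line could be derived, assuming a hypothetical per-run `emittedCount` field and illustrative function names: the total counts signals, while the percentage counts runs that emitted at least one.

```typescript
// Hypothetical minimal run shape for the rollup.
interface CountedRun {
  emittedCount: number;
}

// Share of window runs that emitted at least one signal (0 when there are no runs).
function emitRate(runs: CountedRun[]): number {
  if (runs.length === 0) return 0;
  const emittingRuns = runs.filter((run) => run.emittedCount > 0).length;
  return emittingRuns / runs.length;
}

function rollupLine(runs: CountedRun[], windowLabel = "24h"): string {
  const total = runs.reduce((sum, run) => sum + run.emittedCount, 0);
  const percent = Math.round(emitRate(runs) * 100);
  const noun = total === 1 ? "signal" : "signals";
  return `${total} ${noun} emitted (${percent}%) · last ${windowLabel}`;
}
```

For example, five runs in the window where one emitted 41 signals and four were quiet produce the line quoted in the commit message: the rate is 1 of 5 runs, not 41 of anything.
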
@andrewm4894 andrewm4894 self-assigned this Jun 10, 2026
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Contributor

Comments Outside Diff (2)

  1. packages/ui/src/features/scouts/components/ScoutDetailView.tsx, line 1423-1490 (link)

    P1 Premature "No config found" message while configs load

    useScoutConfigs() is called without capturing isLoading, so configs is undefined during the initial fetch. configs?.find(...) returns undefined, and the "No config found for this scout on the current project." message renders immediately — before the request completes. Any user who navigates directly to a scout detail URL will see this false empty state for the duration of the network round trip.

    The isLoading for runsWindow is captured and used to show a skeleton for the run list, but the same guard is missing for the config card above it.

    Rule Used: In cases where there are multiple states to handle... (source)

    Learned From
    PostHog/posthog#32610

  2. packages/core/src/scouts/scoutPresentation.test.ts, line 276-406 (link)

    P2 Prefer parameterised tests for input/output tables

    Several tests accumulate multiple independent expect assertions that map inputs to outputs one-to-one. When one case fails, the test name doesn't tell you which input was the culprit, and adding a new case requires editing prose rather than adding a row. it.each (or Vitest's test.each) is the standard pattern for these tables.

    Applies to: "normalizes TaskRun statuses case-insensitively" (5 cases), "formats durations" (4 cases), "classifies each run into a single outcome" (7 cases), "labels outcomes with emitted counts" (3 cases), "matches runs to filter chips" (5 cases), and "formats intervals" (5 cases).

Reviews (1): Last reviewed commit: "feat(scouts): run boxes recede quiet run..." | Re-trigger Greptile

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51cbe5d60b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread packages/ui/src/features/scouts/components/ScoutsFleetSection.tsx
Five typed events covering the scouts control pane: fleet viewed (including
the empty state), detail viewed, config changed (with old/new value and
rollback flag), chat started (which CTA, which scout), and a generic
scout action for the smaller interactions (run/emission expand, task-run
and skill links, run filter, settings gear, hide-disabled toggle).

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c458e526ce

Comment thread packages/ui/src/features/scouts/hooks/useScoutConfigMutations.ts Outdated
Comment thread packages/ui/src/features/scouts/components/ScoutSignalsSection.tsx Outdated
Comment thread packages/ui/src/features/scouts/components/ScoutDetailView.tsx Outdated
# Conflicts:
#	packages/ui/src/features/inbox/CLAUDE.md
@github-actions

github-actions Bot commented Jun 11, 2026

React Doctor could not complete this scan.

No React dependency found in /tmp/react-doctor-baseline-cW5q1G/package.json. Add "react" to dependencies (or peerDependencies) and re-run.

Reviewed by React Doctor for commit d67eea3.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bde0f5c5ff

Comment thread packages/ui/src/features/scouts/components/ScoutDetailView.tsx Outdated
Comment thread packages/ui/src/features/scouts/hooks/useScoutConfigMutations.ts
Comment thread packages/ui/src/features/scouts/components/ScoutSignalsSection.tsx Outdated
@andrewm4894

Member Author

Re the Greptile findings (posted as "Comments Outside Diff", so replying here):

  1. Premature "No config found" while configs load (ScoutDetailView.tsx) — agree, real bug. configsLoading is captured for the analytics gate but never gates the config card, so a direct navigation to a scout detail URL flashes the false empty state for the duration of the fetch. Will gate the card on configsLoading with a skeleton, matching the runs list.

  2. Parameterised tests in scoutPresentation.test.ts — agree; the repo testing conventions explicitly prefer it.each for input/output tables. Will convert the six listed table-shaped tests.

…fan-out

- Surface configs/runs query failures instead of rendering empty states
  (fleet section error strip with retry; detail-view config and runs
  error messages), so a missing scope or API outage is distinguishable
  from a genuinely quiet project.
- Gate the detail config card on configsLoading to stop the false
  "No config found" flash during the initial fetch.
- Scope optimistic-rollback to the failed config and invalidate the
  configs query when the last in-flight mutation settles, so concurrent
  edits cannot clobber each other or land stale server responses.
- Add a per-run fallback when an emissions query errors or returns
  empty, keeping the task-run link as the escape hatch.
- Cap initially rendered emitted runs at 10 with a "show more" expander
  (new show_more_emitted_runs scout action) to bound the per-run
  emissions query fan-out.
- Call out window truncation explicitly in the no-runs empty state.
- Convert table-shaped scoutPresentation tests to it.each.
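
The scoped rollback and the "invalidate when the last mutation settles" rule from this commit could be approximated as below. This is a sketch under stated assumptions, not the code in `useScoutConfigMutations`: the `ConfigLike` shape (including the `enabled` field), `rollbackOne`, and `InFlightTracker` are hypothetical names.

```typescript
// Hypothetical config shape; `enabled` is an assumed field name.
interface ConfigLike {
  skill_name: string;
  enabled: boolean;
  run_interval_minutes: number;
}

// Restore only the config whose mutation failed; other optimistic edits stay in place.
function rollbackOne(current: ConfigLike[], snapshot: ConfigLike): ConfigLike[] {
  return current.map((config) =>
    config.skill_name === snapshot.skill_name ? snapshot : config,
  );
}

// Counts in-flight mutations so a refetch is requested once, after the last one settles.
class InFlightTracker {
  private count = 0;

  start(): void {
    this.count += 1;
  }

  // Returns true when the caller should invalidate the configs query.
  settle(): boolean {
    this.count = Math.max(0, this.count - 1);
    return this.count === 0;
  }
}
```

Deferring invalidation until the counter reaches zero avoids a refetch that returns while another edit is still pending and overwrites its optimistic value.
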
@andrewm4894 andrewm4894 requested a review from a team June 11, 2026 10:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e27aba763

Comment on lines +143 to +149
const { data: runsWindow } = useScoutRuns();
const { updateConfig } = useScoutConfigMutations();
const [hideDisabled, setHideDisabled] = useState(false);
useTrackFleetViewed(configs);

const runs = runsWindow?.runs;
const rollups = useMemo(() => computeScoutRollups(runs ?? []), [runs]);

P2 Badge Surface fleet run query failures

When the expanded fleet's runs request is still loading or rejects (for example a missing signal_scout:read scope, a 500, or endpoint rollout gap), this hook only reads data, so runsWindow stays undefined and the code computes rollups from []. The fleet then reports “none running now”, “0 signals emitted”, and empty run boxes under “last 24h” instead of indicating that run stats are unavailable; read isLoading/isError from useScoutRuns() and render a loading/error state before deriving zero-valued stats.
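
One way to act on this suggestion is to derive an explicit view state from the query result before computing any stats. The sketch below assumes a query object exposing `data`, `isLoading`, and `isError`; `QueryLike`, `FleetStats`, and `fleetStatsState` are hypothetical names, and the actual fix in the PR may be structured differently.

```typescript
// Minimal stand-in for the parts of a query result this sketch relies on.
interface QueryLike<T> {
  data: T | undefined;
  isLoading: boolean;
  isError: boolean;
}

type FleetStats =
  | { kind: "loading" }
  | { kind: "error" }
  | { kind: "ready"; runCount: number };

// Only report numbers once the runs window has actually arrived.
function fleetStatsState(query: QueryLike<{ runs: unknown[] }>): FleetStats {
  if (query.isError) return { kind: "error" };
  if (query.isLoading || query.data === undefined) return { kind: "loading" };
  return { kind: "ready", runCount: query.data.runs.length };
}
```

With a discriminated union, the component cannot render "0 signals emitted" unless the state is `ready`, so a missing scope or an outage shows up as an error rather than as a quiet fleet.
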

Just a bit. Courtesy of Claude. I verified all simplifications.
Comment on lines +106 to +109
// Fire the viewed event once per scout, after both queries settle so the
// config and run-window stats are real rather than loading-state zeros.
const viewTrackedFor = useRef<string | null>(null);
useEffect(() => {

Member

I remember when LLMs were terrible at this kind of stuff. Today this implementation actually makes sense

De-LLM-ifying.
Comment on lines +27 to +35
if (
  !RUN_INTERVAL_OPTIONS.some(
    (option) => option.minutes === config.run_interval_minutes,
  )
) {
  options.push({
    value: String(config.run_interval_minutes),
    label: formatRunInterval(config.run_interval_minutes),
  });

Member

It's odd that the select option is last - feels like it'd be best to not funk around with this, and show them always in the same order
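
One reading of this suggestion is to place a non-preset interval at its sorted position instead of appending it. The sketch below assumes a hypothetical `IntervalOption` shape that carries `minutes` on every option and takes the label formatter as a parameter; it is an illustration, not the change that landed.

```typescript
// Hypothetical option shape; the options in the PR only show `value` and `label`.
interface IntervalOption {
  value: string;
  label: string;
  minutes: number;
}

// Returns the presets, plus the current interval if it is not a preset,
// always ordered by duration so the list reads the same way every time.
function withCurrentInterval(
  presets: IntervalOption[],
  currentMinutes: number,
  format: (minutes: number) => string,
): IntervalOption[] {
  if (presets.some((option) => option.minutes === currentMinutes)) {
    return presets;
  }
  const custom: IntervalOption = {
    value: String(currentMinutes),
    label: format(currentMinutes),
    minutes: currentMinutes,
  };
  return [...presets, custom].sort((a, b) => a.minutes - b.minutes);
}
```

Copying before sorting keeps the shared preset array untouched, so other consumers of the presets see no change.
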

2 participants