Skip to content

feat(metrics): deny→use conversion measurement + honest dark-metric visibility#21

Merged
sdsrss merged 1 commit into
mainfrom
feat/deny-conversion-measurement
Jun 9, 2026
Merged

feat(metrics): deny→use conversion measurement + honest dark-metric visibility#21
sdsrss merged 1 commit into
mainfrom
feat/deny-conversion-measurement

Conversation

@sdsrss

@sdsrss sdsrss commented Jun 9, 2026

Copy link
Copy Markdown
Owner

Two related batches on the recommend→use conversion metric. No published tool-schema (src/mcp/tools.rs) changes; all surfaces are diagnostic (stats, health-check, doctor) or internal (metrics, tests).

fix: make the conversion metric honest

The v0.39.0 recommend→use metric was producing zero usable data in this repo and was silently invisible.

  • Dark visible: stats (text + JSON recommendations.state) and health-check now surface absent | empty | live instead of silently skipping the block. You can no longer mistake "hooks not recording" for "feature absent".
  • Stop the test leak: resolve_project_root_from walks up to the nearest ancestor .git; plugin_e2e::spawn_server wrote only a Cargo.toml, so a fixture under the repo flushed test metrics (nonexistent_tool, dur_s:0) into the real usage.jsonl, corrupting the conversion denominator. Fix = test isolation (spawn_server drops a .git marker); prod resolve logic unchanged (subdir-CLI behavior is intended). Regression test test_stdio_metrics_isolated_from_ancestor_repo.
  • Reproduced deterministically; full test suite now leaves usage.jsonl byte-for-byte unchanged.

feat: measure whether the DENY stick actually converts

HINT-tier nudges convert ≈0; DENY (pre-grep-guide.js permissionDecision:'deny') is the only lever that mechanically changes tool selection — but it was never cleanly measured. This makes it measurable, without adding/broadening any intervention.

  • Per-session deny→use funnel (window-join): SessionMetrics stamps started_at; flush counts in-window deny/hint from recommendations.jsonl into the session's usage record ("recs":{deny,hint}, additive — no schema bump). stats prints Deny→use: M/N = X% (+ hint); JSON adds recommendations.funnel. "Used cg" = the 14 understanding tools (excludes housekeeping).
  • doctor.js stale-path detection: the absent-entry audit already existed (check Release binaries for macOS (darwin-arm64 and darwin-x64) #7); the gap was a present-but-stale-version hook path (e.g. 0.45.1 registered while 0.45.4 active) — it fires but may run pre-recordRecommendation code, keeping the metric dark. Added stale[] to surveyHookCoverage → routes to the existing re-register fix.

Follow-up (not in this PR)

  • Read the funnel on a real consumer project where DENY is live + real coding happens — that number decides whether to broaden DENY (gated) or stop on the stick.
  • Spec: docs/ADOPTION-DENY-MEASUREMENT-PLAN.md (gitignored, local).

Validation

cargo test all green (lib 476, cli_e2e 143, etc.); cargo +1.95.0 clippy --all-targets clean; node --test doctor.test.js lifecycle.test.js 33/33; full suite leaves usage.jsonl unchanged (no leak).

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Diagnostic tool now distinguishes between missing and stale hook entries
    • Recommendation metrics tracking with conversion funnel analysis added
  • Improvements

    • Stats command reports recommendation state (absent/empty/live)
    • Conversion funnel metrics now displayed in CLI and JSON outputs
    • Health-check output enhanced with recommendation tracking details
  • Tests

    • Added regression tests for recommendation state reporting
    • Improved test isolation for metrics collection

…isibility

fix: make the recommend→use conversion metric honest
- stats/health-check surface DARK state instead of silently skipping (absent|empty|live)
- stop plugin_e2e metrics leaking into the real usage.jsonl (spawn_server .git marker)
- purge of 8 historical test-leak entries done in gitignored data (not in this commit)

feat: measure whether the DENY stick actually converts
- per-session deny→use funnel via window-join (SessionMetrics.started_at + recs field)
- doctor.js stale-path detection (present-but-old-version hook → metric silently dark)
- spec: docs/ADOPTION-DENY-MEASUREMENT-PLAN.md (gitignored, local)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds hook coverage stale-entry detection to the Claude plugin doctor diagnostic, and implements recommendation event tracking with funnel analytics. Session metrics now capture timestamps and correlate recommendation events to session windows, while the CLI aggregates funnel counters to compute deny→use and hint→use conversion percentages, surfaced in both health-check and stats commands.

Changes

Hook Coverage Diagnostics

Layer / File(s) Summary
Hook coverage detection and reporting
claude-plugin/scripts/doctor.js
surveyHookCoverage() now compares desired vs. present hook commands to detect stale entries (command mismatches) in addition to missing entries. Hook coverage diagnostic reports ok only when both missing and stale arrays are empty; otherwise emits warn with fixId handling for each case. Function is exported from module.
Hook coverage unit tests
claude-plugin/scripts/doctor.test.js
New tests validate surveyHookCoverage behavior: clean coverage when settings match current hooks, stale detection when hook commands reference old plugin-cache paths, and missing detection with empty settings.

Recommendation Metrics & Funnel Tracking

Layer / File(s) Summary
Session metrics time window and recommendations
src/mcp/metrics.rs
SessionMetrics captures started_at ISO-8601 timestamp at construction. SessionMetrics::flush() reads recommendations.jsonl and counts PreToolUse deny/hint events within the session time window, emitting them in a recs payload. New count_recs_in_window() helper parses JSONL, applies lexicographic time filtering, and skips malformed entries. Unit test validates window filtering and edge cases.
Recommendation file state detection
src/cli.rs (lines 554–570, 698, 727–731)
New recommendation_metric_state() returns "live", "empty", or "absent" based on recommendations.jsonl presence and aggregated event count. cmd_health_check integrates this in both JSON and human output with descriptive state labels.
Funnel counter types and helpers
src/cli.rs (lines 772–846)
UsageSummary gains four funnel counter fields tracking sessions with deny/hint and code-graph tool usage. CG_QUERY_TOOLS allowlist defines code-graph query tools (excluding housekeeping). session_conversion() helper computes deny→use and hint→use percentages rounded to 2 decimals.
Funnel counter computation in usage aggregation
src/cli.rs (lines 887–905)
aggregate_usage_jsonl() loop parses recs.deny and recs.hint from session records, checks for allowlisted code-graph tool calls, and increments corresponding funnel counters.
Stats command recommendation and funnel output
src/cli.rs (lines 1015–1167, 4327–4346)
cmd_stats computes rec_state from file presence and aggregated total. JSON output adds state field and funnel object with deny→use and hint→use conversion percentages. Human output distinguishes dark/absent vs. empty vs. live states and prints funnel percentages. Unit test test_aggregate_funnel_deny_and_hint_to_use validates funnel attribution logic and confirms housekeeping tools are excluded.
CLI end-to-end tests for recommendations
tests/cli_e2e.rs (lines 736–800)
Three E2E tests verify recommendation states: absent (file missing), empty (file present but empty), and live with deny→use funnel attribution. Tests assert correct JSON and human output formatting for each state.
Plugin metrics isolation and fixture anchoring
tests/plugin_e2e.rs (lines 25–29, 339–378)
spawn_server fixture creates .git marker to anchor project-root resolution. New test_stdio_metrics_isolated_from_ancestor_repo verifies metrics are written to fixture's own .code-graph directory rather than ancestor repo.

Sequence Diagram

sequenceDiagram
  participant SessionMetrics
  participant recommendations.jsonl
  participant UsageSummary
  participant CliCmd as cmd_stats/cmd_health_check

  SessionMetrics->>SessionMetrics: initialize started_at
  SessionMetrics->>recommendations.jsonl: read at flush time
  recommendations.jsonl-->>SessionMetrics: deny/hint events with ts
  SessionMetrics->>SessionMetrics: filter events by time window
  SessionMetrics->>SessionMetrics: emit recs (deny, hint counts)

  UsageSummary->>UsageSummary: aggregate sessions from usage.jsonl
  UsageSummary->>UsageSummary: check recs.deny/recs.hint presence
  UsageSummary->>UsageSummary: check tools for code-graph allowlist
  UsageSummary->>UsageSummary: increment funnel counters
  
  CliCmd->>recommendations.jsonl: check state (absent/empty/live)
  CliCmd->>UsageSummary: read funnel counters
  CliCmd->>CliCmd: compute deny→use, hint→use percentages
  CliCmd->>CliCmd: format JSON and human output
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through metrics tall,
Counting hooks both stale and small,
Recommendations flow with time,
Funnels tracked in perfect rhyme,
Isolated test files sing—
All these stats that data bring! 🐰📊

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly addresses the two main features introduced: deny→use conversion measurement and dark-metric visibility improvements, which are the core objectives of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/deny-conversion-measurement

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/cli.rs (1)

1074-1077: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

cg_tool_calls is counting every tool call, not just CG query tools.

summary.total_tool_calls() includes housekeeping and other non-CG tools, so this JSON field is mislabeled and the proxy ratio is inflated. The same issue also affects the human proxy line at Line 1143. Recompute both from CG_QUERY_TOOLS or rename the field back to total_tool_calls.

💡 Suggested fix
+        let cg_tool_calls: u64 = summary
+            .tools
+            .iter()
+            .filter(|(name, _)| CG_QUERY_TOOLS.contains(&name.as_str()))
+            .map(|(_, agg)| agg.n)
+            .sum();
         println!("{}", serde_json::json!({
@@
-                "cg_tool_calls": summary.total_tool_calls(),
+                "cg_tool_calls": cg_tool_calls,
                 "conversion_ratio": if recs.total > 0 {
-                    (summary.total_tool_calls() as f64 / recs.total as f64 * 100.0).round() / 100.0
+                    (cg_tool_calls as f64 / recs.total as f64 * 100.0).round() / 100.0
                 } else { 0.0 },
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli.rs` around lines 1074 - 1077, The JSON field "cg_tool_calls" and its
derived "conversion_ratio" are using summary.total_tool_calls(), which counts
all tools; instead either rename the field to "total_tool_calls" or (preferred)
recompute both values by summing only tools in the CG_QUERY_TOOLS set: iterate
over summary's per-tool counts (use the map/structure that holds tool counts in
summary) and sum counts for names in CG_QUERY_TOOLS to produce cg_tool_calls,
then compute conversion_ratio as (cg_tool_calls as f64 / recs.total as f64 *
100.0) rounded as before; apply the same change to the analogous human-proxy
calculation referenced by the human proxy code path so both use
CG_QUERY_TOOLS-filtered counts instead of summary.total_tool_calls().
🧹 Nitpick comments (1)
tests/cli_e2e.rs (1)

779-799: ⚡ Quick win

Consider adding test coverage for hint→use funnel.

The test verifies deny→use funnel attribution correctly. However, according to the stack context, UsageSummary tracks both deny→use and hint→use funnels (via sessions_with_hint and sessions_with_hint_and_cg fields), and the stats output should include both conversions. Currently, no test verifies the hint→use funnel path—both sessions in this test have hint:0.

🧪 Suggested test case for hint→use funnel

Add a test similar to this one but with sessions containing hint recommendations:

#[test]
fn test_cli_stats_hint_to_use_funnel() {
    let project = setup_indexed_project();
    let cg = project.path().join(code_graph_mcp::domain::CODE_GRAPH_DIR);
    // Two hint-sessions, one of which called a cg query tool → 1/2 = 50%.
    let s1 = "{\"ts\":\"2026-06-10T10:00:00Z\",\"v\":\"0.45.4\",\"tools\":{\"get_call_graph\":{\"n\":1,\"ms\":5,\"err\":0,\"max_ms\":5}},\"recs\":{\"deny\":0,\"hint\":1}}";
    let s2 = "{\"ts\":\"2026-06-10T11:00:00Z\",\"v\":\"0.45.4\",\"tools\":{},\"recs\":{\"deny\":0,\"hint\":1}}";
    std::fs::write(cg.join("usage.jsonl"), format!("{s1}\n{s2}\n")).unwrap();
    let (stdout, _, code) = run_cli(&project, &["stats"]);
    assert_eq!(code, 0);
    assert!(stdout.contains("Hint→use:") && stdout.contains("50%"),
        "stats must print the hint→use funnel; got: {stdout:?}");

    let (jstdout, _, jcode) = run_cli(&project, &["stats", "--json"]);
    assert_eq!(jcode, 0);
    let v: serde_json::Value = serde_json::from_str(jstdout.trim()).unwrap();
    let funnel = &v["recommendations"]["funnel"];
    assert_eq!(funnel["hint_sessions"], 2);
    assert_eq!(funnel["hint_then_cg"], 1);
    assert_eq!(funnel["hint_conversion"], 0.5);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/cli_e2e.rs` around lines 779 - 799, Add a new test mirroring
test_cli_stats_deny_to_use_funnel to cover the hint→use funnel: create a fn
test_cli_stats_hint_to_use_funnel that writes two usage.jsonl sessions (both
with "recs":{"deny":0,"hint":1}, one including a get_call_graph tool) to the
code_graph_mcp::domain::CODE_GRAPH_DIR, call run_cli(&project, &["stats"]) and
assert stdout contains "Hint→use" and "50%", then call run_cli(&project,
&["stats","--json"]) and assert the JSON at v["recommendations"]["funnel"] has
hint_sessions == 2, hint_then_cg == 1, and hint_conversion == 0.5; use the same
helpers and patterns from test_cli_stats_deny_to_use_funnel
(setup_indexed_project, run_cli) to locate and implement the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@claude-plugin/scripts/doctor.js`:
- Around line 257-277: The loop currently marks entries as present when
isOurHookEntry(entry) is true even if entry.hooks[0].command is missing, which
lets empty/commandless registrations slip past the stale check; update the stale
calculation so a present key with no command is treated as stale by changing the
predicate in the stale computation to treat falsy presentCmd[k] as stale (e.g.
in the stale = expected.filter(...) expression, use present.has(k) &&
desiredCmd[k] && (!presentCmd[k] || presentCmd[k] !== desiredCmd[k])); this
keeps the isOurHookEntry / present set logic intact but ensures commandless
entries are reported as stale/missing.

In `@claude-plugin/scripts/doctor.test.js`:
- Around line 66-75: The test mutates a hook command using a hardcoded
'/scripts/' separator which fails on Windows; update the mutation in the test
(the line that replaces bash.hooks[0].command) to be separator-agnostic by
matching either POSIX or Windows separators (e.g. use a regex like
[\\/]+scripts[\\/] or build the replacement using path.sep) so the stale-path
branch exercised by surveyHookCoverage and settingsWithCurrentHooks still
triggers on all platforms.

In `@src/cli.rs`:
- Around line 1015-1022: The computed rec_state (based on recs,
rec_path.exists(), and aggregate_recommendations_jsonl) is never surfaced
because early returns (in functions producing stats) occur before any
recommendations payload is emitted; ensure the recommendations.state is always
included in the emitted stats payload even on cold-start/no-usage paths by
adding recommendations/state (using rec_state) to the stats or recommendations
object before any return path that currently omits it (references: rec_state,
recs, rec_path, aggregate_recommendations_jsonl, and the function producing
stats/recommendations); update the early-return branches to attach a minimal
recommendations object with state "absent"/"empty"/"live" so callers always
receive the field.
- Line 698: The JSON health-check returned by the no-index/onboarding path is
missing the conversion_metric field because the early-return branch (the block
that returns before the post-open path) doesn't include it; update that
early-return JSON construction in src/cli.rs to call
recommendation_metric_state(project_root) and include its result under the
"conversion_metric" key (the same value currently added in the post-open branch)
so both code paths produce identical fields for "conversion_metric".

In `@src/mcp/metrics.rs`:
- Around line 244-249: The current join only filters recommendations by
started_at; update the join to also bound by this session's end by passing the
session end timestamp into count_recs_in_window and enforcing started_at <= ts
<= ended_at. Concretely, change the call site that reads recommendations.jsonl
(the block using usage_path.parent() and calling count_recs_in_window(&content,
&self.started_at)) to pass self.ended_at (or a reference to it) as a second
argument, and update the count_recs_in_window function signature/logic to accept
the end timestamp and filter recommendation records so their ts is between
started_at and ended_at (inclusive) before counting deny/hint.

---

Outside diff comments:
In `@src/cli.rs`:
- Around line 1074-1077: The JSON field "cg_tool_calls" and its derived
"conversion_ratio" are using summary.total_tool_calls(), which counts all tools;
instead either rename the field to "total_tool_calls" or (preferred) recompute
both values by summing only tools in the CG_QUERY_TOOLS set: iterate over
summary's per-tool counts (use the map/structure that holds tool counts in
summary) and sum counts for names in CG_QUERY_TOOLS to produce cg_tool_calls,
then compute conversion_ratio as (cg_tool_calls as f64 / recs.total as f64 *
100.0) rounded as before; apply the same change to the analogous human-proxy
calculation referenced by the human proxy code path so both use
CG_QUERY_TOOLS-filtered counts instead of summary.total_tool_calls().

---

Nitpick comments:
In `@tests/cli_e2e.rs`:
- Around line 779-799: Add a new test mirroring
test_cli_stats_deny_to_use_funnel to cover the hint→use funnel: create a fn
test_cli_stats_hint_to_use_funnel that writes two usage.jsonl sessions (both
with "recs":{"deny":0,"hint":1}, one including a get_call_graph tool) to the
code_graph_mcp::domain::CODE_GRAPH_DIR, call run_cli(&project, &["stats"]) and
assert stdout contains "Hint→use" and "50%", then call run_cli(&project,
&["stats","--json"]) and assert the JSON at v["recommendations"]["funnel"] has
hint_sessions == 2, hint_then_cg == 1, and hint_conversion == 0.5; use the same
helpers and patterns from test_cli_stats_deny_to_use_funnel
(setup_indexed_project, run_cli) to locate and implement the test.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c933d38-d5a4-414f-b787-679b3eefa1ad

📥 Commits

Reviewing files that changed from the base of the PR and between a838021 and fed40ab.

📒 Files selected for processing (6)
  • claude-plugin/scripts/doctor.js
  • claude-plugin/scripts/doctor.test.js
  • src/cli.rs
  • src/mcp/metrics.rs
  • tests/cli_e2e.rs
  • tests/plugin_e2e.rs

Comment on lines 257 to +277
if (isOurHookEntry(entry)) {
present.add(`${event}:${entry.matcher || '*'}`);
const key = `${event}:${entry.matcher || '*'}`;
present.add(key);
if (entry.hooks && entry.hooks[0] && entry.hooks[0].command) {
presentCmd[key] = entry.hooks[0].command;
}
}
}
}
}

const missing = expected.filter(k => !present.has(k));
return { expected, present: [...present], missing };
// Stale = present but the registered command no longer matches what we'd write
// now (points at an old plugin-cache version dir / moved path). A stale path can
// run pre-recordRecommendation hook code, so the hook fires but the conversion
// metric stays dark — invisible to a present/absent check. This is the
// 0.45.1-registered-while-0.45.4-active case the RCA surfaced.
const stale = expected.filter(k =>
present.has(k) && desiredCmd[k] && presentCmd[k] && presentCmd[k] !== desiredCmd[k]
);
return { expected, present: [...present], missing, stale };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't count commandless hook entries as covered.

isOurHookEntry() accepts entries by description alone, so an entry with the right (event, matcher) but an empty hooks array still reaches Line 259. From there it is marked present, but Line 275 drops it from stale because presentCmd[k] is falsy. That makes Lines 209-214 report coverage as OK even though Claude has nothing executable for that hook. Treat missing/empty commands as stale, or keep those entries out of present, so doctor doesn't false-negative broken registrations.

Suggested fix
       for (const entry of entries) {
         if (isOurHookEntry(entry)) {
           const key = `${event}:${entry.matcher || '*'}`;
           present.add(key);
-          if (entry.hooks && entry.hooks[0] && entry.hooks[0].command) {
-            presentCmd[key] = entry.hooks[0].command;
-          }
+          presentCmd[key] = entry.hooks && entry.hooks[0]
+            ? entry.hooks[0].command || null
+            : null;
         }
       }
     }
   }
@@
   const stale = expected.filter(k =>
-    present.has(k) && desiredCmd[k] && presentCmd[k] && presentCmd[k] !== desiredCmd[k]
+    present.has(k) && desiredCmd[k] && presentCmd[k] !== desiredCmd[k]
   );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (isOurHookEntry(entry)) {
present.add(`${event}:${entry.matcher || '*'}`);
const key = `${event}:${entry.matcher || '*'}`;
present.add(key);
if (entry.hooks && entry.hooks[0] && entry.hooks[0].command) {
presentCmd[key] = entry.hooks[0].command;
}
}
}
}
}
const missing = expected.filter(k => !present.has(k));
return { expected, present: [...present], missing };
// Stale = present but the registered command no longer matches what we'd write
// now (points at an old plugin-cache version dir / moved path). A stale path can
// run pre-recordRecommendation hook code, so the hook fires but the conversion
// metric stays dark — invisible to a present/absent check. This is the
// 0.45.1-registered-while-0.45.4-active case the RCA surfaced.
const stale = expected.filter(k =>
present.has(k) && desiredCmd[k] && presentCmd[k] && presentCmd[k] !== desiredCmd[k]
);
return { expected, present: [...present], missing, stale };
if (isOurHookEntry(entry)) {
const key = `${event}:${entry.matcher || '*'}`;
present.add(key);
presentCmd[key] = entry.hooks && entry.hooks[0]
? entry.hooks[0].command || null
: null;
}
}
}
}
const missing = expected.filter(k => !present.has(k));
// Stale = present but the registered command no longer matches what we'd write
// now (points at an old plugin-cache version dir / moved path). A stale path can
// run pre-recordRecommendation hook code, so the hook fires but the conversion
// metric stays dark — invisible to a present/absent check. This is the
// 0.45.1-registered-while-0.45.4-active case the RCA surfaced.
const stale = expected.filter(k =>
present.has(k) && desiredCmd[k] && presentCmd[k] !== desiredCmd[k]
);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude-plugin/scripts/doctor.js` around lines 257 - 277, The loop currently
marks entries as present when isOurHookEntry(entry) is true even if
entry.hooks[0].command is missing, which lets empty/commandless registrations
slip past the stale check; update the stale calculation so a present key with no
command is treated as stale by changing the predicate in the stale computation
to treat falsy presentCmd[k] as stale (e.g. in the stale = expected.filter(...)
expression, use present.has(k) && desiredCmd[k] && (!presentCmd[k] ||
presentCmd[k] !== desiredCmd[k])); this keeps the isOurHookEntry / present set
logic intact but ensures commandless entries are reported as stale/missing.

Comment on lines +66 to +75
test('surveyHookCoverage flags a present-but-stale hook path', () => {
const settings = settingsWithCurrentHooks();
// Repoint one PreToolUse entry at an old plugin-cache version dir — present,
// recognized as ours (description unchanged), but command no longer current.
const bash = settings.hooks.PreToolUse.find(e => e.matcher === 'Bash');
bash.hooks[0].command = bash.hooks[0].command.replace('/scripts/', '/0.0.1-old/scripts/');
const cov = surveyHookCoverage(settings);
assert.equal(cov.missing.length, 0, 'entry is present, not missing');
assert.ok(cov.stale.includes('PreToolUse:Bash'),
`stale Bash path should be flagged; got stale=${JSON.stringify(cov.stale)}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Make the stale-path rewrite separator-agnostic.

Line 71 hardcodes '/scripts/', but buildSettingsHookEntries() builds these commands with path.join(...). On Windows that leaves \scripts\ untouched, so this test stops creating a stale command and no longer covers the branch it claims to validate. Use path.sep or a [\\/]scripts[\\/] regex here.

Suggested fix
-  bash.hooks[0].command = bash.hooks[0].command.replace('/scripts/', '/0.0.1-old/scripts/');
+  bash.hooks[0].command = bash.hooks[0].command.replace(
+    /([\\/])scripts\1/,
+    (_, sep) => `${sep}0.0.1-old${sep}scripts${sep}`,
+  );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
test('surveyHookCoverage flags a present-but-stale hook path', () => {
const settings = settingsWithCurrentHooks();
// Repoint one PreToolUse entry at an old plugin-cache version dir — present,
// recognized as ours (description unchanged), but command no longer current.
const bash = settings.hooks.PreToolUse.find(e => e.matcher === 'Bash');
bash.hooks[0].command = bash.hooks[0].command.replace('/scripts/', '/0.0.1-old/scripts/');
const cov = surveyHookCoverage(settings);
assert.equal(cov.missing.length, 0, 'entry is present, not missing');
assert.ok(cov.stale.includes('PreToolUse:Bash'),
`stale Bash path should be flagged; got stale=${JSON.stringify(cov.stale)}`);
test('surveyHookCoverage flags a present-but-stale hook path', () => {
const settings = settingsWithCurrentHooks();
// Repoint one PreToolUse entry at an old plugin-cache version dir — present,
// recognized as ours (description unchanged), but command no longer current.
const bash = settings.hooks.PreToolUse.find(e => e.matcher === 'Bash');
bash.hooks[0].command = bash.hooks[0].command.replace(
/([\\/])scripts\1/,
(_, sep) => `${sep}0.0.1-old${sep}scripts${sep}`,
);
const cov = surveyHookCoverage(settings);
assert.equal(cov.missing.length, 0, 'entry is present, not missing');
assert.ok(cov.stale.includes('PreToolUse:Bash'),
`stale Bash path should be flagged; got stale=${JSON.stringify(cov.stale)}`);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@claude-plugin/scripts/doctor.test.js` around lines 66 - 75, The test mutates
a hook command using a hardcoded '/scripts/' separator which fails on Windows;
update the mutation in the test (the line that replaces bash.hooks[0].command)
to be separator-agnostic by matching either POSIX or Windows separators (e.g.
use a regex like [\\/]+scripts[\\/] or build the replacement using path.sep) so
the stale-path branch exercised by surveyHookCoverage and
settingsWithCurrentHooks still triggers on all platforms.

Comment thread src/cli.rs
"embedding_status": embedding_status,
"model_available": model_available,
"snapshot": snapshot_block,
"conversion_metric": recommendation_metric_state(project_root),

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

The no-index JSON health-check still omits conversion_metric.

This field is only added in the post-open branch. When index.db is missing, Line 576 through Line 596 return earlier, so JSON callers lose the new state exactly on the onboarding/no-index path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli.rs` at line 698, The JSON health-check returned by the
no-index/onboarding path is missing the conversion_metric field because the
early-return branch (the block that returns before the post-open path) doesn't
include it; update that early-return JSON construction in src/cli.rs to call
recommendation_metric_state(project_root) and include its result under the
"conversion_metric" key (the same value currently added in the post-open branch)
so both code paths produce identical fields for "conversion_metric".

Comment thread src/cli.rs
Comment on lines +1015 to +1022
let rec_exists = rec_path.exists();
let recs = std::fs::read_to_string(&rec_path).ok()
.map(|c| aggregate_recommendations_jsonl(&c))
.unwrap_or_default();
// Recording-side state of the conversion metric, made explicit so a dark
// metric (file absent → PreToolUse hooks not recording here) is never
// silently indistinguishable from "feature absent" or "no data yet".
let rec_state = if recs.total > 0 { "live" } else if rec_exists { "empty" } else { "absent" };

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

stats still hides recommendations.state on the no-usage paths.

rec_state is computed here, but Line 994 through Line 1005 and Line 1024 through Line 1031 both return before any recommendations payload is emitted. That makes absent vs empty vs live invisible in the cold-start cases this diagnostic is supposed to explain.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/cli.rs` around lines 1015 - 1022, The computed rec_state (based on recs,
rec_path.exists(), and aggregate_recommendations_jsonl) is never surfaced
because early returns (in functions producing stats) occur before any
recommendations payload is emitted; ensure the recommendations.state is always
included in the emitted stats payload even on cold-start/no-usage paths by
adding recommendations/state (using rec_state) to the stats or recommendations
object before any return path that currently omits it (references: rec_state,
recs, rec_path, aggregate_recommendations_jsonl, and the function producing
stats/recommendations); update the early-return branches to attach a minimal
recommendations object with state "absent"/"empty"/"live" so callers always
receive the field.

Comment thread src/mcp/metrics.rs
Comment on lines +244 to +249
if let Some(dir) = usage_path.parent() {
if let Ok(content) = std::fs::read_to_string(dir.join("recommendations.jsonl")) {
let (deny, hint) = count_recs_in_window(&content, &self.started_at);
if deny > 0 || hint > 0 {
record["recs"] = serde_json::json!({ "deny": deny, "hint": hint });
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Bound the recommendation join by session end.

This only constrains ts >= started_at. If two MCP sessions share a repo, recommendations emitted by a later overlapping session can be attributed to whichever earlier session flushes afterward, which writes inflated recs into usage.jsonl and skews the deny/hint funnel. Pass this session’s end timestamp into the helper and require started_at <= ts <= ended_at.

💡 Suggested fix
-                let (deny, hint) = count_recs_in_window(&content, &self.started_at);
+                let ended_at = record
+                    .get("ts")
+                    .and_then(|v| v.as_str())
+                    .unwrap_or(&self.started_at)
+                    .to_owned();
+                let (deny, hint) =
+                    count_recs_in_window(&content, &self.started_at, &ended_at);
                 if deny > 0 || hint > 0 {
                     record["recs"] = serde_json::json!({ "deny": deny, "hint": hint });
                 }
             }
         }
@@
-pub(crate) fn count_recs_in_window(rec_content: &str, started_at: &str) -> (u64, u64) {
+pub(crate) fn count_recs_in_window(
+    rec_content: &str,
+    started_at: &str,
+    ended_at: &str,
+) -> (u64, u64) {
@@
-        if ts < started_at {
+        if ts < started_at || ts > ended_at {
             continue;
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if let Some(dir) = usage_path.parent() {
if let Ok(content) = std::fs::read_to_string(dir.join("recommendations.jsonl")) {
let (deny, hint) = count_recs_in_window(&content, &self.started_at);
if deny > 0 || hint > 0 {
record["recs"] = serde_json::json!({ "deny": deny, "hint": hint });
}
if let Some(dir) = usage_path.parent() {
if let Ok(content) = std::fs::read_to_string(dir.join("recommendations.jsonl")) {
let ended_at = record
.get("ts")
.and_then(|v| v.as_str())
.unwrap_or(&self.started_at)
.to_owned();
let (deny, hint) =
count_recs_in_window(&content, &self.started_at, &ended_at);
if deny > 0 || hint > 0 {
record["recs"] = serde_json::json!({ "deny": deny, "hint": hint });
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/mcp/metrics.rs` around lines 244 - 249, The current join only filters
recommendations by started_at; update the join to also bound by this session's
end by passing the session end timestamp into count_recs_in_window and enforcing
started_at <= ts <= ended_at. Concretely, change the call site that reads
recommendations.jsonl (the block using usage_path.parent() and calling
count_recs_in_window(&content, &self.started_at)) to pass self.ended_at (or a
reference to it) as a second argument, and update the count_recs_in_window
function signature/logic to accept the end timestamp and filter recommendation
records so their ts is between started_at and ended_at (inclusive) before
counting deny/hint.

@sdsrss sdsrss merged commit 78ced21 into main Jun 9, 2026
9 checks passed
@sdsrss sdsrss deleted the feat/deny-conversion-measurement branch June 9, 2026 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant