feat(security-assessment): Phase 1b expansion + fp-reduction Stage 0/confidence (v2.2.0)#25
Merged
…field to fp-reduction
Two enhancements to the FP-reduction rubric, surfaced from a competitive
analysis against Anthropic's Claude Code Security:
- Stage 0 devil's advocate: every disposition entry now carries
da_rationale (≥20 chars) and da_strong (bool) before Stages 1-5 run.
The pre-pass forces the agent to argue the strongest case AGAINST the
finding being real (framework protection, trusted caller, non-prod
context, rule pattern noise). A strong DA argument turns Stage 1
reachability into a hypothesis test rather than an open-ended search,
and a true_positive that explicitly refutes a counter-argument is
more trustworthy than one that never examined the counter-case.
- Confidence field: every disposition entry now carries a confidence
band (high | medium | low | null) derived from the
verdict × exploitability score table:
    true_positive + score 7-10              → high
    true_positive + score 0-6               → medium
    likely_true_positive                    → medium
    uncertain                               → low
    likely_false_positive / false_positive  → null
Confidence is consumed by exec-report-generator for the Section 1
dashboard column and Section 2 detail blocks. severity-floors.json
documents the bands as informational metadata.
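The two rubric additions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the dict-based entry shape are hypothetical, and the real rubric is applied by the fp-reduction agent rather than by code like this. The field names da_rationale and da_strong, the 20-character minimum, and the band table are taken from the description above.

```python
from typing import Optional

MIN_DA_RATIONALE_CHARS = 20  # "≥20 chars" from the Stage 0 description


def validate_stage0(entry: dict) -> list[str]:
    """Return problems with the Stage 0 fields; an empty list means valid.

    `entry` is a hypothetical dict form of one disposition entry.
    """
    problems = []
    rationale = entry.get("da_rationale")
    if not isinstance(rationale, str) or len(rationale.strip()) < MIN_DA_RATIONALE_CHARS:
        problems.append("da_rationale must be a string of at least 20 characters")
    if not isinstance(entry.get("da_strong"), bool):
        problems.append("da_strong must be a boolean")
    return problems


def derive_confidence(verdict: str, score: int) -> Optional[str]:
    """Map verdict x exploitability score to a confidence band (None means null)."""
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range 0-10: {score}")
    if verdict == "true_positive":
        return "high" if score >= 7 else "medium"
    if verdict == "likely_true_positive":
        return "medium"
    if verdict == "uncertain":
        return "low"
    if verdict in ("likely_false_positive", "false_positive"):
        return None
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Raising on an unknown verdict or out-of-range score is a design choice of this sketch; the source does not say how such inputs are handled.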
Phase 1b now dispatches 5 agents in parallel (was 2). Three new opus
agents address gaps surfaced by competitive analysis and a NextGen
portfolio rerun:
- deep-code-reasoning: RECON surface-scoped freeform vulnerability
reasoning - bottom-up analysis at entry points / auth paths /
data-flow boundaries for novel context-dependent issues that static
rules cannot express (IDOR, confused deputy, TOCTOU across services,
indirect privilege escalation, workflow bypass). Minimum evidence
bar: ≥2 file:line citations per finding; only emits high/medium
confidence (no low-confidence noise).
- authorization-logic-review: top-down authorization architecture
review. Maps the intended access control model from route
decorators / permission constants / middleware, then verifies
consistent enforcement at controller / service / repository layers.
Catches design-intent vs. implementation gaps (auth at front door
but not at data-access layer, multi-tenancy filter inconsistency,
role escalation via mutable fields).
- recon-driven-scan: bridges Phase 0 RECON narrative claims to
concrete file:line evidence. Reads RECON's human-language risk
descriptions ("inverted-boolean TLS bypass", "RCE shape via Flee",
"header-driven SQL connection-string interpolation") and validates
each described risk has matching code via targeted grep. Includes a
28-pattern claim→search library with rule_id namespaces and CWE
assignments. Validated against the NextGen 2026-05-01 rerun: 12
repos previously scored zero-findings by SAST were re-scanned,
producing 75 confirmed findings (8 CRITICAL, 17 HIGH) with zero
false alarms - including 2 production SQL injections, an in-process
RCE shape via Flee+Dynamic LINQ, and an inverted-boolean TLS
bypass library-amplified across all consumer Lambdas.
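The claim→search idea behind recon-driven-scan can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration: the single library entry, its keywords, regex, and rule_id are hypothetical and are not taken from the actual 28-pattern library, and an in-memory regex scan stands in for the targeted grep the agent runs. CWE-295 (Improper Certificate Validation) is used only as a plausible assignment for a TLS-bypass claim.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimPattern:
    claim_keywords: tuple  # words that must all appear in the RECON narrative
    search: "re.Pattern"   # regex run against source lines
    rule_id: str           # namespaced rule identifier (hypothetical)
    cwe: str


# Hypothetical one-entry library; the real one is described as having 28 patterns.
LIBRARY = [
    ClaimPattern(
        claim_keywords=("tls", "bypass"),
        search=re.compile(r"ServerCertificateValidationCallback"),
        rule_id="recon.tls.validation-bypass",
        cwe="CWE-295",
    ),
]


def scan(narrative: str, files: dict) -> list:
    """Return file:line evidence for each narrative claim matched by the library.

    `files` maps a path to its source text. An empty or generic narrative
    matches nothing and yields [], which is a valid result.
    """
    text = narrative.lower()
    findings = []
    for pat in LIBRARY:
        if not all(keyword in text for keyword in pat.claim_keywords):
            continue
        for path, source in files.items():
            for lineno, line in enumerate(source.splitlines(), start=1):
                if pat.search.search(line):
                    findings.append(
                        {"rule_id": pat.rule_id, "cwe": pat.cwe, "location": f"{path}:{lineno}"}
                    )
    return findings
```

Gating each search on the narrative first is what keeps the scan claim-driven: a pattern only runs when RECON actually described that risk.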
All three new agents emit unified-finding-v1 directly (no adapter)
and append to memory/findings-<slug>.jsonl via jq after Phase 1b
completes. recon-driven-scan legitimately emits [] for repos with
empty/generic RECON narratives - this is not a failure.
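As a rough sketch of the deep-code-reasoning evidence bar and the append step, the following Python stands in for the jq-based append described above. The keys citations and confidence are hypothetical; the unified-finding-v1 schema is not shown here, so the real field names may differ. Counting only distinct file:line citations is also an assumption of this sketch.

```python
import json
import re

# A citation is assumed to look like "path/to/file.ext:123".
CITATION_RE = re.compile(r"^[^\s:]+:\d+$")


def meets_evidence_bar(finding: dict) -> bool:
    """True if a finding has >=2 distinct file:line citations and high/medium confidence."""
    citations = {
        c for c in finding.get("citations", [])
        if isinstance(c, str) and CITATION_RE.match(c)
    }
    return len(citations) >= 2 and finding.get("confidence") in ("high", "medium")


def append_findings(path: str, findings: list) -> int:
    """Append findings to a JSONL file, one JSON object per line.

    An empty list appends nothing and returns 0, mirroring the rule that
    an agent emitting [] is not a failure.
    """
    with open(path, "a", encoding="utf-8") as fh:
        for finding in findings:
            fh.write(json.dumps(finding, sort_keys=True) + "\n")
    return len(findings)
```

A caller might filter deep-code-reasoning output with `[f for f in findings if meets_evidence_bar(f)]` before appending, while passing the other agents' output through unchanged.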
Wiring updates: pipeline skill (artifacts table + Phase 1b agent
list), command (5-agent dispatch + parallelization rule),
exec-report-generator (agent → phase mapping), CLAUDE.md (agent
registry 11 → 12).
Manual changelog entry for the Phase 1b expansion. release-please will generate the canonical 2.2.0 entry from conventional commits when this lands on main; this entry serves as a working preview.
Summary
Expands Phase 1b of /security-assessment from 2 agents to 5, plus adds Stage 0 devil's-advocate reasoning and an explicit confidence field to fp-reduction. Closes detection gaps surfaced by (a) competitive analysis against Anthropic's Claude Code Security and (b) a NextGen portfolio rerun that found 75 real findings the original SAST-only Phase 1 missed.
Changes
fp-reduction enhancements (commit 1)
- Stage 0 devil's advocate: da_rationale (≥20 chars) + da_strong (bool) before Stages 1–5 run. Forces the agent to argue against the finding being real (framework protection, trusted caller, non-prod, rule pattern noise) and sharpens Stage 1 reachability into a hypothesis test.
- Confidence field: confidence: high | medium | low | null derived from verdict × exploitability score. Surfaced in exec-report Section 1 dashboard column and Section 2 detail blocks.
Three new Phase 1b judgment agents (commit 2)
- deep-code-reasoning
- authorization-logic-review
- recon-driven-scan
All three emit unified-finding-v1 directly; no adapter required. recon-driven-scan legitimately emits [] for repos with empty/generic RECON.
Validation history
recon-driven-scan was validated against the 2026-05-01 NextGen portfolio rerun (12 repos previously scored zero-findings by Phase 1 SAST).
Wiring
Test plan
🤖 Generated with Claude Code