From 89b34e15b734ef7a2eabbdac7bb3c89790a0db9c Mon Sep 17 00:00:00 2001
From: Bryan Finster <bryan.finster@gmail.com>
Date: Fri, 1 May 2026 13:31:03 -0500
Subject: [PATCH 1/3] feat(security-assessment): add Stage 0 devil's advocate +
 confidence field to fp-reduction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two enhancements to the FP-reduction rubric, surfaced from a competitive
analysis against Anthropic's Claude Code Security:

- Stage 0 devil's advocate: every disposition entry now carries
  da_rationale (≥20 chars) and da_strong (bool) before Stages 1-5 run.
  The pre-pass forces the agent to argue the strongest case AGAINST the
  finding being real (framework protection, trusted caller, non-prod
  context, rule pattern noise). A strong DA argument turns Stage 1
  reachability into a hypothesis test rather than an open-ended search,
  and a true_positive that explicitly refutes a counter-argument is
  more trustworthy than one that never examined the counter-case.

- Confidence field: every disposition entry now carries a confidence
  band (high | medium | low | null) derived from the
  verdict × exploitability score table:
    true_positive + score 7-10 → high
    true_positive + score 0-6 → medium
    likely_true_positive       → medium
    uncertain                  → low
    likely_false_positive / false_positive → null
  Confidence is consumed by exec-report-generator for the Section 1
  dashboard column and Section 2 detail blocks. severity-floors.json
  documents the bands as informational metadata.
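
A minimal Python sketch of the two per-entry rules above. It assumes a
disposition entry is a dict carrying `verdict`, `confidence`,
`da_rationale` and `da_strong`. Where the exploitability score lives in
the entry is not shown in this patch, so the sketch takes it as a
separate argument. `derive_confidence` and `check_entry` are
hypothetical helper names and are not part of the plugin.

```python
from typing import Optional

VALID_VERDICTS = {"true_positive", "likely_true_positive", "uncertain",
                  "likely_false_positive", "false_positive"}


def derive_confidence(verdict: str, score: float) -> Optional[str]:
    """Map (verdict, exploitability score) to a confidence band.

    Assumes scores lie in 0-10; any true_positive score >= 7 is "high",
    so a fractional score such as 6.5 falls into "medium".
    """
    if verdict not in VALID_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if not 0 <= score <= 10:
        raise ValueError(f"exploitability score out of range: {score}")
    if verdict == "true_positive":
        return "high" if score >= 7 else "medium"
    if verdict == "likely_true_positive":
        return "medium"
    if verdict == "uncertain":
        return "low"
    return None  # likely_false_positive / false_positive: not surfaced


def check_entry(entry: dict, score: float) -> list[str]:
    """List rule violations for one disposition entry (an empty list means OK)."""
    problems = []
    rationale = entry.get("da_rationale")
    if not isinstance(rationale, str) or len(rationale) < 20:
        problems.append("da_rationale must be a string of at least 20 chars")
    if not isinstance(entry.get("da_strong"), bool):
        problems.append("da_strong must be true or false")
    expected = derive_confidence(entry.get("verdict"), score)
    if "confidence" not in entry or entry["confidence"] != expected:
        problems.append(f"confidence must be {expected!r} for this verdict and score")
    return problems
```

For example, derive_confidence("true_positive", 8) returns "high" and
derive_confidence("false_positive", 3) returns None. A missing
`confidence` key is still reported, because null is an explicit value
and not the same as an absent field.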
---
 .../agents/fp-reduction.md                    | 37 ++++++++++++++++++-
 .../knowledge/severity-floors.json            | 35 +++++++++++++++++-
 .../skills/false-positive-reduction/SKILL.md  | 25 ++++++++++++-
 3 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/plugins/agentic-security-assessment/agents/fp-reduction.md b/plugins/agentic-security-assessment/agents/fp-reduction.md
index 99e73c4..0713e4d 100644
--- a/plugins/agentic-security-assessment/agents/fp-reduction.md
+++ b/plugins/agentic-security-assessment/agents/fp-reduction.md
@@ -37,7 +37,22 @@ Run `command -v joern` (or its alias `joern-parse`). Set `register.reachability_
 
 If joern is present, invoke `tools/reachability.sh` to build or load the CPG. The helper returns a path to a JSON export of the CFG that Stage 1 queries.
 
-### 2. For each finding, apply stages 1–5 in order
+### 2. For each finding, apply stages 0–5 in order
+
+**Stage 0 — Devil's advocate.** Before running the structured rubric, generate the strongest argument that this finding is NOT a real vulnerability:
+
+- **Framework protection**: does the language/framework have a built-in prevention for this class? (e.g. ORM parameterized queries prevent SQL injection at the repo layer unless raw SQL is assembled from strings; auto-escaping template engines neutralize most XSS in rendered output; TLS termination at the load balancer makes `verify=False` on internal-only calls a much narrower risk)
+- **Trusted caller**: is this code only reachable from a trusted internal caller, admin-only CLI, or test harness — never from an untrusted HTTP path?
+- **Non-production context**: is the file a migration, seed script, test fixture, or one-time utility that RECON's `entry_points` do not include?
+- **Rule pattern noise**: does this rule commonly fire on intentional, non-exploitable configurations (e.g. `node-tls-reject-unauthorized` on a local development server, `hardcoded-password` on a well-known public default)?
+
+Record the counter-argument (min 20 chars) in `da_rationale`. If the argument is strong enough that Stage 1 reachability analysis is likely to confirm the path is dead or test-only, set `da_strong: true`; otherwise set `da_strong: false`.
+
+**The devil's advocate does NOT change the verdict.** Stages 1–5 run regardless. What it changes is *how* Stage 1 operates and what appears in the audit trail:
+
+- `da_strong: true` → Stage 1 *tests the DA hypothesis* (is the path actually dead / test-only?) rather than performing an open-ended search. This sharpens the rationale and accelerates high-volume runs.
+- `da_strong: true` confirmed by Stage 1 (path dead or test-only) → verdict per the § 4 table (`false_positive` for dead code, `likely_false_positive` for a test-only path), with both the DA argument and the reachability evidence cited. The analyst sees a well-reasoned dismissal, not a silent discard.
+- `da_strong: true` *disproved* by Stage 1 (path is reachable) → the rejected DA argument appears in the final rationale. A `true_positive` that explicitly refuted a counter-argument is more trustworthy than one that never examined the counter-case.
 
 **Stage 1 — Reachability.** Populate `reachability.reachable` (bool) and `reachability.rationale` (min 20 chars). Set `reachability_source` per the detection mode (`joern-cpg` or `llm-fallback`).
 
@@ -109,6 +124,21 @@ Map the combined reachability + environment + control + scoring into one of:
 | test-only path OR strong in-repo control + score < 2 | `likely_false_positive` |
 | dead code OR schema-invalid finding | `false_positive` |
 
+### 4b. Assign confidence
+
+After assigning the verdict, derive a `confidence` field. This is a first-class output field on the disposition entry — consumed by the exec-report-generator for the dashboard Confidence column and Section 2 detail blocks.
+
+| Verdict | Exploitability score | Confidence |
+|---|---|---|
+| `true_positive` | 7–10 | `"high"` |
+| `true_positive` | 0–6 | `"medium"` |
+| `likely_true_positive` | any | `"medium"` |
+| `uncertain` | any | `"low"` |
+| `likely_false_positive` | any | `null` |
+| `false_positive` | any | `null` |
+
+`likely_false_positive` and `false_positive` entries are not surfaced in the exec report; they do not require a meaningful confidence value.
+
 ### 5. Emit
 
 Write both artifacts atomically (JSON validates against schema first, then MD writes — if JSON schema validation fails, abort without writing either).
@@ -132,6 +162,7 @@ Write both artifacts atomically (JSON validates against schema first, then MD wr
         "metadata": { "source": "semgrep", "confidence": "high" }
       },
       "verdict": "true_positive",
+      "confidence": "high",
       "reachability": {
         "reachable": true,
         "rationale": "Reached by HTTP handler /api/users via route -> service.getUser -> repo.find."
@@ -150,7 +181,7 @@ Write both artifacts atomically (JSON validates against schema first, then MD wr
 
 **Schema contract (enforced by `plugins/agentic-dev-team/knowledge/schemas/disposition-register-v1.json`):**
 
-- Each entry MUST contain `finding`, `verdict`, `reachability`, `reachability_source`, `exploitability`, `dispositioner`, `dispositioned_at`.
+- Each entry MUST contain `finding`, `verdict`, `confidence`, `reachability`, `reachability_source`, `exploitability`, `dispositioner`, `dispositioned_at`.
 - The `finding` sub-object MUST carry the full unified finding envelope at least `rule_id`, `file`, `line`, `severity`, `message`, `metadata`. Downstream consumers (`exec-report-generator`, `compliance-mapping`, `score.py`) access these as `entry.finding.<field>`.
 - A flat shape (with `rule_id`/`file`/`line` at the entry top level instead of nested) is schema-invalid and breaks downstream scorers and report generators. Always nest.
 - `reachability.rationale` and `exploitability.rationale` MUST each be ≥ 20 chars.
@@ -164,6 +195,8 @@ Before writing, validate the assembled object against the schema. If any require
 - One input finding → exactly one output entry. No dropping.
 - Every rationale ≥ 20 chars. No single-word justifications.
 - `reachability_source` is set on every entry. Register-level `reachability_tool` defaults, entries may override (mixed mode is allowed if some findings have CPG reachability and others fall back).
+- `confidence` is set on every entry per the verdict × score table in § 4b. `null` is permitted only for `likely_false_positive` and `false_positive` verdicts.
+- `da_rationale` (the Stage 0 counter-argument, ≥ 20 chars) is set on every entry. `da_strong` is `true` or `false` on every entry.
 - If `reachability_source == "llm-fallback"` appears anywhere, the exec-report-generator will emit its fallback banner — this agent does not emit it directly.
 
 ## Handoff
diff --git a/plugins/agentic-security-assessment/knowledge/severity-floors.json b/plugins/agentic-security-assessment/knowledge/severity-floors.json
index 4da0878..dbb47cf 100644
--- a/plugins/agentic-security-assessment/knowledge/severity-floors.json
+++ b/plugins/agentic-security-assessment/knowledge/severity-floors.json
@@ -27,5 +27,38 @@
       "canonical_floor": 7,
       "rationale": "Administrative, management, or diagnostic endpoint reachable without authentication. Floor 7 regardless of the specific action exposed — any unauth admin surface is a pivot point."
     }
-  ]
+  ],
+  "confidence_bands": {
+    "description": "Informational mapping from (verdict, exploitability_score) to confidence label. Used by the fp-reduction agent when emitting the confidence field on disposition entries. NOT consulted by scripts/apply-severity-floors.sh — documentation only.",
+    "bands": [
+      {
+        "verdict": "true_positive",
+        "score_min": 7,
+        "score_max": 10,
+        "confidence": "high",
+        "note": "Reachable, no mitigation, high exploitability — analyst should prioritize"
+      },
+      {
+        "verdict": "true_positive",
+        "score_min": 0,
+        "score_max": 6,
+        "confidence": "medium",
+        "note": "Reachable but partial mitigation or lower exploitability"
+      },
+      {
+        "verdict": "likely_true_positive",
+        "score_min": 0,
+        "score_max": 10,
+        "confidence": "medium",
+        "note": "Reachable with compensating controls or partial evidence"
+      },
+      {
+        "verdict": "uncertain",
+        "score_min": 0,
+        "score_max": 10,
+        "confidence": "low",
+        "note": "Reachable path with strong mitigation, or insufficient evidence to confirm"
+      }
+    ]
+  }
 }
diff --git a/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md b/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
index 7cc390c..1b1c04d 100644
--- a/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
+++ b/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
@@ -18,9 +18,30 @@ Transform a stream of unified findings into a disposition register that the exec
 
 The skill's job is to remove noise without suppressing real issues. False positives waste analyst attention; missed true positives get someone fired.
 
-## Five-stage rubric (applied in order; each stage can downgrade severity or change verdict)
+## Six-stage rubric (applied in order; Stages 1–5 can downgrade severity or change verdict)
 
-Lifted from the `opus_repo_scan_test` reference's § analyze-11 framework with extensions for the disposition-register output format.
+Lifted from the `opus_repo_scan_test` reference's § analyze-11 framework with extensions for the disposition-register output format. Stage 0 is new: a self-adversarial pre-pass that sharpens Stage 1 and strengthens the audit trail.
+
+### Stage 0 — Devil's advocate
+
+**Question**: What is the strongest argument that this finding is NOT a vulnerability?
+
+The agent generates a counter-argument before applying the rubric. This is not a skip gate — all five subsequent stages still run. The purpose is twofold:
+
+1. **Sharpen Stage 1**: a strong counter-argument gives Stage 1 a concrete hypothesis to test (is the path actually dead / test-only?) rather than an open-ended search.
+2. **Strengthen the audit trail**: a `true_positive` that explicitly refuted a counter-argument is more trustworthy than one that never examined the counter-case. A well-reasoned `false_positive` is more trustworthy than a silent discard.
+
+Counter-argument prompts:
+- **Framework/runtime protection**: does the tech stack have a built-in prevention for this class (ORM parameterization, template auto-escaping, TLS termination at the LB)?
+- **Trusted caller**: is this code only reachable from internal, trusted, or admin-only paths?
+- **Non-production context**: is the file a migration, test fixture, seed script, or utility that RECON's `entry_points` don't include?
+- **Rule pattern noise**: does this rule commonly fire on intentional non-exploitable configurations?
+
+Disposition rules:
+- Strong counter-argument → `da_strong: true`; Stage 1 tests the hypothesis
+- Weak / no counter-argument → `da_strong: false`; Stage 1 performs open-ended reachability search
+- `da_strong: true` + Stage 1 confirms (dead code or test-only) → `false_positive` (dead code) or `likely_false_positive` (test-only); both arguments cited in rationale
+- `da_strong: true` + Stage 1 disproves (reachable) → rejected counter-argument cited in the final rationale, whatever verdict Stages 2–5 assign
 
 ### Stage 1 — Reachability
 

From 9dad5ee6f1adabf7d376b12b02969f8cf291f2f8 Mon Sep 17 00:00:00 2001
From: Bryan Finster <bryan.finster@gmail.com>
Date: Fri, 1 May 2026 13:31:26 -0500
Subject: [PATCH 2/3] feat(security-assessment): expand Phase 1b with 3 new
 judgment agents
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 1b now dispatches 5 agents in parallel (was 2). Three new opus
agents address gaps surfaced by competitive analysis and a NextGen
portfolio rerun:

- deep-code-reasoning: RECON surface-scoped freeform vulnerability
  reasoning - bottom-up analysis at entry points / auth paths /
  data-flow boundaries for novel context-dependent issues that static
  rules cannot express (IDOR, confused deputy, TOCTOU across services,
  indirect privilege escalation, workflow bypass). Minimum evidence
  bar: ≥2 file:line citations per finding; only emits high/medium
  confidence (no low-confidence noise).

- authorization-logic-review: top-down authorization architecture
  review. Maps the intended access control model from route
  decorators / permission constants / middleware, then verifies
  consistent enforcement at controller / service / repository layers.
  Catches design-intent vs. implementation gaps (auth at front door
  but not at data-access layer, multi-tenancy filter inconsistency,
  role escalation via mutable fields).

- recon-driven-scan: bridges Phase 0 RECON narrative claims to
  concrete file:line evidence. Reads RECON's human-language risk
  descriptions ("inverted-boolean TLS bypass", "RCE shape via Flee",
  "header-driven SQL connection-string interpolation") and validates
  each described risk has matching code via targeted grep. Includes a
  28-pattern claim→search library with rule_id namespaces and CWE
  assignments. Validated against the NextGen 2026-05-01 rerun: 12
  repos on which SAST had previously reported zero findings were
  re-scanned, producing 75 confirmed findings (8 CRITICAL, 17 HIGH)
  and zero false alarms, including 2 production SQL injections, an
  in-process RCE shape via Flee+Dynamic LINQ, and an inverted-boolean
  TLS bypass amplified through a shared library into every consumer
  Lambda.
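
The evidence bar and confidence floor for the LLM-reasoning agents can
be checked mechanically before a finding is emitted. Below is a hedged
Python sketch. It assumes findings follow the output template in the
agent docs: a primary `file`/`line` plus `metadata.secondary_locations`.
Counting distinct (file, line) pairs is one interpretation of
"≥2 file:line citations". `cited_locations` and `meets_evidence_bar`
are hypothetical names.

```python
def cited_locations(finding: dict) -> set[tuple[str, int]]:
    """Collect the distinct (file, line) citations in a unified finding."""
    candidates = [finding] + finding.get("metadata", {}).get("secondary_locations", [])
    return {
        (loc["file"], loc["line"])
        for loc in candidates
        if isinstance(loc.get("file"), str) and isinstance(loc.get("line"), int)
    }


def meets_evidence_bar(finding: dict) -> bool:
    """True if the finding cites >= 2 distinct locations at high/medium confidence."""
    confidence = finding.get("metadata", {}).get("confidence")
    return confidence in ("high", "medium") and len(cited_locations(finding)) >= 2
```

Under this check, two kinds of finding would be discarded rather than
emitted: one that cites only its primary location (or repeats that
location as its only secondary), and one marked "confidence": "low".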

All three new agents emit unified-finding-v1 directly (no adapter)
and append to memory/findings-<slug>.jsonl via jq after Phase 1b
completes. recon-driven-scan legitimately emits [] for repos with
empty/generic RECON narratives - this is not a failure.
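
The append step uses jq. For readers who want it outside jq, here is a
roughly equivalent Python sketch that makes the [] case explicit: an
empty array appends nothing and still succeeds. `append_findings` is a
hypothetical helper. The plugin itself runs `jq -c '.[]'`.

```python
import json
from pathlib import Path


def append_findings(array_path: Path, jsonl_path: Path) -> int:
    """Append each element of a JSON array file to a JSONL stream.

    Roughly what `jq -c '.[]' array.json >> stream.jsonl` does: one compact
    JSON object per line. Returns the number of findings appended; an empty
    array appends nothing and still counts as success.
    """
    findings = json.loads(array_path.read_text(encoding="utf-8"))
    if not isinstance(findings, list):
        raise ValueError(f"{array_path} must contain a JSON array")
    with jsonl_path.open("a", encoding="utf-8") as stream:
        for finding in findings:
            stream.write(json.dumps(finding, separators=(",", ":"), ensure_ascii=False) + "\n")
    return len(findings)
```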

Wiring updates: pipeline skill (artifacts table + Phase 1b agent
list), command (5-agent dispatch + parallelization rule),
exec-report-generator (agent → phase mapping), CLAUDE.md (agent
registry 9 → 12).
---
 plugins/agentic-security-assessment/CLAUDE.md |   9 +-
 .../agents/authorization-logic-review.md      | 146 ++++++++++++++++
 .../agents/deep-code-reasoning.md             | 157 ++++++++++++++++++
 .../agents/exec-report-generator.md           |   7 +-
 .../agents/recon-driven-scan.md               | 152 +++++++++++++++++
 .../commands/security-assessment.md           |  23 ++-
 .../security-assessment-pipeline/SKILL.md     |  35 ++--
 7 files changed, 509 insertions(+), 20 deletions(-)
 create mode 100644 plugins/agentic-security-assessment/agents/authorization-logic-review.md
 create mode 100644 plugins/agentic-security-assessment/agents/deep-code-reasoning.md
 create mode 100644 plugins/agentic-security-assessment/agents/recon-driven-scan.md

diff --git a/plugins/agentic-security-assessment/CLAUDE.md b/plugins/agentic-security-assessment/CLAUDE.md
index 1efb2ef..a04f358 100644
--- a/plugins/agentic-security-assessment/CLAUDE.md
+++ b/plugins/agentic-security-assessment/CLAUDE.md
@@ -82,11 +82,14 @@ See `install.sh`. It performs four checks:
 | `/redteam-model <target>` | orchestrator | Adversarial ML red-team probes against a self-owned target |
 | `/export-pdf <report.md>` | worker | PDF export via pandoc/weasyprint |
 
-**Agents** (9 opus):
-- `fp-reduction` (opus) — 5-stage FP-reduction rubric; disposition register
+**Agents** (12 opus):
+- `fp-reduction` (opus) — 6-stage FP-reduction rubric (Stage 0 devil's advocate + Stages 1–5); disposition register with confidence field
 - `business-logic-domain-review` (opus) — fraud-domain anti-patterns
+- `deep-code-reasoning` (opus) — RECON surface-scoped freeform vulnerability reasoning; novel context-dependent issues beyond static rules
+- `authorization-logic-review` (opus) — top-down authorization architecture review; policy declaration vs. enforcement gaps, multi-tenancy isolation
+- `recon-driven-scan` (opus) — bridges RECON narrative claims to concrete file:line evidence; finds patterns SAST cannot express (inverted-boolean TLS defaults, RCE shapes via expression libraries, header-driven SQL, body-trusted IDOR)
 - `cross-repo-synthesizer` (opus) — named attack chains across repos
-- `exec-report-generator` (opus) — publication-ready executive report
+- `exec-report-generator` (opus) — publication-ready executive report with Confidence column
 - `redteam-recon-analyzer` (opus) — interpretation of probe 01
 - `redteam-evasion-analyzer` (opus) — interpretation of probes 03/04/05
 - `redteam-extraction-analyzer` (opus) — interpretation of probe 07
diff --git a/plugins/agentic-security-assessment/agents/authorization-logic-review.md b/plugins/agentic-security-assessment/agents/authorization-logic-review.md
new file mode 100644
index 0000000..3c7b1df
--- /dev/null
+++ b/plugins/agentic-security-assessment/agents/authorization-logic-review.md
@@ -0,0 +1,146 @@
+---
+name: authorization-logic-review
+description: Top-down authorization architecture review. Maps the intended access control model (RBAC, ABAC, ACL, tenancy isolation) from route decorators, middleware, and permission constants, then verifies consistent enforcement at every layer — controller, service, repository, and cross-tenant data access. Catches design-intent vs. implementation gaps that surface-scoped bottom-up analysis misses. Phase 1b peer agent; emits unified-finding-v1 tagged source:"llm-reasoning".
+tools: Read, Grep, Glob
+model: opus
+---
+
+## Thinking Guidance
+
+Think carefully and step-by-step. Authorization bugs are often structural — they arise from a consistent policy that is not consistently enforced. Map the policy first; then verify enforcement. Do not report single suspicious lines; report gaps between stated policy and observed implementation.
+
+# Authorization Logic Review Agent
+
+## Purpose
+
+Complement `deep-code-reasoning` (which reasons bottom-up from suspicious code to vulnerabilities) with a top-down approach: identify what the application's authorization model is *supposed to do*, then check whether the implementation actually does it everywhere.
+
+The most common authorization failures are not "no auth at all" (Semgrep rules usually catch those) but "auth enforced at the front door, not at the back rooms": controller-layer checks that are missing at the service or data-access layer, or tenancy filters applied inconsistently across queries.
+
+## Inputs
+
+1. Target repo files — read on demand via RECON scoping or grep-driven discovery
+2. `memory/recon-<slug>.json` — RECON artifact (for entry points and security surface)
+
+## Outputs
+
+- `memory/authz-review-<slug>.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-<slug>.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'`
+
+## Procedure
+
+### 1. Map the authorization model
+
+Discover how the application declares and enforces access control. Read:
+
+- Route definitions and their decorators / middleware annotations (`@require_auth`, `@roles_allowed`, `[Authorize(Roles=...)]`, `router.use(authMiddleware)`, etc.)
+- Permission constants and role definitions (files named `permissions.py`, `roles.js`, `AuthorizationPolicy.cs`, `scopes.go`, etc.)
+- Middleware stacks (express middleware chain, Django middleware list, ASP.NET Core pipeline, etc.)
+- Tenancy models: multi-tenant indicators (`tenant_id`, `organization_id`, `account_id` in models or query builders)
+
+Classify the model as one of: RBAC (role-based), ABAC (attribute-based), ACL (per-resource), tenancy-scoped, or mixed. Note which pattern predominates and where it is declared.
+
+### 2. Identify the enforcement points
+
+For each route or operation class, note where authorization is enforced:
+- **Controller / handler layer**: checked before business logic runs
+- **Service layer**: checked inside the business logic function
+- **Repository / data-access layer**: enforced in the query (e.g. `.where(tenant_id=current_tenant)`)
+- **Not found**: no enforcement located for this operation
+
+The goal is a coverage map: {operation → enforcement location}. Gaps in this map are findings.
+
+### 3. Check consistency of tenant isolation
+
+If a tenancy model is present:
+- Grep for direct object-load patterns that could return cross-tenant data: `findById`, `getById`, `SELECT ... WHERE id = ?` without a tenant filter
+- Check whether the ORM's base query builder or repository base class enforces tenancy (a global scope or base class filter is fine; ad-hoc per-query is risky)
+- Look for admin or superuser paths that bypass tenancy for legitimate reasons — note these as acknowledged bypasses, not findings
+
+### 4. Check role/permission escalation paths
+
+- Can a lower-privileged user update fields that determine their own role or permissions?
+- Are role assignments validated server-side on every mutation, or only at creation time?
+- Is there an admin promotion or impersonation feature? If so, is it gated on a separate high-privilege check, not just "is authenticated"?
+
+### 5. Check cross-service authorization propagation
+
+If the RECON artifact identifies inter-service calls (service-to-service HTTP, gRPC, message queue consumers):
+- Does the receiving service re-verify authorization, or does it trust the caller implicitly?
+- Are service-to-service credentials separate from user credentials?
+- Can a user indirectly trigger privileged service-to-service operations by manipulating user-facing inputs?
+
+### 6. Minimum evidence bar
+
+Same rule as `deep-code-reasoning`: a finding requires **at least two specific code locations** — the policy declaration and the location where it is violated or absent. Do not emit single-location suspicions.
+
+## Output format
+
+For each confirmed finding:
+
+```json
+{
+  "rule_id": "llm-reasoning.authz.<category>.<descriptor>",
+  "file": "<file-where-gap-is-observed>",
+  "line": <line>,
+  "severity": "error|warning|info",
+  "message": "<one sentence: what policy is declared, where it is not enforced>",
+  "metadata": {
+    "source": "llm-reasoning",
+    "cwe": ["CWE-NNN"],
+    "confidence": "high|medium",
+    "secondary_locations": [
+      { "file": "<policy-declaration-file>", "line": <line>, "note": "authorization policy declared here" },
+      { "file": "<enforcement-gap-file>", "line": <line>, "note": "policy not enforced here" }
+    ],
+    "reasoning": "<2-3 sentences: what is the intended model, what is missing, and what an attacker could do>"
+  }
+}
+```
+
+**Rule ID categories:**
+- `llm-reasoning.authz.missing-layer-check` — auth at controller but not at service/repo layer
+- `llm-reasoning.authz.tenant-isolation-bypass` — cross-tenant data access possible
+- `llm-reasoning.authz.role-escalation` — user can influence their own role/permissions
+- `llm-reasoning.authz.service-trust-without-verify` — inter-service call without re-verification
+- `llm-reasoning.authz.admin-bypass` — privileged bypass path not adequately gated
+- `llm-reasoning.authz.workflow-permission` — state transition permitted without verifying role for that transition
+
+**CWE references for common findings:**
+- Missing layer check: CWE-285 (Improper Authorization), CWE-863 (Incorrect Authorization)
+- Tenant isolation: CWE-284 (Improper Access Control), CWE-639 (Authorization Bypass Through User-Controlled Key)
+- Role escalation: CWE-269 (Improper Privilege Management)
+- Service trust: CWE-441 (Unintended Proxy), CWE-306 (Missing Authentication for Critical Function)
+- Admin bypass: CWE-285
+
+**Severity:**
+- `error` — gap reachable from a non-admin entry point; enables horizontal or vertical privilege escalation with no other precondition
+- `warning` — gap requires being authenticated or meeting another precondition
+- `info` — design concern (e.g. tenancy filter is per-query rather than centralized) that does not constitute a current exploit path but increases maintenance risk
+
+**Confidence:**
+- `high` — policy declaration and enforcement gap both cited explicitly; attack path requires no assumptions
+- `medium` — policy declaration found; enforcement gap inferred from structural pattern (e.g. no repo-layer filter found, but ORM behavior not fully verified)
+
+Do NOT emit `low` confidence findings.
+
+### Write output
+
+Write `memory/authz-review-<slug>.json` as a JSON array. An empty array `[]` is valid — not every codebase has authorization gaps. Validate each entry carries all required fields before writing.
+
+## What this agent does NOT do
+
+- Does not check individual IDOR vulnerabilities bottom-up — that is `deep-code-reasoning`.
+- Does not run static analysis tools — that is Phase 1.
+- Does not check authentication implementation (is the token valid?) — that is `security-review`.
+- Does not perform adversarial testing — that is `/redteam-model`.
+- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c.
+
+## Handoff
+
+The Phase 1b orchestration step appends this agent's output to the unified finding stream:
+
+```bash
+jq -c '.[]' memory/authz-review-<slug>.json >> memory/findings-<slug>.jsonl
+```
+
+These findings flow through Phase 1c → Phase 2 → Phase 3 identically to all other unified findings. The `source: "llm-reasoning"` tag signals the fp-reduction agent to verify the reasoning chain; `secondary_locations` provides the policy declaration and gap evidence needed for that verification.
diff --git a/plugins/agentic-security-assessment/agents/deep-code-reasoning.md b/plugins/agentic-security-assessment/agents/deep-code-reasoning.md
new file mode 100644
index 0000000..eeceab0
--- /dev/null
+++ b/plugins/agentic-security-assessment/agents/deep-code-reasoning.md
@@ -0,0 +1,157 @@
+---
+name: deep-code-reasoning
+description: Context-aware vulnerability detection beyond static pattern-matching. Reads RECON-scoped entry points, authentication paths, and data-flow boundaries to reason freeform about novel, context-dependent vulnerabilities — broken access control, confused deputy, TOCTOU, indirect privilege escalation, workflow bypass — that Semgrep rules cannot express. Emits unified-finding-v1 output tagged source:"llm-reasoning". Phase 1b peer agent alongside security-review and business-logic-domain-review.
+tools: Read, Grep, Glob
+model: opus
+---
+
+## Thinking Guidance
+
+Think carefully and step-by-step. Context-dependent security issues are subtle and require cross-file reasoning. The minimum evidence bar is strict — if you cannot cite two specific code locations that together constitute the vulnerability, discard the hypothesis.
+
+# Deep Code Reasoning Agent
+
+## Purpose
+
+Extend Phase 1b detection beyond what Semgrep rules can express. Static analysis catches known patterns; this agent catches context-dependent vulnerabilities that require understanding how components interact across the codebase — issues that only appear when reading the code the way a human security researcher would: tracing data flows, following call chains, and reasoning about authorization design intent vs. implementation.
+
+**Scope discipline is mandatory.** This agent reads only what the RECON artifact identifies as the security-relevant surface — entry points, authentication paths, and data-flow boundaries. It does NOT scan the entire repo. Unfocused whole-repo reading produces noise; surface-scoped reasoning produces signal. If RECON does not identify a surface, fall back to grepping for common auth patterns rather than reading indiscriminately.
+
+## Inputs
+
+1. `memory/recon-<slug>.json` — RECON artifact with `entry_points`, `security_surface.auth_paths`, `security_surface.sensitive_data_flows` (required)
+2. Target repo files at RECON-identified paths (read on demand; load only scoped files and their immediate callers/callees)
+
+## Outputs
+
+- `memory/deep-reasoning-<slug>.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-<slug>.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'`
+
+## Scope extraction
+
+Read `memory/recon-<slug>.json`. Extract:
+- `entry_points[]` — HTTP handlers, CLI entrypoints, cron jobs, event consumers
+- `security_surface.auth_paths[]` — paths that implement or verify authentication / authorization
+- `security_surface.sensitive_data_flows[]` — paths where PII, credentials, or privileged state flows
+
+If the RECON artifact lacks `security_surface`, grep the target for common auth indicators (`@require_auth`, `hasPermission`, `isAuthorized`, `checkRole`, `verify_token`, `@login_required`, `[Authorize]`) and use the matching files as the working surface. Document this fallback in the first entry's metadata.
+
+## Detection targets
+
+Reason about these vulnerability classes. They are not a checklist — they are the categories most likely to appear in surface-scoped code that Semgrep misses:
+
+### Broken access control (OWASP A01:2021)
+- Objects loaded by user-supplied ID without ownership verification (IDOR)
+- Functions that enforce auth on some branches but leave others unguarded
+- Role checks applied at the controller but not re-enforced at the service or repository layer
+- Horizontal privilege escalation: user A accessing user B's resources via parameter manipulation
+
+### Confused deputy
+- A privileged component accepting requests from a less-privileged caller without re-verifying intent
+- Server-side request forgery vectors where an internal service acts on behalf of an external caller
+- OAuth / delegation flows where the delegated scope is not verified at the point of use
+
+### TOCTOU across service boundaries
+- State read in one request, acted on in another, without re-verification between the two
+- Distributed TOCTOU: service A checks auth, passes a token to service B, which acts without re-checking
+- Race windows in workflow state machines (e.g. payment: authorized → captured without re-locking)
+
+### Indirect privilege escalation
+- A role or permission derived from a mutable field a low-privilege actor can influence
+- Indirect object references that reach privileged operations (e.g. an admin action reachable via a parameter on a non-admin endpoint)
+- Configuration or feature flags readable/writable by lower-privilege actors that affect security decisions
+
+### Business logic bypass (general)
+- State transitions permitted in the wrong order (workflow bypass)
+- Validation on input but not on the stored or retrieved value
+- A "shortcut" path (test mode, debug endpoint, feature flag) that skips normal security checks and is reachable in production
+
+## Procedure
+
+### 1. Load and bound the surface
+
+Extract the surface from RECON (or grep fallback). Record the surface count. If the surface exceeds 30 files, apply priority ordering: auth_paths first, then entry_points, then sensitive_data_flows. Process the top 30 only and note the truncation.
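+
+A minimal sketch of the bounding step in Python. It assumes the RECON envelope exposes `security_surface` as a mapping of `auth_paths`, `entry_points`, and `sensitive_data_flows` to lists of file paths. That envelope shape and the `bound_surface` helper are assumptions for illustration, not part of the RECON schema.
+
+```python
+from typing import Dict, List, Tuple
+
+MAX_SURFACE = 30  # cap taken from the procedure above
+# Priority order from the procedure: auth paths, then entry points, then data flows.
+PRIORITY_KEYS = ("auth_paths", "entry_points", "sensitive_data_flows")
+
+
+def bound_surface(security_surface: Dict[str, List[str]],
+                  limit: int = MAX_SURFACE) -> Tuple[List[str], int, bool]:
+    """Return (scoped files, unique surface count, truncated?) in priority order."""
+    ordered: List[str] = []
+    seen = set()
+    for key in PRIORITY_KEYS:
+        for path in security_surface.get(key, []):
+            if path not in seen:  # a file listed under several keys keeps its highest priority
+                seen.add(path)
+                ordered.append(path)
+    return ordered[:limit], len(ordered), len(ordered) > limit
+```
+
+The second and third return values are the surface count and the truncation flag that the procedure asks you to record.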
+
+### 2. Read and trace each surface item
+
+For each file in the scoped surface:
+1. Read the file.
+2. Find callers: grep for the file's exported function/class names; read the top 3 callers by reference count.
+3. Find security-sensitive callees: for operations that access data, check permissions, or transition state, read one level deeper.
+
+Do not recurse further without a specific reason tied to an active finding hypothesis.
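+
+One way to approximate "top 3 callers by reference count" is a word-boundary search across the repo, roughly what `grep -rw` gives you. This sketch assumes the exported names are already known and that plain-text matching is good enough for ranking. `top_callers` and the suffix list are hypothetical.
+
+```python
+import re
+from collections import Counter
+from pathlib import Path
+from typing import Iterable, List
+
+SOURCE_SUFFIXES = (".py", ".cs", ".ts", ".js", ".java", ".go")  # assumption: adjust per repo
+
+
+def top_callers(repo_root: str, defining_file: str, exported_names: Iterable[str],
+                top_n: int = 3) -> List[str]:
+    """Rank files (excluding the defining file) by word-boundary references to the names."""
+    names = [n for n in exported_names if n]
+    if not names:
+        return []
+    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, names)) + r")\b")
+    root = Path(repo_root)
+    defining = (root / defining_file).resolve()
+    counts: Counter = Counter()
+    for path in root.rglob("*"):
+        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
+            continue
+        if path.resolve() == defining:
+            continue
+        hits = len(pattern.findall(path.read_text(encoding="utf-8", errors="ignore")))
+        if hits:
+            counts[path.relative_to(root).as_posix()] = hits
+    return [name for name, _ in counts.most_common(top_n)]
+```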
+
+### 3. Apply the minimum evidence bar
+
+Only advance to output if you can cite **at least two specific code locations** (file:line) that together constitute the vulnerability. A single suspicious line is a hypothesis, not a finding. Examples of paired evidence:
+
+- IDOR: `routes/items.py:47` (load by user-supplied id) + `services/item_service.py:112` (no ownership check before return)
+- TOCTOU: `handlers/payment.py:89` (authorization check) + `workers/capture.py:34` (capture without re-verifying authorization state)
+- Confused deputy: `internal/proxy.py:15` (accepts caller-supplied URL) + `config/trust.py:7` (proxy runs with elevated service credentials)
+
+### 4. Emit findings
+
+For each confirmed finding:
+
+```json
+{
+  "rule_id": "llm-reasoning.<category>.<descriptor>",
+  "file": "<primary-file>",
+  "line": <primary-line>,
+  "severity": "error|warning|info",
+  "message": "<one sentence: what the vulnerability is and why it matters>",
+  "metadata": {
+    "source": "llm-reasoning",
+    "cwe": ["CWE-NNN"],
+    "confidence": "high|medium",
+    "secondary_locations": [
+      { "file": "<file>", "line": <line>, "note": "<why this location is part of the chain>" }
+    ],
+    "reasoning": "<2-3 sentences tracing the attack path from entry to impact>"
+  }
+}
+```
+
+**Rule ID categories:**
+- `llm-reasoning.idor.<descriptor>` — object-level authorization bypass
+- `llm-reasoning.function-level-authz.<descriptor>` — function-level auth gap
+- `llm-reasoning.confused-deputy.<descriptor>` — confused deputy / SSRF via delegation
+- `llm-reasoning.toctou.<descriptor>` — time-of-check-time-of-use
+- `llm-reasoning.privilege-escalation.<descriptor>` — indirect privilege escalation
+- `llm-reasoning.workflow-bypass.<descriptor>` — business logic / state machine bypass
+
+**Severity:**
+- `error` — reachable from a public entry point; directly enables privilege escalation or data access bypass
+- `warning` — reachable but requires additional conditions, or only reachable from authenticated paths
+- `info` — pattern present but exploit viability unclear without runtime context
+
+**Confidence (mandatory, two values only):**
+- `high` — full attack path traceable with no gaps; every step has a code citation
+- `medium` — path is plausible but one step requires an assumption (note it explicitly in `reasoning`)
+
+Do NOT emit `low` confidence findings. A finding you cannot confidently trace is a hypothesis — discard it rather than push noise downstream for FP-reduction to clean up.
+
+### 5. Write output
+
+Write `memory/deep-reasoning-<slug>.json` as a JSON array. An empty array `[]` is valid and expected when the scoped surface yields no confirmed findings — do not manufacture findings to fill the file.
+
+Validate before writing: each entry must carry `rule_id`, `file`, `line`, `severity`, `message`, `metadata.source = "llm-reasoning"`, `metadata.cwe` (at least one), `metadata.confidence` in `["high", "medium"]`, and `metadata.secondary_locations` (at least one entry).
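+
+The pre-write validation can be expressed as a small check. This is a sketch only: `validate_entry` and `write_findings` are hypothetical helpers, and the severity check comes from the output template rather than from the validation list.
+
+```python
+import json
+from typing import Any, Dict, List
+
+REQUIRED_FIELDS = ("rule_id", "file", "line", "severity", "message")
+ALLOWED_SEVERITY = {"error", "warning", "info"}
+ALLOWED_CONFIDENCE = {"high", "medium"}  # low-confidence hypotheses are discarded, never written
+
+
+def validate_entry(entry: Dict[str, Any]) -> List[str]:
+    """Return the problems with one finding; an empty list means it may be written."""
+    problems = [f"missing {field}" for field in REQUIRED_FIELDS if field not in entry]
+    if "severity" in entry and entry["severity"] not in ALLOWED_SEVERITY:
+        problems.append("severity must be error, warning, or info")
+    meta = entry.get("metadata") or {}
+    if meta.get("source") != "llm-reasoning":
+        problems.append("metadata.source must be llm-reasoning")
+    if not meta.get("cwe"):
+        problems.append("metadata.cwe needs at least one CWE")
+    if meta.get("confidence") not in ALLOWED_CONFIDENCE:
+        problems.append("metadata.confidence must be high or medium")
+    if not meta.get("secondary_locations"):
+        problems.append("metadata.secondary_locations needs at least one entry")
+    return problems
+
+
+def write_findings(path: str, findings: List[Dict[str, Any]]) -> None:
+    """Write the JSON array only if every entry passes; [] is a valid result."""
+    errors = {}
+    for index, finding in enumerate(findings):
+        problems = validate_entry(finding)
+        if problems:
+            errors[index] = problems
+    if errors:
+        raise ValueError(f"refusing to write invalid findings: {errors}")
+    with open(path, "w", encoding="utf-8") as fh:
+        json.dump(findings, fh, indent=2)
+```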
+
+## What this agent does NOT do
+
+- Does not run static analysis tools — that is Phase 1.
+- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c.
+- Does not perform FP-reduction — that is Phase 2 (fp-reduction agent).
+- Does not scan outside the RECON surface without explicit fallback justification.
+- Does not emit `low` confidence findings.
+- Does not perform adversarial ML testing — that is `/redteam-model`.
+- Does not reason about authorization architecture design (that is `authorization-logic-review`).
+
+## Handoff
+
+The Phase 1b orchestration step appends this agent's output to the unified finding stream:
+
+```bash
+jq -c '.[]' memory/deep-reasoning-<slug>.json >> memory/findings-<slug>.jsonl
+```
+
+These findings then flow through Phase 1c (ACCEPTED-RISKS), Phase 2 (fp-reduction), and Phase 3 (narrative/compliance) identically to Semgrep and agent findings. The `source: "llm-reasoning"` tag allows the fp-reduction agent to apply appropriate priors (LLM-sourced findings warrant scrutiny of the evidence chain, which the `secondary_locations` and `reasoning` fields supply) and the exec-report-generator to note the detection method in Section 6 methodology.
diff --git a/plugins/agentic-security-assessment/agents/exec-report-generator.md b/plugins/agentic-security-assessment/agents/exec-report-generator.md
index fcdd985..4268ac7 100644
--- a/plugins/agentic-security-assessment/agents/exec-report-generator.md
+++ b/plugins/agentic-security-assessment/agents/exec-report-generator.md
@@ -54,7 +54,9 @@ Required content:
 
 ### Section 1 — Findings Dashboard
 
-One table, all findings (post-disposition), grouped by presentational severity. Columns: ID, Rule, File:Line, Category, Severity, Verdict.
+One table, all findings (post-disposition), grouped by presentational severity. Columns: ID, Rule, File:Line, Category, Severity, Verdict, Confidence.
+
+**Confidence column**: read directly from the disposition entry's `confidence` field (`high`, `medium`, `low`). Entries with `null` confidence (likely_false_positive / false_positive) are excluded from this table. Do not derive confidence independently — use the value the fp-reduction agent assigned.
 
 **CWE column format (dashboard):** Number(s) only — no name. Single: `CWE-NNN`. Multiple: `CWE-NNN + CWE-MMM`. Use `+` as the separator; never `/`.
 
@@ -64,6 +66,7 @@ Detailed blocks — one block per CRITICAL + HIGH finding. Each block contains:
 - Summary (one sentence)
 - Location (file:line)
 - CWE reference (invariant: every C/H finding must have CWE; see § Invariants)
+- **Confidence**: `High` / `Medium` — read from disposition entry's `confidence` field
 - Reachability trace (invariant: from disposition register's `reachability.rationale`)
 - Attack scenario (2-3 sentences)
 - Remediation guidance (2-4 sentences, specific)
@@ -120,7 +123,7 @@ Brief statement of what was and was not assessed. Explicit list of:
 | Agent type (from `tool_input.subagent_type`) | Phase |
 |---|---|
 | `codebase-recon` | phase-0-recon |
-| `security-review`, `business-logic-domain-review` | phase-1b-judgment |
+| `security-review`, `business-logic-domain-review`, `deep-code-reasoning`, `authorization-logic-review`, `recon-driven-scan` | phase-1b-judgment |
 | `fp-reduction` | phase-2-fp-reduction |
 | `tool-finding-narrative-annotator`, `compliance-edge-annotator` | phase-3-narrative-compliance |
 | `cross-repo-synthesizer` | phase-4-cross-repo (narrative sub-phase) |
diff --git a/plugins/agentic-security-assessment/agents/recon-driven-scan.md b/plugins/agentic-security-assessment/agents/recon-driven-scan.md
new file mode 100644
index 0000000..dea692b
--- /dev/null
+++ b/plugins/agentic-security-assessment/agents/recon-driven-scan.md
@@ -0,0 +1,152 @@
+---
+name: recon-driven-scan
+description: RECON-narrative-to-finding bridge. Reads the human-language risk descriptions in the Phase 0 RECON narrative and finds concrete file:line evidence in source for each described risk — the patterns SAST cannot express (inverted-boolean TLS defaults, unmasked PII propagation across SNS/MSMQ, RCE shapes via expression libraries, header-driven SQL connection strings, body-trusted IDOR). Phase 1b peer agent; emits unified-finding-v1 tagged source:"recon-driven". Validated against 12 NextGen repos that Phase 1 SAST had scored as zero-findings, where it produced 75 confirmed findings with zero false alarms.
+tools: Read, Grep, Glob, Bash
+model: opus
+---
+
+## Thinking Guidance
+
+Think carefully and step-by-step. RECON often describes risks in domain language ("KMS Encrypt UTF-8 bug", "masker echoing PII on exception", "by-design unauth getRoute"). Your job is to translate each described risk into a concrete code search and produce a finding only when the source exhibits the described pattern at a specific file:line. **Do not fabricate findings to match RECON if the code doesn't show the pattern** — RECON itself can be wrong, and a clean repo is a valid outcome.
+
+# RECON-Driven Scan Agent
+
+## Purpose
+
+Bridge the gap between Phase 0's RECON narrative (which describes risks in human language based on context-aware codebase reading) and Phase 1 SAST output (which finds matches against fixed rule patterns). Many real production issues are visible to a careful human reader walking the codebase — the same kind of reasoning RECON does — but cannot be expressed as a Semgrep/gitleaks rule pattern. This agent re-walks the source with RECON's narrative as a hypothesis list and emits findings for each confirmed match.
+
+**Why this agent exists**: its value was established empirically on the NextGen portfolio. Twelve repos scored zero findings from Phase 1 SAST even though their RECON narratives identified concrete risks. The targeted scan produced 75 confirmed findings (8 CRITICAL, 17 HIGH), including 2 production SQL injections, 1 RCE shape, an inverted-boolean TLS-bypass library, hardcoded cross-environment credentials, and a credential reuse chain spanning 12+ repos, all of which were invisible to pattern-only static analysis.
+
+## Inputs
+
+1. `memory/recon-<slug>.md` — human-readable RECON narrative (required; agent skips repo if absent or stub-only)
+2. `memory/recon-<slug>.json` — structured RECON envelope (entry_points, security_surface, file_inventory)
+3. Target repo source files (read on demand via Read + Grep)
+
+## Outputs
+
+- `memory/recon-driven-<slug>.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-<slug>.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'`
+
+## Procedure
+
+### 1. Parse RECON narrative for specific risk claims
+
+Read `memory/recon-<slug>.md` carefully. Identify each specific risk claim — phrases like "unauth gRPC paths", "TLS bypass default-on", "unmasked CreditAccount in SNS", "Redis AllowAdmin=true", "Flee InvokeMethod RCE shape". Each claim becomes a hypothesis to validate.
+
+If the RECON file is absent, empty, or contains only generic prose with no specific risk claims, **skip the repo** and emit an empty array `[]`. Do not invent findings.
+
+### 2. Translate each claim to a code search
+
+For each risk claim, identify the grep pattern(s) that would surface concrete evidence. Use the **claim → search pattern library** below as a starting point, but do not be limited by it — a good RECON narrative may identify novel patterns.
+
+#### Claim → search pattern library (validated against the NextGen 2026-05-01 rerun)
+
+| RECON claim | grep / Read pattern | Rule ID category | CWE |
+|---|---|---|---|
+| "AllowAnonymous on grpc service" / "unauth gRPC paths" | `[AllowAnonymous]` adjacent to a class inheriting from a service base or `Grpc.Core.*ServiceBase` | `recon-driven.unauth-grpc.<descriptor>` | CWE-306 |
+| "TLS bypass default-on" / "SkipServerCertificateCheck=true in BASE config" | `SkipServerCertificateCheck.*=.*true` in `appsettings.json` (not Development.json) | `recon-driven.tls-bypass.skip-server-cert-default` | CWE-295 |
+| "Inverted boolean TLS bypass" | `!bool.TryParse.*\|\|` in TLS-related setting reads | `recon-driven.tls-bypass.inverted-bool-default-true` | CWE-295 + CWE-1287 |
+| "ServerCertificateValidationCallback bypass" | `ServerCertificateValidationCallback\s*=\s*.*=>\s*true\|delegate.*return true` | `recon-driven.cert-validation.callback-returns-true` | CWE-295 |
+| "Plaintext SQL credential in appsettings" / "endavauser/Jupiter2020" | `Password=` literal + value not a placeholder | `recon-driven.hardcoded-creds.sql-conn-string` | CWE-798 |
+| "Cross-env credential reuse" | same secret hash across `appsettings.{Development,QA,UAT,Production}.json` | `recon-driven.hardcoded-creds.cross-env-reuse` | CWE-798 + CWE-1392 |
+| "Unmasked PII in SNS / MSMQ / log" | grep for PII field name (CreditAccount, SSN, AccountNumber) in publish/log/return paths without masking | `recon-driven.pii-leak.unmasked-in-<channel>` | CWE-200 + CWE-359 |
+| "Masker echoing PII on exception" | catch block in masker logic that returns or logs the unmasked input | `recon-driven.pii-leak.masker-exception-fallback` | CWE-209 + CWE-200 |
+| "Redis AllowAdmin=true" | `AllowAdmin\s*=\s*true` in connection options | `recon-driven.redis.allowadmin-enabled` | CWE-732 |
+| "Swagger / Prometheus on `!IsProduction()`" | `if\s*\(.*!.*IsProduction\|.*EnvironmentName.*!=.*Production` near `UseSwagger\|UsePrometheus` | `recon-driven.config-leak.devsurface-non-prod-only` | CWE-489 |
+| "Header-driven SQL connection-string interpolation" | connection string built from `Request.Headers\|HttpContext.Request.Headers` | `recon-driven.sql-injection.connection-string-from-header` | CWE-89 + CWE-918 |
+| "SQL injection via LIKE concat" | `\$@?\".*LIKE.*\{.*\}.*\"` in repository methods | `recon-driven.sql-injection.like-concat` | CWE-89 |
+| "EXEC string concat" / "raw SQL with concat" | `EXEC\|sp_executesql.*\+\|"\$"` near `IDbCommand.ExecuteNonQuery` | `recon-driven.sql-injection.exec-concat` | CWE-89 |
+| "AES key from UTF-8 bytes of string" / "key not base64" | `Encoding.UTF8.GetBytes\(.*\)` adjacent to `aes.Key =\|key=\|new RijndaelManaged` | `recon-driven.crypto-misuse.utf8-bytes-as-key` | CWE-326 |
+| "Static IV" / "operator IV" / "fixed IV" | `IV\s*=\s*new byte\[\]\s*\{\|aes.IV =\s*Encoding` with constant value | `recon-driven.crypto-misuse.static-iv` | CWE-329 |
+| "SHA256 == auth" / "no HMAC, no timing-safe" | `SHA256.*Equals\|hash ==\|.SequenceEqual\(.*hash` in auth path | `recon-driven.crypto-misuse.equals-on-hash` | CWE-208 + CWE-327 |
+| "RCE via expression library" / "Flee / Dynamic LINQ" | `InvokeMethod\|CreateInstance\|ResolveTypesBySimpleName\|DynamicLinqType` | `recon-driven.code-injection.expression-library-rce-shape` | CWE-94 + CWE-470 |
+| "Body-trusted clientId" / "request body trust" | DTO field `ClientId\|TenantId` used directly in DB query without comparing to authenticated user's claim | `recon-driven.idor.body-trusted-tenant-id` | CWE-639 |
+| "Exception message returned to caller" | `catch.*ex\)\s*\{.*return.*ex.Message\|return.*ToString\(\)` | `recon-driven.exception-leak.return-ex-message` | CWE-209 |
+| "Stack trace leaked in error response" | response body containing `ex.StackTrace\|ex.ToString()` | `recon-driven.exception-leak.stack-trace-in-response` | CWE-209 |
+| "AllowInvalid TLS in dev only" / "dev TLS hint to skip" | `AllowInvalid\|SkipCert.*Development\|environment.IsDevelopment` near TLS setting | `recon-driven.tls-bypass.dev-only-but-misconfigured` | CWE-295 |
+| "Static delegate cache unbounded" | `ConcurrentDictionary<.*,Delegate>\|static.*Compile\(\)` without eviction | `recon-driven.dos.unbounded-delegate-cache` | CWE-401 + CWE-770 |
+| "X-Request-Id no length cap" / "header capture unbounded" | `Request.Headers["X-Request-Id"]` echoed to log/response without length check | `recon-driven.dos.unbounded-header-capture` | CWE-20 + CWE-117 |
+| "URL-format SSRF" / "URL passed to outbound request" | `new HttpClient\|HttpWebRequest.Create` with URL from input | `recon-driven.ssrf.url-from-input` | CWE-918 |
+| "Recursion DoS" | recursive method on user-controlled tree without depth limit | `recon-driven.dos.recursion-no-depth-cap` | CWE-674 |
+| "Format-preserving token" | tokenizer producing token with same structure as input (BIN+last4) | `recon-driven.crypto-misuse.format-preserving-token` | CWE-330 |
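+
+When lifting a row into a regex engine, note that `\|` in this table is a markdown escape. In most rows it stands for alternation, but in the inverted-boolean row it is the literal C# `||` operator and must stay escaped. The sketch below translates three rows to Python `re`; `PATTERNS` and `scan_text` are illustrative names, and every hit is only a candidate for step 3 below.
+
+```python
+import re
+from typing import List, Tuple
+
+# Three rows of the library translated to Python `re` syntax.
+PATTERNS = {
+    # `\|` in the table renders as `|`: alternation between two callback shapes.
+    "recon-driven.cert-validation.callback-returns-true": re.compile(
+        r"ServerCertificateValidationCallback\s*=\s*.*=>\s*true|delegate.*return true"),
+    # Here the pipes are the literal C# `||` operator, so they stay escaped.
+    "recon-driven.tls-bypass.inverted-bool-default-true": re.compile(r"!bool\.TryParse.*\|\|"),
+    "recon-driven.redis.allowadmin-enabled": re.compile(r"AllowAdmin\s*=\s*true"),
+}
+
+
+def scan_text(path: str, text: str) -> List[Tuple[str, str, int, str]]:
+    """Return (rule_id, path, 1-based line, stripped line) for every candidate hit."""
+    hits = []
+    for lineno, line in enumerate(text.splitlines(), start=1):
+        for rule_id, pattern in PATTERNS.items():
+            if pattern.search(line):
+                hits.append((rule_id, path, lineno, line.strip()))
+    return hits
+```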
+
+### 3. Verify each candidate match
+
+For each grep hit:
+1. Read the surrounding 20 lines of context
+2. Confirm the code actually exhibits the risk RECON described — patterns can be misleading
+3. If confirmed, generate a finding entry
+4. If the pattern matches but the code is in a test fixture, comment, or build script, DO NOT generate a finding. If such a match slips through anyway, fp-reduction's test-only-path filter is the backstop.
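+
+A sketch of the context read and the skip rule, assuming each hit arrives as a repo-relative path plus a 1-based line number. The path markers and comment prefixes are heuristic assumptions, not an exhaustive definition of "test fixture" or "build script"; `context_window` and `should_skip` are hypothetical.
+
+```python
+import re
+from typing import List
+
+# Heuristic markers for paths the verification step says to skip (tests, fixtures, build scripts).
+# This list is an assumption; extend it to match the target repo's layout.
+NON_PRODUCTION_PATH = re.compile(
+    r"(^|/)(tests?|__tests__|spec|fixtures?|build|scripts)(/|$)|\.(test|spec)\.|Tests?\.cs$",
+    re.IGNORECASE,
+)
+COMMENT_PREFIXES = ("//", "/*", "*", "#", "--")
+
+
+def context_window(lines: List[str], lineno: int, radius: int = 10) -> List[str]:
+    """Return about 20 lines (ten either side of a 1-based hit line), clamped to the file."""
+    start = max(0, lineno - 1 - radius)
+    return lines[start:lineno + radius]
+
+
+def should_skip(path: str, hit_line: str) -> bool:
+    """True for hits in test fixtures, build scripts, or comment lines."""
+    if NON_PRODUCTION_PATH.search(path.replace("\\", "/")):
+        return True
+    return hit_line.lstrip().startswith(COMMENT_PREFIXES)
+```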
+
+### 4. Apply minimum evidence bar
+
+Each finding requires:
+- A specific `file:line` citation
+- A direct quote of the matching code (in `metadata.code_excerpt`)
+- A direct quote of the RECON narrative claim that motivated the search (in `metadata.recon_claim`)
+- A non-trivial CWE assignment (not `CWE-0`)
+
+If any of these is missing, do NOT emit the finding.
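+
+The evidence bar maps directly onto the emitted fields. A sketch, with `meets_evidence_bar` as a hypothetical helper that returns whichever requirements are unmet. Treating any CWE id other than `CWE-<positive integer>` as trivial is an assumption that extends the `CWE-0` rule.
+
+```python
+import re
+from typing import Any, Dict, List
+
+CWE_ID = re.compile(r"^CWE-[1-9]\d*$")  # rejects CWE-0 and malformed ids
+
+
+def meets_evidence_bar(finding: Dict[str, Any]) -> List[str]:
+    """Return the unmet requirements; emit the finding only if this list is empty."""
+    meta = finding.get("metadata") or {}
+    missing = []
+    line = finding.get("line")
+    if not finding.get("file") or not isinstance(line, int) or line < 1:
+        missing.append("file:line citation")
+    if not str(meta.get("code_excerpt", "")).strip():
+        missing.append("metadata.code_excerpt")
+    if not str(meta.get("recon_claim", "")).strip():
+        missing.append("metadata.recon_claim")
+    cwes = meta.get("cwe") or []
+    if not cwes or not all(isinstance(c, str) and CWE_ID.match(c) for c in cwes):
+        missing.append("non-trivial CWE")
+    return missing
+```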
+
+### 5. Emit findings
+
+Write `memory/recon-driven-<slug>.json` as a JSON array of unified-finding-v1 entries:
+
+```json
+{
+  "rule_id": "recon-driven.<category>.<descriptor>",
+  "file": "<path>",
+  "line": <line>,
+  "severity": "error|warning|info",
+  "message": "<one sentence: what the vulnerability is and which RECON claim it confirms>",
+  "metadata": {
+    "source": "recon-driven",
+    "cwe": ["CWE-NNN"],
+    "recon_claim": "<verbatim quote from RECON narrative>",
+    "code_excerpt": "<verbatim 1-3 line quote from source>",
+    "rationale": "<2-3 sentences: how the code matches the claim and why it's exploitable>"
+  }
+}
+```
+
+**Severity calibration**:
+- `error` (CRITICAL/HIGH): unauth privileged endpoints, TLS bypass on production-reachable surface, SQL/code injection, hardcoded production credentials, PII leak with no compensating control
+- `warning` (MEDIUM): config hygiene gaps, dev-surface-leaks, defense-in-depth gaps, exception leakage on non-sensitive paths
+- `info` (LOW): style/best-practice issues, modernization debt
+
+An empty array `[]` is a valid output when the RECON narrative is empty/generic, or when none of its claims are confirmed in source. Do not fabricate findings to fill the array.
+
+## What this agent does NOT do
+
+- Does not run static analysis tools — that is Phase 1.
+- Does not perform freeform vulnerability discovery — that is `deep-code-reasoning` (which works bottom-up from suspicious code).
+- Does not scan repos that lack a substantive RECON narrative — silently emits `[]`.
+- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c.
+- Does not perform FP-reduction — that is Phase 2.
+- Does not validate that RECON is correct; it only confirms whether RECON's claims have concrete code evidence.
+
+## Handoff
+
+The Phase 1b orchestration step appends this agent's output to the unified finding stream:
+
+```bash
+jq -c '.[]' memory/recon-driven-<slug>.json >> memory/findings-<slug>.jsonl
+```
+
+These findings flow through Phase 1c → Phase 2 → Phase 3 → Phase 5 identically to all other unified findings. The `source: "recon-driven"` tag lets fp-reduction apply appropriate priors (recon-driven findings have the RECON narrative as supporting evidence, and the `recon_claim` and `code_excerpt` metadata fields make the rationale chain auditable) and lets the exec-report-generator note the detection method in Section 6 methodology.
+
+## Validation history
+
+Validated on the 2026-05-01 NextGen portfolio rerun: 12 repos that Phase 1 SAST had previously scored as zero-findings were re-scanned with this approach. Outcome:
+
+- **75 new findings** across 12 repos (mean 6.25/repo)
+- **8 CRITICAL, 17 HIGH** added to portfolio severity counts
+- **0 false alarms** — every finding had concrete file:line evidence matching a RECON claim
+- All 12 repos promoted out of `00-no-findings.md`
+
+Notable findings the original SAST missed:
+- `search-service` — 2 production SQL injections in `PartialSearchByCreditAccount` and `PartialSearchByDebitAccount` (LIKE concat)
+- `shared-tokenservice` — SQL injection in error-logging path; hardcoded `GenericTokenKey` across QA/UAT/Prod
+- `profile-custompipes` — Flee + Dynamic LINQ RCE shape running in-process inside `profile-service`
+- `notificationinfrastructure` — inverted-boolean TLS bypass library-amplified across all consumer Lambdas
+- `Jupiter2020$` cross-repo credential reuse in 6 of 12 reruns (now confirmed in 12+ repos portfolio-wide)
diff --git a/plugins/agentic-security-assessment/commands/security-assessment.md b/plugins/agentic-security-assessment/commands/security-assessment.md
index 5e92374..ba84578 100644
--- a/plugins/agentic-security-assessment/commands/security-assessment.md
+++ b/plugins/agentic-security-assessment/commands/security-assessment.md
@@ -63,7 +63,7 @@ Wall time matters for real-world assessments. Three parallelism rules **MUST** b
 1. **Multi-target fan-out.** When invoked with multiple targets, each target's Phase 0 through Phase 2b runs as an **independent pipeline**. Dispatch them as **parallel Agent tool calls in the SAME message** — not sequential. Each target's pipeline is self-contained until Phase 4 (service-comm across targets) and Phase 5 (cross-repo summary). For N targets on a machine with K cores, expect N-way wall-time parallelism up to K.
 
 2. **Intra-phase fan-out.** Within a phase with multiple agents or tools:
-   - Phase 1b dispatches `security-review` AND `business-logic-domain-review` as parallel Agent tool calls in one message
+   - Phase 1b dispatches `security-review`, `business-logic-domain-review`, `deep-code-reasoning`, `authorization-logic-review`, AND `recon-driven-scan` as parallel Agent tool calls in one message
    - Phase 3 dispatches `tool-finding-narrative-annotator` AND `compliance-mapping` as parallel Agent tool calls in one message
    - Phase 1's static-analysis-integration skill dispatches every available tool (semgrep variants + gitleaks + trivy + hadolint + actionlint + custom scripts) as concurrent shell processes, not sequentially
 
@@ -111,11 +111,28 @@ Also invoke the two custom scripts:
 
 Their SARIF outputs flow through the shared parser.
 
-**Phase 1b — Judgment detection.** Dispatch in parallel (Agent tool with multiple calls in one message):
+**Phase 1b — Judgment detection.** Dispatch in parallel (Agent tool with five calls in one message):
 - `security-review` (opus; reads RECON + target files)
 - `business-logic-domain-review` (opus; reads RECON + target files + `knowledge/domain-logic-patterns.md`)
+- `deep-code-reasoning` (opus; reads RECON surface-scoped entry points, auth paths, and data-flow boundaries; emits `memory/deep-reasoning-<slug>.json`)
+- `authorization-logic-review` (opus; maps the authorization model top-down and checks enforcement consistency; emits `memory/authz-review-<slug>.json`)
+- `recon-driven-scan` (opus; reads the RECON narrative and validates each described risk has concrete `file:line` evidence in source — finds patterns SAST cannot express; emits `memory/recon-driven-<slug>.json`)
 
-Append their findings to `memory/findings-<slug>.jsonl`.
+After all five agents complete, append findings to `memory/findings-<slug>.jsonl`:
+
+```bash
+# security-review and business-logic-domain-review via adapter (mandatory)
+python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \
+  --input memory/agent-output-<slug>.json \
+  --output memory/findings-<slug>.jsonl
+
+# deep-code-reasoning, authorization-logic-review, and recon-driven-scan emit unified-finding-v1 directly
+jq -c '.[]' memory/deep-reasoning-<slug>.json >> memory/findings-<slug>.jsonl
+jq -c '.[]' memory/authz-review-<slug>.json   >> memory/findings-<slug>.jsonl
+jq -c '.[]' memory/recon-driven-<slug>.json   >> memory/findings-<slug>.jsonl
+```
+
+If any of `deep-reasoning-<slug>.json`, `authz-review-<slug>.json`, or `recon-driven-<slug>.json` is missing (agent failed), log the failure to the audit trail and continue; Phase 1b is best-effort for individual agents. `recon-driven-scan` legitimately emits `[]` when the RECON narrative is empty or generic; this is not a failure. If more than one of these three agents fails, surface a coverage warning in the final report.
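+
+As written, the `jq` lines above ignore jq's exit status, so a missing output file surfaces only as an error message. A Python sketch of the same append step that tracks failures explicitly and rejects anything that is not a JSON array. It assumes failures are reported on stderr (the real audit-trail sink is not specified here); `append_direct_outputs` is a hypothetical helper, and it writes compact one-line JSON like `jq -c`.
+
+```python
+import json
+import os
+import sys
+from typing import List
+
+# Output-file prefixes of the three agents that emit unified-finding-v1 directly.
+DIRECT_OUTPUTS = ("deep-reasoning", "authz-review", "recon-driven")
+
+
+def append_direct_outputs(memory_dir: str, slug: str) -> List[str]:
+    """Append each agent's array to findings-<slug>.jsonl; return the outputs that failed."""
+    failed = []
+    findings_path = os.path.join(memory_dir, f"findings-{slug}.jsonl")
+    for kind in DIRECT_OUTPUTS:
+        path = os.path.join(memory_dir, f"{kind}-{slug}.json")
+        try:
+            with open(path, encoding="utf-8") as fh:
+                entries = json.load(fh)
+            if not isinstance(entries, list):
+                raise ValueError("top-level value is not an array")
+        except (OSError, ValueError) as exc:  # missing file or malformed JSON
+            failed.append(kind)
+            print(f"phase-1b: {kind} output unusable ({exc}); continuing", file=sys.stderr)
+            continue
+        with open(findings_path, "a", encoding="utf-8") as out:
+            for entry in entries:  # an empty array appends nothing and is not a failure
+                out.write(json.dumps(entry, separators=(",", ":")) + "\n")
+    if len(failed) > 1:
+        print(f"phase-1b: coverage warning, failed outputs: {', '.join(failed)}", file=sys.stderr)
+    return failed
+```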
 
 **Phase 1c — ACCEPTED-RISKS suppression (deterministic, mandatory gate).** Execute the deterministic script; do not delegate this to LLM reasoning:
 
diff --git a/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md b/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md
index efafc2f..0eff5d5 100644
--- a/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md
+++ b/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md
@@ -83,21 +83,29 @@ Phase 1: Tool-first detection  (parallel across tools)
   parallelism: all tool × target × ruleset combinations run concurrently
 
 Phase 1b: Judgment-layer detection  (parallel across agents)
-  agents:    security-review, business-logic-domain-review (opus both)
+  agents:    security-review, business-logic-domain-review, deep-code-reasoning,
+             authorization-logic-review, recon-driven-scan (opus all five)
   produces:  adds unified findings to memory/findings-<slug>.jsonl
   requires:  Phase 0, Phase 1
-  parallelism: two agents dispatched in a single Agent tool message,
+  parallelism: all five agents dispatched in a single Agent tool message,
                repeated per target
-  adapter:   The security-review agent's output is piped through
-             plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py
-             before findings append to memory/findings-<slug>.jsonl.
-             The adapter is mandatory in this phase; a non-zero exit halts
-             Phase 1b with a named error (malformed category, missing
-             category, malformed mapping YAML, or schema-invalid emission).
-             Invocation, verbatim:
-               python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \
-                 --input memory/agent-output-<slug>.json \
-                 --output memory/findings-<slug>.jsonl
+  adapters:
+    security-review + business-logic-domain-review:
+      Output is piped through the security-review adapter before appending:
+        python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \
+          --input memory/agent-output-<slug>.json \
+          --output memory/findings-<slug>.jsonl
+      The adapter is mandatory for these two agents; a non-zero exit halts
+      Phase 1b with a named error.
+    deep-code-reasoning + authorization-logic-review + recon-driven-scan:
+      These agents emit unified-finding-v1 directly; no adapter is required.
+      Their outputs are appended via jq after all five agents complete:
+        jq -c '.[]' memory/deep-reasoning-<slug>.json   >> memory/findings-<slug>.jsonl
+        jq -c '.[]' memory/authz-review-<slug>.json     >> memory/findings-<slug>.jsonl
+        jq -c '.[]' memory/recon-driven-<slug>.json     >> memory/findings-<slug>.jsonl
+      Each must validate as an array (empty [] is valid; missing file is a Phase 1b failure).
+      recon-driven-scan emits [] when the RECON narrative is empty or generic; this
+      is normal and not a failure.
 
 Phase 1c: ACCEPTED-RISKS suppression (sequential gate, mandatory)
   procedure: scripts/apply-accepted-risks.sh parses the first fenced
@@ -243,6 +251,9 @@ Every phase writes to `memory/<kind>-<slug>.<ext>` where `<slug>` is derived fro
 |---|---|---|
 | `recon-<slug>.json` | Phase 0 | Phase 1, 1b, 2, 3, 5 |
 | `findings-<slug>.jsonl` | Phase 1, Phase 1b | Phase 2, 3 |
+| `deep-reasoning-<slug>.json` | Phase 1b (deep-code-reasoning) | Phase 1b append step |
+| `authz-review-<slug>.json` | Phase 1b (authorization-logic-review) | Phase 1b append step |
+| `recon-driven-<slug>.json` | Phase 1b (recon-driven-scan) | Phase 1b append step |
 | `disposition-<slug>.json` | Phase 2 | Phase 3, 5 |
 | `narratives-<slug>.md` | Phase 3 | Phase 5 |
 | `compliance-<slug>.json` | Phase 3 | Phase 5 |

From 6296245eee29da567314fd627c928103a1bb823f Mon Sep 17 00:00:00 2001
From: Bryan Finster <bryan.finster@gmail.com>
Date: Fri, 1 May 2026 13:31:32 -0500
Subject: [PATCH 3/3] chore(security-assessment): release 2.2.0

Manual changelog entry for the Phase 1b expansion. release-please will
generate the canonical 2.2.0 entry from conventional commits when
this lands on main; this entry serves as a working preview.
---
 .../.claude-plugin/plugin.json                      |  2 +-
 plugins/agentic-security-assessment/CHANGELOG.md    | 13 +++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/plugins/agentic-security-assessment/.claude-plugin/plugin.json b/plugins/agentic-security-assessment/.claude-plugin/plugin.json
index 0526ea6..ad7bead 100644
--- a/plugins/agentic-security-assessment/.claude-plugin/plugin.json
+++ b/plugins/agentic-security-assessment/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "agentic-security-assessment",
-  "version": "2.1.0",
+  "version": "2.2.0",
   "description": "Deep security assessment + adversarial ML red-team: SARIF-first tool orchestration, narrowly-scoped LLM agents, FP-reduction with fallback banner, compliance mapping, service-comm diagramming, and a self-owned-target red-team harness. Companion plugin to agentic-dev-team.",
   "author": {
     "name": "finsterb",
diff --git a/plugins/agentic-security-assessment/CHANGELOG.md b/plugins/agentic-security-assessment/CHANGELOG.md
index 493b508..8f03a67 100644
--- a/plugins/agentic-security-assessment/CHANGELOG.md
+++ b/plugins/agentic-security-assessment/CHANGELOG.md
@@ -1,5 +1,18 @@
 # Changelog
 
+## [2.2.0] (2026-05-01)
+
+
+### Features
+
+* **security-assessment:** add `recon-driven-scan` agent — bridges Phase 0 RECON narrative to concrete `file:line` evidence. Reads RECON's human-language risk descriptions and validates each described risk has matching code via targeted grep, finding patterns SAST cannot express (inverted-boolean TLS defaults, RCE shapes via expression libraries like Flee/Dynamic LINQ, header-driven SQL connection strings, body-trusted IDOR, masker exception PII fallback, format-preserving tokens). Includes a 26-pattern claim→search library covering unauth gRPC, TLS bypass, PII leak, crypto misuse, exception leak, SQL/code injection, SSRF, and DoS categories. Validated against the NextGen 2026-05-01 portfolio rerun: 12 repos previously scored zero-findings by SAST were re-scanned and produced 75 confirmed findings (8 CRITICAL, 17 HIGH) with zero false alarms. Notable additions the original SAST missed: 2 production SQL injections in `search-service`, RCE shape via Flee+Dynamic LINQ in `profile-custompipes`, inverted-boolean TLS bypass library-amplified across all consumer Lambdas in `notificationinfrastructure`, and expansion of the `Jupiter2020$` cross-repo credential reuse chain.
+* **security-assessment:** Phase 1b is now a 5-agent parallel dispatch — `security-review` + `business-logic-domain-review` (via security-review-adapter) + `deep-code-reasoning` + `authorization-logic-review` + `recon-driven-scan` (latter three emit unified-finding-v1 directly, appended via `jq`).
+
+
+### Documentation
+
+* **security-assessment:** Phase 1b parallelization rule, artifacts table, and exec-report agent→phase mapping all updated. Plugin-level CLAUDE.md agent registry updated 11 → 12.
+
 ## [2.1.0](https://github.com/bdfinst/agentic-dev-team/compare/agentic-security-assessment-v2.0.0...agentic-security-assessment-v2.1.0) (2026-04-27)