From 89b34e15b734ef7a2eabbdac7bb3c89790a0db9c Mon Sep 17 00:00:00 2001
From: Bryan Finster
Date: Fri, 1 May 2026 13:31:03 -0500
Subject: [PATCH 1/3] feat(security-assessment): add Stage 0 devil's advocate + confidence field to fp-reduction
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two enhancements to the FP-reduction rubric, surfaced from a competitive
analysis against Anthropic's Claude Code Security:

- Stage 0 devil's advocate: every disposition entry now carries
  da_rationale (≥20 chars) and da_strong (bool) before Stages 1-5 run.
  The pre-pass forces the agent to argue the strongest case AGAINST the
  finding being real (framework protection, trusted caller, non-prod
  context, rule pattern noise) - a strong DA argument sharpens Stage 1
  reachability into a hypothesis test rather than open-ended search, and
  a true_positive that explicitly refutes a counter-argument is more
  trustworthy than one that never examined the counter-case.

- Confidence field: every disposition entry now carries a confidence
  band (high | medium | low | null) derived from the verdict ×
  exploitability score table:

    true_positive + score 7-10               → high
    true_positive + score 0-6                → medium
    likely_true_positive                     → medium
    uncertain                                → low
    likely_false_positive / false_positive   → null

  Confidence is consumed by exec-report-generator for the Section 1
  dashboard column and Section 2 detail blocks. severity-floors.json
  documents the bands as informational metadata.
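
For illustration only, the verdict × exploitability score table in this message can be read as a small lookup. The sketch below is not code from this patch: the function name `derive_confidence` is hypothetical, and it assumes the exploitability score is an integer on a 0-10 scale.

```python
from typing import Optional


def derive_confidence(verdict: str, score: int) -> Optional[str]:
    """Map (verdict, exploitability score) to a confidence band.

    Hypothetical helper; the bands mirror the table in the commit message.
    """
    if not 0 <= score <= 10:
        raise ValueError(f"score out of range: {score}")
    if verdict == "true_positive":
        # Only true positives split on the score: 7-10 is high, 0-6 is medium.
        return "high" if score >= 7 else "medium"
    if verdict == "likely_true_positive":
        return "medium"
    if verdict == "uncertain":
        return "low"
    if verdict in ("likely_false_positive", "false_positive"):
        # These verdicts carry no meaningful confidence value.
        return None
    raise ValueError(f"unknown verdict: {verdict!r}")
```

Raising on an unknown verdict, rather than returning `None`, keeps a typo from being mistaken for a false-positive verdict.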
---
 .../agents/fp-reduction.md                    | 37 ++++++++++++++++++-
 .../knowledge/severity-floors.json            | 35 +++++++++++++++++-
 .../skills/false-positive-reduction/SKILL.md  | 25 ++++++++++++-
 3 files changed, 92 insertions(+), 5 deletions(-)

diff --git a/plugins/agentic-security-assessment/agents/fp-reduction.md b/plugins/agentic-security-assessment/agents/fp-reduction.md
index 99e73c4..0713e4d 100644
--- a/plugins/agentic-security-assessment/agents/fp-reduction.md
+++ b/plugins/agentic-security-assessment/agents/fp-reduction.md
@@ -37,7 +37,22 @@ Run `command -v joern` (or its alias `joern-parse`). Set `register.reachability_
 
 If joern is present, invoke `tools/reachability.sh` to build or load the CPG. The helper returns a path to a JSON export of the CFG that Stage 1 queries.
 
-### 2. For each finding, apply stages 1–5 in order
+### 2. For each finding, apply stages 0–5 in order
+
+**Stage 0 — Devil's advocate.** Before running the structured rubric, generate the strongest argument that this finding is NOT a real vulnerability:
+
+- **Framework protection**: does the language/framework have a built-in prevention for this class? (e.g. ORM parameterization eliminates SQL injection at the repo layer; templating engines auto-escape XSS; TLS termination at the load balancer makes `verify=False` on internal-only calls a much narrower risk)
+- **Trusted caller**: is this code only reachable from a trusted internal caller, admin-only CLI, or test harness — never from an untrusted HTTP path?
+- **Non-production context**: is the file a migration, seed script, test fixture, or one-time utility that RECON's `entry_points` do not include?
+- **Rule pattern noise**: does this rule commonly fire on intentional, non-exploitable configurations (e.g. `node-tls-reject-unauthorized` on a local development server, `hardcoded-password` on a well-known public default)?
+
+Record the counter-argument (min 20 chars) in `da_rationale`. If the argument is strong enough that Stage 1 reachability analysis is likely to confirm the path is dead or test-only, set `da_strong: true`.
+
+**The devil's advocate does NOT change the verdict.** Stages 1–5 run regardless. What it changes is *how* Stage 1 operates and what appears in the audit trail:
+
+- `da_strong: true` → Stage 1 *tests the DA hypothesis* (is the path actually dead / test-only?) rather than performing an open-ended search. This sharpens the rationale and accelerates high-volume runs.
+- `da_strong: true` confirmed by Stage 1 (path dead or test-only) → `false_positive` with both the DA argument and the reachability evidence cited. The analyst sees a well-reasoned dismissal, not a silent discard.
+- `da_strong: true` *disproved* by Stage 1 (path is reachable) → the rejected DA argument appears in the final rationale. A `true_positive` that explicitly refuted a counter-argument is more trustworthy than one that never examined the counter-case.
 
 **Stage 1 — Reachability.** Populate `reachability.reachable` (bool) and `reachability.rationale` (min 20 chars). Set `reachability_source` per the detection mode (`joern-cpg` or `llm-fallback`).
 
@@ -109,6 +124,21 @@ Map the combined reachability + environment + control + scoring into one of:
 | test-only path OR strong in-repo control + score < 2 | `likely_false_positive` |
 | dead code OR schema-invalid finding | `false_positive` |
 
+### 4b. Assign confidence
+
+After assigning the verdict, derive a `confidence` field. This is a first-class output field on the disposition entry — consumed by the exec-report-generator for the dashboard Confidence column and Section 2 detail blocks.
+
+| Verdict | Exploitability score | Confidence |
+|---|---|---|
+| `true_positive` | 7–10 | `"high"` |
+| `true_positive` | 0–6 | `"medium"` |
+| `likely_true_positive` | any | `"medium"` |
+| `uncertain` | any | `"low"` |
+| `likely_false_positive` | any | `null` |
+| `false_positive` | any | `null` |
+
+`likely_false_positive` and `false_positive` entries are not surfaced in the exec report; they do not require a meaningful confidence value.
+
 ### 5. Emit
 
 Write both artifacts atomically (JSON validates against schema first, then MD writes — if JSON schema validation fails, abort without writing either).
@@ -132,6 +162,7 @@ Write both artifacts atomically (JSON validates against schema first, then MD wr
       "metadata": { "source": "semgrep", "confidence": "high" }
     },
     "verdict": "true_positive",
+    "confidence": "high",
     "reachability": {
       "reachable": true,
       "rationale": "Reached by HTTP handler /api/users via route -> service.getUser -> repo.find."
@@ -150,7 +181,7 @@ Write both artifacts atomically (JSON validates against schema first, then MD wr
 
 **Schema contract (enforced by `plugins/agentic-dev-team/knowledge/schemas/disposition-register-v1.json`):**
 
-- Each entry MUST contain `finding`, `verdict`, `reachability`, `reachability_source`, `exploitability`, `dispositioner`, `dispositioned_at`.
+- Each entry MUST contain `finding`, `verdict`, `confidence`, `reachability`, `reachability_source`, `exploitability`, `dispositioner`, `dispositioned_at`.
 - The `finding` sub-object MUST carry the full unified finding envelope at least `rule_id`, `file`, `line`, `severity`, `message`, `metadata`. Downstream consumers (`exec-report-generator`, `compliance-mapping`, `score.py`) access these as `entry.finding.`.
 - A flat shape (with `rule_id`/`file`/`line` at the entry top level instead of nested) is schema-invalid and breaks downstream scorers and report generators. Always nest.
 - `reachability.rationale` and `exploitability.rationale` MUST each be ≥ 20 chars.
@@ -164,6 +195,8 @@ Before writing, validate the assembled object against the schema. If any require
 - One input finding → exactly one output entry. No dropping.
 - Every rationale ≥ 20 chars. No single-word justifications.
 - `reachability_source` is set on every entry. Register-level `reachability_tool` defaults, entries may override (mixed mode is allowed if some findings have CPG reachability and others fall back).
+- `confidence` is set on every entry per the verdict × score table in § 4b. `null` is permitted only for `likely_false_positive` and `false_positive` verdicts.
+- `da_rationale` is set on every entry (the Stage 0 counter-argument). `da_strong` is `true` or `false` on every entry.
 - If `reachability_source == "llm-fallback"` appears anywhere, the exec-report-generator will emit its fallback banner — this agent does not emit it directly.
 
 ## Handoff
diff --git a/plugins/agentic-security-assessment/knowledge/severity-floors.json b/plugins/agentic-security-assessment/knowledge/severity-floors.json
index 4da0878..dbb47cf 100644
--- a/plugins/agentic-security-assessment/knowledge/severity-floors.json
+++ b/plugins/agentic-security-assessment/knowledge/severity-floors.json
@@ -27,5 +27,38 @@
       "canonical_floor": 7,
       "rationale": "Administrative, management, or diagnostic endpoint reachable without authentication. Floor 7 regardless of the specific action exposed — any unauth admin surface is a pivot point."
     }
-  ]
+  ],
+  "confidence_bands": {
+    "description": "Informational mapping from (verdict, exploitability_score) to confidence label. Used by the fp-reduction agent when emitting the confidence field on disposition entries. NOT consulted by scripts/apply-severity-floors.sh — documentation only.",
+    "bands": [
+      {
+        "verdict": "true_positive",
+        "score_min": 7,
+        "score_max": 10,
+        "confidence": "high",
+        "note": "Reachable, no mitigation, high exploitability — analyst should prioritize"
+      },
+      {
+        "verdict": "true_positive",
+        "score_min": 0,
+        "score_max": 6,
+        "confidence": "medium",
+        "note": "Reachable but partial mitigation or lower exploitability"
+      },
+      {
+        "verdict": "likely_true_positive",
+        "score_min": 0,
+        "score_max": 10,
+        "confidence": "medium",
+        "note": "Reachable with compensating controls or partial evidence"
+      },
+      {
+        "verdict": "uncertain",
+        "score_min": 0,
+        "score_max": 10,
+        "confidence": "low",
+        "note": "Reachable path with strong mitigation, or insufficient evidence to confirm"
+      }
+    ]
+  }
 }
diff --git a/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md b/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
index 7cc390c..1b1c04d 100644
--- a/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
+++ b/plugins/agentic-security-assessment/skills/false-positive-reduction/SKILL.md
@@ -18,9 +18,30 @@ Transform a stream of unified findings into a disposition register that the exec
 
 The skill's job is to remove noise without suppressing real issues. False positives waste analyst attention; missed true positives get someone fired.
 
-## Five-stage rubric (applied in order; each stage can downgrade severity or change verdict)
+## Six-stage rubric (applied in order; each stage can downgrade severity or change verdict)
 
-Lifted from the `opus_repo_scan_test` reference's § analyze-11 framework with extensions for the disposition-register output format.
+Lifted from the `opus_repo_scan_test` reference's § analyze-11 framework with extensions for the disposition-register output format. Stage 0 is new: a self-adversarial pre-pass that sharpens Stage 1 and strengthens the audit trail.
+
+### Stage 0 — Devil's advocate
+
+**Question**: What is the strongest argument that this finding is NOT a vulnerability?
+
+The agent generates a counter-argument before applying the rubric. This is not a skip gate — all five subsequent stages still run. The purpose is twofold:
+
+1. **Sharpen Stage 1**: a strong counter-argument gives Stage 1 a concrete hypothesis to test (is the path actually dead / test-only?) rather than an open-ended search.
+2. **Strengthen the audit trail**: a `true_positive` that explicitly refuted a counter-argument is more trustworthy than one that never examined the counter-case. A well-reasoned `false_positive` is more trustworthy than a silent discard.
+
+Counter-argument prompts:
+- **Framework/runtime protection**: does the tech stack have a built-in prevention for this class (ORM parameterization, template auto-escaping, TLS termination at the LB)?
+- **Trusted caller**: is this code only reachable from internal, trusted, or admin-only paths?
+- **Non-production context**: is the file a migration, test fixture, seed script, or utility that RECON's `entry_points` don't include?
+- **Rule pattern noise**: does this rule commonly fire on intentional non-exploitable configurations?
+
+Disposition rules:
+- Strong counter-argument → `da_strong: true`; Stage 1 tests the hypothesis
+- Weak / no counter-argument → `da_strong: false`; Stage 1 performs open-ended reachability search
+- `da_strong: true` + Stage 1 confirms (unreachable) → `false_positive`; both arguments cited in rationale
+- `da_strong: true` + Stage 1 disproves (reachable) → rejected counter-argument cited in `true_positive` rationale
 
 ### Stage 1 — Reachability
 
From 9dad5ee6f1adabf7d376b12b02969f8cf291f2f8 Mon Sep 17 00:00:00 2001
From: Bryan Finster
Date: Fri, 1 May 2026 13:31:26 -0500
Subject: [PATCH 2/3] feat(security-assessment): expand Phase 1b with 3 new judgment agents
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 1b now dispatches 5 agents in parallel (was 2). Three new opus
agents address gaps surfaced by competitive analysis and a NextGen
portfolio rerun:

- deep-code-reasoning: RECON surface-scoped freeform vulnerability
  reasoning - bottom-up analysis at entry points / auth paths /
  data-flow boundaries for novel context-dependent issues that static
  rules cannot express (IDOR, confused deputy, TOCTOU across services,
  indirect privilege escalation, workflow bypass). Minimum evidence bar:
  ≥2 file:line citations per finding; only emits high/medium confidence
  (no low-confidence noise).

- authorization-logic-review: top-down authorization architecture
  review. Maps the intended access control model from route decorators /
  permission constants / middleware, then verifies consistent
  enforcement at controller / service / repository layers. Catches
  design-intent vs. implementation gaps (auth at front door but not at
  data-access layer, multi-tenancy filter inconsistency, role escalation
  via mutable fields).

- recon-driven-scan: bridges Phase 0 RECON narrative claims to concrete
  file:line evidence. Reads RECON's human-language risk descriptions
  ("inverted-boolean TLS bypass", "RCE shape via Flee", "header-driven
  SQL connection-string interpolation") and validates each described
  risk has matching code via targeted grep. Includes a 28-pattern
  claim→search library with rule_id namespaces and CWE assignments.
  Validated against the NextGen 2026-05-01 rerun: 12 repos previously
  scored zero-findings by SAST were re-scanned, producing 75 confirmed
  findings (8 CRITICAL, 17 HIGH) with zero false alarms - including 2
  production SQL injections, an in-process RCE shape via Flee+Dynamic
  LINQ, and an inverted-boolean TLS bypass library-amplified across all
  consumer Lambdas.

All three new agents emit unified-finding-v1 directly (no adapter) and
append to memory/findings-.jsonl via jq after Phase 1b completes.
recon-driven-scan legitimately emits [] for repos with empty/generic
RECON narratives - this is not a failure.

Wiring updates: pipeline skill (artifacts table + Phase 1b agent list),
command (5-agent dispatch + parallelization rule), exec-report-generator
(agent → phase mapping), CLAUDE.md (agent registry 11 → 12).
---
 plugins/agentic-security-assessment/CLAUDE.md |   9 +-
 .../agents/authorization-logic-review.md      | 146 ++++++++++++++++
 .../agents/deep-code-reasoning.md             | 157 ++++++++++++++++++
 .../agents/exec-report-generator.md           |   7 +-
 .../agents/recon-driven-scan.md               | 152 +++++++++++++++++
 .../commands/security-assessment.md           |  23 ++-
 .../security-assessment-pipeline/SKILL.md     |  35 ++--
 7 files changed, 509 insertions(+), 20 deletions(-)
 create mode 100644 plugins/agentic-security-assessment/agents/authorization-logic-review.md
 create mode 100644 plugins/agentic-security-assessment/agents/deep-code-reasoning.md
 create mode 100644 plugins/agentic-security-assessment/agents/recon-driven-scan.md

diff --git a/plugins/agentic-security-assessment/CLAUDE.md b/plugins/agentic-security-assessment/CLAUDE.md
index 1efb2ef..a04f358 100644
--- a/plugins/agentic-security-assessment/CLAUDE.md
+++ b/plugins/agentic-security-assessment/CLAUDE.md
@@ -82,11 +82,14 @@ See `install.sh`. It performs four checks:
 | `/redteam-model ` | orchestrator | Adversarial ML red-team probes against a self-owned target |
 | `/export-pdf ` | worker | PDF export via pandoc/weasyprint |
 
-**Agents** (9 opus):
-- `fp-reduction` (opus) — 5-stage FP-reduction rubric; disposition register
+**Agents** (12 opus):
+- `fp-reduction` (opus) — 6-stage FP-reduction rubric (Stage 0 devil's advocate + Stages 1–5); disposition register with confidence field
 - `business-logic-domain-review` (opus) — fraud-domain anti-patterns
+- `deep-code-reasoning` (opus) — RECON surface-scoped freeform vulnerability reasoning; novel context-dependent issues beyond static rules
+- `authorization-logic-review` (opus) — top-down authorization architecture review; policy declaration vs. enforcement gaps, multi-tenancy isolation
+- `recon-driven-scan` (opus) — bridges RECON narrative claims to concrete file:line evidence; finds patterns SAST cannot express (inverted-boolean TLS defaults, RCE shapes via expression libraries, header-driven SQL, body-trusted IDOR)
 - `cross-repo-synthesizer` (opus) — named attack chains across repos
-- `exec-report-generator` (opus) — publication-ready executive report
+- `exec-report-generator` (opus) — publication-ready executive report with Confidence column
 - `redteam-recon-analyzer` (opus) — interpretation of probe 01
 - `redteam-evasion-analyzer` (opus) — interpretation of probes 03/04/05
 - `redteam-extraction-analyzer` (opus) — interpretation of probe 07
diff --git a/plugins/agentic-security-assessment/agents/authorization-logic-review.md b/plugins/agentic-security-assessment/agents/authorization-logic-review.md
new file mode 100644
index 0000000..3c7b1df
--- /dev/null
+++ b/plugins/agentic-security-assessment/agents/authorization-logic-review.md
@@ -0,0 +1,146 @@
+---
+name: authorization-logic-review
+description: Top-down authorization architecture review. Maps the intended access control model (RBAC, ABAC, ACL, tenancy isolation) from route decorators, middleware, and permission constants, then verifies consistent enforcement at every layer — controller, service, repository, and cross-tenant data access. Catches design-intent vs. implementation gaps that surface-scoped bottom-up analysis misses. Phase 1b peer agent; emits unified-finding-v1 tagged source:"llm-reasoning".
+tools: Read, Grep, Glob
+model: opus
+---
+
+## Thinking Guidance
+
+Think carefully and step-by-step. Authorization bugs are often structural — they arise from a consistent policy that is not consistently enforced. Map the policy first; then verify enforcement. Do not report single suspicious lines; report gaps between stated policy and observed implementation.
+
+# Authorization Logic Review Agent
+
+## Purpose
+
+Complement `deep-code-reasoning` (which reasons bottom-up from suspicious code to vulnerabilities) with a top-down approach: identify what the application's authorization model is *supposed to do*, then check whether the implementation actually does it everywhere.
+
+The most common authorization failures are not "no auth at all" (Semgrep catches those) but "auth enforced at the front door, not at the back rooms" — controller-layer checks that are missing at the service or data-access layer, or tenancy filters applied inconsistently across queries.
+
+## Inputs
+
+1. Target repo files — read on demand via RECON scoping or grep-driven discovery
+2. `memory/recon-.json` — RECON artifact (for entry points and security surface)
+
+## Outputs
+
+- `memory/authz-review-.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'`
+
+## Procedure
+
+### 1. Map the authorization model
+
+Discover how the application declares and enforces access control. Read:
+
+- Route definitions and their decorators / middleware annotations (`@require_auth`, `@roles_allowed`, `[Authorize(Roles=...)]`, `router.use(authMiddleware)`, etc.)
+- Permission constants and role definitions (files named `permissions.py`, `roles.js`, `AuthorizationPolicy.cs`, `scopes.go`, etc.)
+- Middleware stacks (express middleware chain, Django middleware list, ASP.NET Core pipeline, etc.)
+- Tenancy models: multi-tenant indicators (`tenant_id`, `organization_id`, `account_id` in models or query builders)
+
+Classify the model as one of: RBAC (role-based), ABAC (attribute-based), ACL (per-resource), tenancy-scoped, or mixed. Note which pattern predominates and where it is declared.
+
+### 2. Identify the enforcement points
+
+For each route or operation class, note where authorization is enforced:
+- **Controller / handler layer**: checked before business logic runs
+- **Service layer**: checked inside the business logic function
+- **Repository / data-access layer**: enforced in the query (e.g. `.where(tenant_id=current_tenant)`)
+- **Not found**: no enforcement located for this operation
+
+The goal is a coverage map: {operation → enforcement location}. Gaps in this map are findings.
+
+### 3. Check consistency of tenant isolation
+
+If a tenancy model is present:
+- Grep for direct object-load patterns that could return cross-tenant data: `findById`, `getById`, `SELECT ... WHERE id = ?` without a tenant filter
+- Check whether the ORM's base query builder or repository base class enforces tenancy (a global scope or base class filter is fine; ad-hoc per-query is risky)
+- Look for admin or superuser paths that bypass tenancy for legitimate reasons — note these as acknowledged bypasses, not findings
+
+### 4. Check role/permission escalation paths
+
+- Can a lower-privileged user update fields that determine their own role or permissions?
+- Are role assignments validated server-side on every mutation, or only at creation time?
+- Is there an admin promotion or impersonation feature? If so, is it gated on a separate high-privilege check, not just "is authenticated"?
+
+### 5. Check cross-service authorization propagation
+
+If the RECON artifact identifies inter-service calls (service-to-service HTTP, gRPC, message queue consumers):
+- Does the receiving service re-verify authorization, or does it trust the caller implicitly?
+- Are service-to-service credentials separate from user credentials?
+- Can a user indirectly trigger privileged service-to-service operations by manipulating user-facing inputs?
+
+### 6. Minimum evidence bar
+
+Same rule as `deep-code-reasoning`: a finding requires **at least two specific code locations** — the policy declaration and the location where it is violated or absent. Do not emit single-location suspicions.
+
+## Output format
+
+For each confirmed finding:
+
+```json
+{
+  "rule_id": "llm-reasoning.authz..",
+  "file": "",
+  "line": ,
+  "severity": "error|warning|info",
+  "message": "",
+  "metadata": {
+    "source": "llm-reasoning",
+    "cwe": ["CWE-NNN"],
+    "confidence": "high|medium",
+    "secondary_locations": [
+      { "file": "", "line": , "note": "authorization policy declared here" },
+      { "file": "", "line": , "note": "policy not enforced here" }
+    ],
+    "reasoning": "<2-3 sentences: what is the intended model, what is missing, and what an attacker could do>"
+  }
+}
+```
+
+**Rule ID categories:**
+- `llm-reasoning.authz.missing-layer-check` — auth at controller but not at service/repo layer
+- `llm-reasoning.authz.tenant-isolation-bypass` — cross-tenant data access possible
+- `llm-reasoning.authz.role-escalation` — user can influence their own role/permissions
+- `llm-reasoning.authz.service-trust-without-verify` — inter-service call without re-verification
+- `llm-reasoning.authz.admin-bypass` — privileged bypass path not adequately gated
+- `llm-reasoning.authz.workflow-permission` — state transition permitted without verifying role for that transition
+
+**CWE references for common findings:**
+- Missing layer check: CWE-285 (Improper Authorization), CWE-863 (Incorrect Authorization)
+- Tenant isolation: CWE-284 (Improper Access Control), CWE-639 (Authorization Bypass Through User-Controlled Key)
+- Role escalation: CWE-269 (Improper Privilege Management)
+- Service trust: CWE-441 (Unintended Proxy), CWE-306 (Missing Authentication for Critical Function)
+- Admin bypass: CWE-285
+
+**Severity:**
+- `error` — gap reachable from a non-admin entry point; enables horizontal or vertical privilege escalation with no other precondition
+- `warning` — gap requires being authenticated or meeting another precondition
+- `info` — design concern (e.g. tenancy filter is per-query rather than centralized) that does not constitute a current exploit path but increases maintenance risk
+
+**Confidence:**
+- `high` — policy declaration and enforcement gap both cited explicitly; attack path requires no assumptions
+- `medium` — policy declaration found; enforcement gap inferred from structural pattern (e.g. no repo-layer filter found, but ORM behavior not fully verified)
+
+Do NOT emit `low` confidence findings.
+
+### Write output
+
+Write `memory/authz-review-.json` as a JSON array. An empty array `[]` is valid — not every codebase has authorization gaps. Validate each entry carries all required fields before writing.
+
+## What this agent does NOT do
+
+- Does not check individual IDOR vulnerabilities bottom-up — that is `deep-code-reasoning`.
+- Does not run static analysis tools — that is Phase 1.
+- Does not check authentication implementation (is the token valid?) — that is `security-review`.
+- Does not perform adversarial testing — that is `/redteam-model`.
+- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c.
+
+## Handoff
+
+The Phase 1b orchestration step appends this agent's output to the unified finding stream:
+
+```bash
+jq -c '.[]' memory/authz-review-.json >> memory/findings-.jsonl
+```
+
+These findings flow through Phase 1c → Phase 2 → Phase 3 identically to all other unified findings. The `source: "llm-reasoning"` tag signals the fp-reduction agent to verify the reasoning chain; `secondary_locations` provides the policy declaration and gap evidence needed for that verification.
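
The "validate each entry" step and the minimum evidence bar from the agent file above could be checked mechanically along the following lines. This is a hedged sketch, not code from the patch: `validation_errors`, the constant names, and the exact list of required fields are assumptions inferred from the output format shown in that file, and counting the primary `file`/`line` together with `secondary_locations` toward the two-location bar is one possible reading of the rule.

```python
from typing import Any

# Assumed from the output-format example; the patch does not list these explicitly.
REQUIRED_FIELDS = ("rule_id", "file", "line", "severity", "message", "metadata")
ALLOWED_SEVERITIES = {"error", "warning", "info"}
ALLOWED_CONFIDENCE = {"high", "medium"}  # `low` findings are never emitted


def validation_errors(finding: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the entry may be written."""
    missing = [name for name in REQUIRED_FIELDS if name not in finding]
    if missing:
        return [f"missing field: {name}" for name in missing]

    errors: list[str] = []
    meta = finding["metadata"]
    if meta.get("source") != "llm-reasoning":
        errors.append("metadata.source must be 'llm-reasoning'")
    if finding["severity"] not in ALLOWED_SEVERITIES:
        errors.append("severity must be error, warning, or info")
    if meta.get("confidence") not in ALLOWED_CONFIDENCE:
        errors.append("confidence must be high or medium")
    if not meta.get("cwe"):
        errors.append("at least one CWE is required")

    # Evidence bar: the primary location plus the secondary locations
    # must name at least two distinct file:line pairs.
    locations = {(finding["file"], finding["line"])}
    for loc in meta.get("secondary_locations", []):
        locations.add((loc.get("file"), loc.get("line")))
    if len(locations) < 2:
        errors.append("fewer than two distinct code locations cited")
    return errors
```

Returning a list of messages instead of a boolean lets the caller report every problem at once before deciding whether to write the output file.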
diff --git a/plugins/agentic-security-assessment/agents/deep-code-reasoning.md b/plugins/agentic-security-assessment/agents/deep-code-reasoning.md new file mode 100644 index 0000000..eeceab0 --- /dev/null +++ b/plugins/agentic-security-assessment/agents/deep-code-reasoning.md @@ -0,0 +1,157 @@ +--- +name: deep-code-reasoning +description: Context-aware vulnerability detection beyond static pattern-matching. Reads RECON-scoped entry points, authentication paths, and data-flow boundaries to reason freeform about novel, context-dependent vulnerabilities — broken access control, confused deputy, TOCTOU, indirect privilege escalation, workflow bypass — that Semgrep rules cannot express. Emits unified-finding-v1 output tagged source:"llm-reasoning". Phase 1b peer agent alongside security-review and business-logic-domain-review. +tools: Read, Grep, Glob +model: opus +--- + +## Thinking Guidance + +Think carefully and step-by-step. Context-dependent security issues are subtle and require cross-file reasoning. The minimum evidence bar is strict — if you cannot cite two specific code locations that together constitute the vulnerability, discard the hypothesis. + +# Deep Code Reasoning Agent + +## Purpose + +Extend Phase 1b detection beyond what Semgrep rules can express. Static analysis catches known patterns; this agent catches context-dependent vulnerabilities that require understanding how components interact across the codebase — issues that only appear when reading the code the way a human security researcher would: tracing data flows, following call chains, and reasoning about authorization design intent vs. implementation. + +**Scope discipline is mandatory.** This agent reads only what the RECON artifact identifies as the security-relevant surface — entry points, authentication paths, and data-flow boundaries. It does NOT scan the entire repo. Unfocused whole-repo reading produces noise; surface-scoped reasoning produces signal. 
If RECON does not identify a surface, fall back to grepping for common auth patterns rather than reading indiscriminately. + +## Inputs + +1. `memory/recon-.json` — RECON artifact with `entry_points`, `security_surface.auth_paths`, `security_surface.sensitive_data_flows` (required) +2. Target repo files at RECON-identified paths (read on demand; load only scoped files and their immediate callers/callees) + +## Outputs + +- `memory/deep-reasoning-.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'` + +## Scope extraction + +Read `memory/recon-.json`. Extract: +- `entry_points[]` — HTTP handlers, CLI entrypoints, cron jobs, event consumers +- `security_surface.auth_paths[]` — paths that implement or verify authentication / authorization +- `security_surface.sensitive_data_flows[]` — paths where PII, credentials, or privileged state flows + +If the RECON artifact lacks `security_surface`, grep the target for common auth indicators (`@require_auth`, `hasPermission`, `isAuthorized`, `checkRole`, `verify_token`, `@login_required`, `[Authorize]`) and use the matching files as the working surface. Document this fallback in the first entry's metadata. + +## Detection targets + +Reason about these vulnerability classes. 
They are not a checklist — they are the categories most likely to appear in surface-scoped code that Semgrep misses: + +### Broken access control (OWASP A01:2021) +- Objects loaded by user-supplied ID without ownership verification (IDOR) +- Functions that enforce auth on some branches but leave others unguarded +- Role checks applied at the controller but not re-enforced at the service or repository layer +- Horizontal privilege escalation: user A accessing user B's resources via parameter manipulation + +### Confused deputy +- A privileged component accepting requests from a less-privileged caller without re-verifying intent +- Server-side request forgery vectors where an internal service acts on behalf of an external caller +- OAuth / delegation flows where the delegated scope is not verified at the point of use + +### TOCTOU across service boundaries +- State read in one request, acted on in another, without re-verification between the two +- Distributed TOCTOU: service A checks auth, passes a token to service B, which acts without re-checking +- Race windows in workflow state machines (e.g. payment: authorized → captured without re-locking) + +### Indirect privilege escalation +- A role or permission derived from a mutable field a low-privilege actor can influence +- Indirect object references that reach privileged operations (e.g. an admin action reachable via a parameter on a non-admin endpoint) +- Configuration or feature flags readable/writable by lower-privilege actors that affect security decisions + +### Business logic bypass (general) +- State transitions permitted in the wrong order (workflow bypass) +- Validation on input but not on the stored or retrieved value +- A "shortcut" path (test mode, debug endpoint, feature flag) that skips normal security checks and is reachable in production + +## Procedure + +### 1. Load and bound the surface + +Extract the surface from RECON (or grep fallback). Record the surface count. 
If the surface exceeds 30 files, apply priority ordering: auth_paths first, then entry_points, then sensitive_data_flows. Process the top 30 only and note the truncation. + +### 2. Read and trace each surface item + +For each file in the scoped surface: +1. Read the file. +2. Find callers: grep for the file's exported function/class names; read the top 3 callers by reference count. +3. Find security-sensitive callees: for operations that access data, check permissions, or transition state, read one level deeper. + +Do not recurse further without a specific reason tied to an active finding hypothesis. + +### 3. Apply the minimum evidence bar + +Only advance to output if you can cite **at least two specific code locations** (file:line) that together constitute the vulnerability. A single suspicious line is a hypothesis, not a finding. Examples of paired evidence: + +- IDOR: `routes/items.py:47` (load by user-supplied id) + `services/item_service.py:112` (no ownership check before return) +- TOCTOU: `handlers/payment.py:89` (authorization check) + `workers/capture.py:34` (capture without re-verifying authorization state) +- Confused deputy: `internal/proxy.py:15` (accepts caller-supplied URL) + `config/trust.py:7` (proxy runs with elevated service credentials) + +### 4. 
Emit findings + +For each confirmed finding: + +```json +{ + "rule_id": "llm-reasoning..", + "file": "", + "line": , + "severity": "error|warning|info", + "message": "", + "metadata": { + "source": "llm-reasoning", + "cwe": ["CWE-NNN"], + "confidence": "high|medium", + "secondary_locations": [ + { "file": "", "line": , "note": "" } + ], + "reasoning": "<2-3 sentences tracing the attack path from entry to impact>" + } +} +``` + +**Rule ID categories:** +- `llm-reasoning.idor.` — object-level authorization bypass +- `llm-reasoning.function-level-authz.` — function-level auth gap +- `llm-reasoning.confused-deputy.` — confused deputy / SSRF via delegation +- `llm-reasoning.toctou.` — time-of-check-time-of-use +- `llm-reasoning.privilege-escalation.` — indirect privilege escalation +- `llm-reasoning.workflow-bypass.` — business logic / state machine bypass + +**Severity:** +- `error` — reachable from a public entry point; directly enables privilege escalation or data access bypass +- `warning` — reachable but requires additional conditions, or only reachable from authenticated paths +- `info` — pattern present but exploit viability unclear without runtime context + +**Confidence (mandatory, two values only):** +- `high` — full attack path traceable with no gaps; every step has a code citation +- `medium` — path is plausible but one step requires an assumption (note it explicitly in `reasoning`) + +Do NOT emit `low` confidence findings. A finding you cannot confidently trace is a hypothesis — discard it rather than push noise downstream for FP-reduction to clean up. + +### 5. Write output + +Write `memory/deep-reasoning-.json` as a JSON array. An empty array `[]` is valid and expected when the scoped surface yields no confirmed findings — do not manufacture findings to fill the file. 
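As a rough illustration of the schema, severity, and confidence rules above, a pre-write check could look like the following Python sketch. The function name `validate_entry`, the error strings, and the sample `rule_id` suffix are hypothetical and not part of the plugin; the file paths are borrowed from the IDOR paired-evidence example.

```python
ALLOWED_SEVERITIES = {"error", "warning", "info"}
ALLOWED_CONFIDENCE = {"high", "medium"}  # `low` is discarded, never emitted


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry passes."""
    problems = []
    for key in ("rule_id", "file", "line", "severity", "message"):
        if entry.get(key) in (None, ""):
            problems.append(f"missing {key}")
    severity = entry.get("severity")
    if severity and severity not in ALLOWED_SEVERITIES:
        problems.append("severity must be error, warning, or info")
    meta = entry.get("metadata") or {}
    if meta.get("source") != "llm-reasoning":
        problems.append("metadata.source must be llm-reasoning")
    if not meta.get("cwe"):
        problems.append("metadata.cwe needs at least one CWE")
    if meta.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be high or medium")
    if not meta.get("secondary_locations"):
        problems.append("at least one secondary location is required")
    return problems


# Hypothetical entry, shaped like the template above.
example = {
    "rule_id": "llm-reasoning.idor.item-lookup",
    "file": "routes/items.py",
    "line": 47,
    "severity": "error",
    "message": "Item loaded by user-supplied id without ownership check",
    "metadata": {
        "source": "llm-reasoning",
        "cwe": ["CWE-639"],
        "confidence": "high",
        "secondary_locations": [
            {"file": "services/item_service.py", "line": 112,
             "note": "no ownership check before return"}
        ],
        "reasoning": "Request id flows to the service, which returns the row.",
    },
}
```

Returning a list of problems rather than raising lets a caller report every defect in an entry at once before deciding whether to drop it.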
+ +Validate before writing: each entry must carry `rule_id`, `file`, `line`, `severity`, `message`, `metadata.source = "llm-reasoning"`, `metadata.cwe` (at least one), `metadata.confidence` in `["high", "medium"]`, and `metadata.secondary_locations` (at least one entry). + +## What this agent does NOT do + +- Does not run static analysis tools — that is Phase 1. +- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c. +- Does not perform FP-reduction — that is Phase 2 (fp-reduction agent). +- Does not scan outside the RECON surface without explicit fallback justification. +- Does not emit `low` confidence findings. +- Does not perform adversarial ML testing — that is `/redteam-model`. +- Does not reason about authorization architecture design (that is `authorization-logic-review`). + +## Handoff + +The Phase 1b orchestration step appends this agent's output to the unified finding stream: + +```bash +jq -c '.[]' memory/deep-reasoning-.json >> memory/findings-.jsonl +``` + +These findings then flow through Phase 1c (ACCEPTED-RISKS), Phase 2 (fp-reduction), and Phase 3 (narrative/compliance) identically to Semgrep and agent findings. The `source: "llm-reasoning"` tag allows the fp-reduction agent to apply appropriate priors (LLM-sourced findings warrant scrutiny of the evidence chain; the secondary_locations and reasoning fields provide it) and the exec-report-generator to note the detection method in Section 6 methodology. diff --git a/plugins/agentic-security-assessment/agents/exec-report-generator.md b/plugins/agentic-security-assessment/agents/exec-report-generator.md index fcdd985..4268ac7 100644 --- a/plugins/agentic-security-assessment/agents/exec-report-generator.md +++ b/plugins/agentic-security-assessment/agents/exec-report-generator.md @@ -54,7 +54,9 @@ Required content: ### Section 1 — Findings Dashboard -One table, all findings (post-disposition), grouped by presentational severity. Columns: ID, Rule, File:Line, Category, Severity, Verdict. 
+One table, all findings (post-disposition), grouped by presentational severity. Columns: ID, Rule, File:Line, Category, Severity, Verdict, Confidence. + +**Confidence column**: read directly from the disposition entry's `confidence` field (`high`, `medium`, `low`). Entries with `null` confidence (likely_false_positive / false_positive) are excluded from this table. Do not derive confidence independently — use the value the fp-reduction agent assigned. **CWE column format (dashboard):** Number(s) only — no name. Single: `CWE-NNN`. Multiple: `CWE-NNN + CWE-MMM`. Use `+` as the separator; never `/`. @@ -64,6 +66,7 @@ Detailed blocks — one block per CRITICAL + HIGH finding. Each block contains: - Summary (one sentence) - Location (file:line) - CWE reference (invariant: every C/H finding must have CWE; see § Invariants) +- **Confidence**: `High` / `Medium` — read from disposition entry's `confidence` field - Reachability trace (invariant: from disposition register's `reachability.rationale`) - Attack scenario (2-3 sentences) - Remediation guidance (2-4 sentences, specific) @@ -120,7 +123,7 @@ Brief statement of what was and was not assessed. 
Explicit list of: | Agent type (from `tool_input.subagent_type`) | Phase | |---|---| | `codebase-recon` | phase-0-recon | -| `security-review`, `business-logic-domain-review` | phase-1b-judgment | +| `security-review`, `business-logic-domain-review`, `deep-code-reasoning`, `authorization-logic-review`, `recon-driven-scan` | phase-1b-judgment | | `fp-reduction` | phase-2-fp-reduction | | `tool-finding-narrative-annotator`, `compliance-edge-annotator` | phase-3-narrative-compliance | | `cross-repo-synthesizer` | phase-4-cross-repo (narrative sub-phase) | diff --git a/plugins/agentic-security-assessment/agents/recon-driven-scan.md b/plugins/agentic-security-assessment/agents/recon-driven-scan.md new file mode 100644 index 0000000..dea692b --- /dev/null +++ b/plugins/agentic-security-assessment/agents/recon-driven-scan.md @@ -0,0 +1,152 @@ +--- +name: recon-driven-scan +description: RECON-narrative-to-finding bridge. Reads the human-language risk descriptions in the Phase 0 RECON narrative and finds concrete file:line evidence in source for each described risk — the patterns SAST cannot express (inverted-boolean TLS defaults, unmasked PII propagation across SNS/MSMQ, RCE shapes via expression libraries, header-driven SQL connection strings, body-trusted IDOR). Phase 1b peer agent; emits unified-finding-v1 tagged source:"recon-driven". Validated against 12 NextGen repos that the original Phase 1 SAST scored as zero-findings — produced 75 confirmed findings with zero false alarms. +tools: Read, Grep, Glob, Bash +model: opus +--- + +## Thinking Guidance + +Think carefully and step-by-step. RECON often describes risks in domain language ("KMS Encrypt UTF-8 bug", "masker echoing PII on exception", "by-design unauth getRoute"). Your job is to translate each described risk into a concrete code search and produce a finding only when the source exhibits the described pattern at a specific file:line. 
**Do not fabricate findings to match RECON if the code doesn't show the pattern** — RECON itself can be wrong, and a clean repo is a valid outcome. + +# RECON-Driven Scan Agent + +## Purpose + +Bridge the gap between Phase 0's RECON narrative (which describes risks in human language based on context-aware codebase reading) and Phase 1 SAST output (which finds matches against fixed rule patterns). Many real production issues are visible to a careful human reader walking the codebase — the same kind of reasoning RECON does — but cannot be expressed as a Semgrep/gitleaks rule pattern. This agent re-walks the source with RECON's narrative as a hypothesis list and emits findings for each confirmed match. + +**Why this phase exists**: empirically validated on the NextGen portfolio. 12 repos scored zero findings from Phase 1 SAST despite RECON narratives identifying concrete risks. The targeted scan produced 75 confirmed findings (8 CRITICAL, 17 HIGH) including 2 production SQL injections, 1 RCE shape, an inverted-boolean TLS-bypass library, hardcoded cross-environment credentials, and a 12+-repo cross-repo credential reuse chain — all of which were invisible to pattern-only static analysis. + +## Inputs + +1. `memory/recon-.md` — human-readable RECON narrative (required; agent skips repo if absent or stub-only) +2. `memory/recon-.json` — structured RECON envelope (entry_points, security_surface, file_inventory) +3. Target repo source files (read on demand via Read + Grep) + +## Outputs + +- `memory/recon-driven-.json` — JSON array of unified findings conforming to unified-finding-v1, appended to `memory/findings-.jsonl` by the Phase 1b orchestration step via `jq -c '.[]'` + +## Procedure + +### 1. Parse RECON narrative for specific risk claims + +Read `memory/recon-.md` carefully. Identify each specific risk claim — phrases like "unauth gRPC paths", "TLS bypass default-on", "unmasked CreditAccount in SNS", "Redis AllowAdmin=true", "Flee InvokeMethod RCE shape". 
Each claim becomes a hypothesis to validate. + +If the RECON file is absent, empty, or contains only generic prose with no specific risk claims, **skip the repo** and emit an empty array `[]`. Do not invent findings. + +### 2. Translate each claim to a code search + +For each risk claim, identify the grep pattern(s) that would surface concrete evidence. Use the **claim → search pattern library** below as a starting point, but do not be limited by it — a good RECON narrative may identify novel patterns. + +#### Claim → search pattern library (validated against NextGen 2026-05-01 rerun) + +| RECON claim | grep / Read pattern | Rule ID category | CWE | +|---|---|---|---| +| "AllowAnonymous on grpc service" / "unauth gRPC paths" | `[AllowAnonymous]` adjacent to a class inheriting from a service base or `Grpc.Core.*ServiceBase` | `recon-driven.unauth-grpc.` | CWE-306 | +| "TLS bypass default-on" / "SkipServerCertificateCheck=true in BASE config" | `SkipServerCertificateCheck.*=.*true` in `appsettings.json` (not Development.json) | `recon-driven.tls-bypass.skip-server-cert-default` | CWE-295 | +| "Inverted boolean TLS bypass" | `!bool.TryParse.*\|\|` in TLS-related setting reads | `recon-driven.tls-bypass.inverted-bool-default-true` | CWE-295 + CWE-1287 | +| "ServerCertificateValidationCallback bypass" | `ServerCertificateValidationCallback\s*=\s*.*=>\s*true\|delegate.*return true` | `recon-driven.cert-validation.callback-returns-true` | CWE-295 | +| "Plaintext SQL credential in appsettings" / "endavauser/Jupiter2020" | `Password=` literal + value not a placeholder | `recon-driven.hardcoded-creds.sql-conn-string` | CWE-798 | +| "Cross-env credential reuse" | same secret hash across `appsettings.{Development,QA,UAT,Production}.json` | `recon-driven.hardcoded-creds.cross-env-reuse` | CWE-798 + CWE-1392 | +| "Unmasked PII in SNS / MSMQ / log" | grep for PII field name (CreditAccount, SSN, AccountNumber) in publish/log/return paths without masking | 
`recon-driven.pii-leak.unmasked-in-` | CWE-200 + CWE-359 | +| "Masker echoing PII on exception" | catch block in masker logic that returns or logs the unmasked input | `recon-driven.pii-leak.masker-exception-fallback` | CWE-209 + CWE-200 | +| "Redis AllowAdmin=true" | `AllowAdmin\s*=\s*true` in connection options | `recon-driven.redis.allowadmin-enabled` | CWE-732 | +| "Swagger / Prometheus on `!IsProduction()`" | `if\s*\(.*!.*IsProduction\|.*EnvironmentName.*!=.*Production` near `UseSwagger\|UsePrometheus` | `recon-driven.config-leak.devsurface-non-prod-only` | CWE-489 | +| "Header-driven SQL connection-string interpolation" | connection string built from `Request.Headers\|HttpContext.Request.Headers` | `recon-driven.sql-injection.connection-string-from-header` | CWE-89 + CWE-918 | +| "SQL injection via LIKE concat" | `\$@?\".*LIKE.*\{.*\}.*\"` in repository methods | `recon-driven.sql-injection.like-concat` | CWE-89 | +| "EXEC string concat" / "raw SQL with concat" | `EXEC\|sp_executesql.*\+\|"\$"` near `IDbCommand.ExecuteNonQuery` | `recon-driven.sql-injection.exec-concat` | CWE-89 | +| "AES key from UTF-8 bytes of string" / "key not base64" | `Encoding.UTF8.GetBytes\(.*\)` adjacent to `aes.Key =\|key=\|new RijndaelManaged` | `recon-driven.crypto-misuse.utf8-bytes-as-key` | CWE-326 | +| "Static IV" / "operator IV" / "fixed IV" | `IV\s*=\s*new byte\[\]\s*\{\|aes.IV =\s*Encoding` with constant value | `recon-driven.crypto-misuse.static-iv` | CWE-329 | +| "SHA256 == auth" / "no HMAC, no timing-safe" | `SHA256.*Equals\|hash ==\|.SequenceEqual\(.*hash` in auth path | `recon-driven.crypto-misuse.equals-on-hash` | CWE-208 + CWE-327 | +| "RCE via expression library" / "Flee / Dynamic LINQ" | `InvokeMethod\|CreateInstance\|ResolveTypesBySimpleName\|DynamicLinqType` | `recon-driven.code-injection.expression-library-rce-shape` | CWE-94 + CWE-470 | +| "Body-trusted clientId" / "request body trust" | DTO field `ClientId\|TenantId` used directly in DB query without comparing 
to authenticated user's claim | `recon-driven.idor.body-trusted-tenant-id` | CWE-639 | +| "Exception message returned to caller" | `catch.*ex\)\s*\{.*return.*ex.Message\|return.*ToString\(\)` | `recon-driven.exception-leak.return-ex-message` | CWE-209 | +| "Stack trace leaked in error response" | response body containing `ex.StackTrace\|ex.ToString()` | `recon-driven.exception-leak.stack-trace-in-response` | CWE-209 | +| "AllowInvalid TLS in dev only" / "dev TLS hint to skip" | `AllowInvalid\|SkipCert.*Development\|environment.IsDevelopment` near TLS setting | `recon-driven.tls-bypass.dev-only-but-misconfigured` | CWE-295 | +| "Static delegate cache unbounded" | `ConcurrentDictionary<.*,Delegate>\|static.*Compile\(\)` without eviction | `recon-driven.dos.unbounded-delegate-cache` | CWE-401 + CWE-770 | +| "X-Request-Id no length cap" / "header capture unbounded" | `Request.Headers["X-Request-Id"]` echoed to log/response without length check | `recon-driven.dos.unbounded-header-capture` | CWE-20 + CWE-117 | +| "URL-format SSRF" / "URL passed to outbound request" | `new HttpClient\|HttpWebRequest.Create` with URL from input | `recon-driven.ssrf.url-from-input` | CWE-918 | +| "Recursion DoS" | recursive method on user-controlled tree without depth limit | `recon-driven.dos.recursion-no-depth-cap` | CWE-674 | +| "Format-preserving token" | tokenizer producing token with same structure as input (BIN+last4) | `recon-driven.crypto-misuse.format-preserving-token` | CWE-330 | + +### 3. Verify each candidate match + +For each grep hit: +1. Read the surrounding 20 lines of context +2. Confirm the code actually exhibits the risk RECON described — patterns can be misleading +3. If confirmed, generate a finding entry +4. If the pattern matches but the code is in a test fixture, comment, or build script, DO NOT generate a finding (let the fp-reduction stage filter at the test-only-path level if it gets through) + +### 4. 
Apply minimum evidence bar + +Each finding requires: +- A specific `file:line` citation +- A direct quote of the matching code (in `metadata.code_excerpt`) +- A direct quote of the RECON narrative claim that motivated the search (in `metadata.recon_claim`) +- A non-trivial CWE assignment (not `CWE-0`) + +If any of these is missing, do NOT emit the finding. + +### 5. Emit findings + +Write `memory/recon-driven-.json` as a JSON array of unified-finding-v1 entries: + +```json +{ + "rule_id": "recon-driven..", + "file": "", + "line": , + "severity": "error|warning|info", + "message": "", + "metadata": { + "source": "recon-driven", + "cwe": ["CWE-NNN"], + "recon_claim": "", + "code_excerpt": "", + "rationale": "<2-3 sentences: how the code matches the claim and why it's exploitable>" + } +} +``` + +**Severity calibration**: +- `error` (CRITICAL/HIGH): unauth privileged endpoints, TLS bypass on production-reachable surface, SQL/code injection, hardcoded production credentials, PII leak with no compensating control +- `warning` (MEDIUM): config hygiene gaps, dev-surface-leaks, defense-in-depth gaps, exception leakage on non-sensitive paths +- `info` (LOW): style/best-practice issues, modernization debt + +An empty array `[]` is a valid output when the RECON narrative is empty/generic, or when none of its claims are confirmed in source. Do not fabricate findings to fill the array. + +## What this agent does NOT do + +- Does not run static analysis tools — that is Phase 1. +- Does not perform freeform vulnerability discovery — that is `deep-code-reasoning` (which works bottom-up from suspicious code). +- Does not scan repos that lack a substantive RECON narrative — silently emits `[]`. +- Does not apply ACCEPTED-RISKS suppression — that is Phase 1c. +- Does not perform FP-reduction — that is Phase 2. +- Does not validate that RECON is correct; it only confirms whether RECON's claims have concrete code evidence. 
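Steps 2 and 3 of the procedure above (search for a claim's pattern, then read the surrounding context) can be sketched in Python. This is only an illustration under stated assumptions: the helper name `find_candidates` and the sample C# snippet are hypothetical, the regex is taken from the "Redis AllowAdmin=true" row of the library table, and the window of 10 lines on each side is one reading of "surrounding 20 lines".

```python
import re

# Pattern from the "Redis AllowAdmin=true" row of the library table.
ALLOW_ADMIN = re.compile(r"AllowAdmin\s*=\s*true")


def find_candidates(source: str, pattern: re.Pattern, context: int = 10):
    """Yield (line_number, matched_line, context_lines) for each hit.

    A hit is only a candidate: the agent still has to confirm that the
    code exhibits the risk RECON described before emitting a finding.
    """
    lines = source.splitlines()
    for index, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            yield index + 1, line.strip(), lines[start:end]


# Hypothetical source text standing in for a file read from the target repo.
sample = "\n".join([
    "var options = new ConfigurationOptions",
    "{",
    "    AllowAdmin = true,",
    "    AbortOnConnectFail = false",
    "};",
])
hits = list(find_candidates(sample, ALLOW_ADMIN))
```

Keeping the context lines alongside each hit makes it straightforward to populate `metadata.code_excerpt` once a candidate is confirmed.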
+ +## Handoff + +The Phase 1b orchestration step appends this agent's output to the unified finding stream: + +```bash +jq -c '.[]' memory/recon-driven-.json >> memory/findings-.jsonl +``` + +These findings flow through Phase 1c → Phase 2 → Phase 3 → Phase 5 identically to all other unified findings. The `source: "recon-driven"` tag allows fp-reduction to apply appropriate priors (recon-driven findings have RECON narrative as supporting evidence; the `recon_claim` and `code_excerpt` metadata fields make the rationale chain auditable) and the exec-report-generator notes the detection method in Section 6 methodology. + +## Validation history + +Validated on the 2026-05-01 NextGen portfolio rerun: 12 repos previously scored zero-findings by Phase 1 SAST were re-scanned with this approach. Outcome: + +- **75 new findings** across 12 repos (mean 6.25/repo) +- **8 CRITICAL, 17 HIGH** added to portfolio severity counts +- **0 false alarms** — every finding had concrete file:line evidence matching a RECON claim +- All 12 repos promoted out of `00-no-findings.md` + +Notable findings the original SAST missed: +- `search-service` — 2 production SQL injections in `PartialSearchByCreditAccount` and `PartialSearchByDebitAccount` (LIKE concat) +- `shared-tokenservice` — SQL injection in error-logging path; hardcoded `GenericTokenKey` across QA/UAT/Prod +- `profile-custompipes` — Flee + Dynamic LINQ RCE shape running in-process inside `profile-service` +- `notificationinfrastructure` — inverted-boolean TLS bypass library-amplified across all consumer Lambdas +- `Jupiter2020$` cross-repo credential reuse in 6 of 12 reruns (now confirmed in 12+ repos portfolio-wide) diff --git a/plugins/agentic-security-assessment/commands/security-assessment.md b/plugins/agentic-security-assessment/commands/security-assessment.md index 5e92374..ba84578 100644 --- a/plugins/agentic-security-assessment/commands/security-assessment.md +++ 
b/plugins/agentic-security-assessment/commands/security-assessment.md @@ -63,7 +63,7 @@ Wall time matters for real-world assessments. Three parallelism rules **MUST** b 1. **Multi-target fan-out.** When invoked with multiple targets, each target's Phase 0 through Phase 2b runs as an **independent pipeline**. Dispatch them as **parallel Agent tool calls in the SAME message** — not sequential. Each target's pipeline is self-contained until Phase 4 (service-comm across targets) and Phase 5 (cross-repo summary). For N targets on a machine with K cores, expect N-way wall-time parallelism up to K. 2. **Intra-phase fan-out.** Within a phase with multiple agents or tools: - - Phase 1b dispatches `security-review` AND `business-logic-domain-review` as parallel Agent tool calls in one message + - Phase 1b dispatches `security-review`, `business-logic-domain-review`, `deep-code-reasoning`, `authorization-logic-review`, AND `recon-driven-scan` as parallel Agent tool calls in one message - Phase 3 dispatches `tool-finding-narrative-annotator` AND `compliance-mapping` as parallel Agent tool calls in one message - Phase 1's static-analysis-integration skill dispatches every available tool (semgrep variants + gitleaks + trivy + hadolint + actionlint + custom scripts) as concurrent shell processes, not sequentially @@ -111,11 +111,28 @@ Also invoke the two custom scripts: Their SARIF outputs flow through the shared parser. 
-**Phase 1b — Judgment detection.** Dispatch in parallel (Agent tool with multiple calls in one message): +**Phase 1b — Judgment detection.** Dispatch in parallel (Agent tool with five calls in one message): - `security-review` (opus; reads RECON + target files) - `business-logic-domain-review` (opus; reads RECON + target files + `knowledge/domain-logic-patterns.md`) +- `deep-code-reasoning` (opus; reads RECON surface-scoped entry points, auth paths, and data-flow boundaries; emits `memory/deep-reasoning-.json`) +- `authorization-logic-review` (opus; maps the authorization model top-down and checks enforcement consistency; emits `memory/authz-review-.json`) +- `recon-driven-scan` (opus; reads the RECON narrative and validates each described risk has concrete `file:line` evidence in source — finds patterns SAST cannot express; emits `memory/recon-driven-.json`) -Append their findings to `memory/findings-.jsonl`. +After all five agents complete, append findings to `memory/findings-.jsonl`: + +```bash +# security-review and business-logic-domain-review via adapter (mandatory) +python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \ + --input memory/agent-output-.json \ + --output memory/findings-.jsonl + +# deep-code-reasoning, authorization-logic-review, and recon-driven-scan emit unified-finding-v1 directly +jq -c '.[]' memory/deep-reasoning-.json >> memory/findings-.jsonl +jq -c '.[]' memory/authz-review-.json >> memory/findings-.jsonl +jq -c '.[]' memory/recon-driven-.json >> memory/findings-.jsonl +``` + +If any of `deep-reasoning-.json`, `authz-review-.json`, or `recon-driven-.json` is missing (agent failed), log the failure to the audit trail and continue — Phase 1b is best-effort for individual agents. `recon-driven-scan` legitimately emits `[]` when the RECON narrative is empty or generic; this is not a failure. If multiple new agents fail, surface a coverage warning in the final report. 
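The append step and its missing-file handling could also be expressed in Python, which makes the best-effort behaviour explicit. This is a minimal sketch, not the orchestrator's actual code: the function name `append_direct_findings` and the `target_id` parameter (a stand-in for whatever identifier completes the file names above) are assumptions.

```python
import json
from pathlib import Path

# File-name prefixes of the three agents that emit unified-finding-v1 directly.
DIRECT_PREFIXES = ("deep-reasoning", "authz-review", "recon-driven")


def append_direct_findings(memory_dir: Path, target_id: str) -> list[str]:
    """Append each agent's JSON array to the findings stream.

    Returns the prefixes whose output file was missing, so the caller can
    log them to the audit trail and decide whether a coverage warning is
    warranted. An empty array is valid and simply appends nothing.
    """
    stream = memory_dir / f"findings-{target_id}.jsonl"
    missing = []
    with stream.open("a", encoding="utf-8") as out:
        for prefix in DIRECT_PREFIXES:
            path = memory_dir / f"{prefix}-{target_id}.json"
            if not path.exists():
                missing.append(prefix)  # agent failed; continue with the rest
                continue
            for finding in json.loads(path.read_text(encoding="utf-8")):
                # Roughly what `jq -c '.[]'` does: one compact object per line.
                line = json.dumps(finding, separators=(",", ":"), ensure_ascii=False)
                out.write(line + "\n")
    return missing
```

A caller might treat two or more returned prefixes as the "multiple agents failed" condition and surface the coverage warning in the final report.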
**Phase 1c — ACCEPTED-RISKS suppression (deterministic, mandatory gate).** Execute the deterministic script; do not delegate this to LLM reasoning: diff --git a/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md b/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md index efafc2f..0eff5d5 100644 --- a/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md +++ b/plugins/agentic-security-assessment/skills/security-assessment-pipeline/SKILL.md @@ -83,21 +83,29 @@ Phase 1: Tool-first detection (parallel across tools) parallelism: all tool × target × ruleset combinations run concurrently Phase 1b: Judgment-layer detection (parallel across agents) - agents: security-review, business-logic-domain-review (opus both) + agents: security-review, business-logic-domain-review, deep-code-reasoning, + authorization-logic-review, recon-driven-scan (opus all five) produces: adds unified findings to memory/findings-.jsonl requires: Phase 0, Phase 1 - parallelism: two agents dispatched in a single Agent tool message, + parallelism: all five agents dispatched in a single Agent tool message, repeated per target - adapter: The security-review agent's output is piped through - plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py - before findings append to memory/findings-.jsonl. - The adapter is mandatory in this phase; a non-zero exit halts - Phase 1b with a named error (malformed category, missing - category, malformed mapping YAML, or schema-invalid emission). 
- Invocation, verbatim: - python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \ - --input memory/agent-output-.json \ - --output memory/findings-.jsonl + adapters: + security-review + business-logic-domain-review: + Output is piped through the security-review adapter before appending: + python3 plugins/agentic-dev-team/skills/static-analysis-integration/adapters/security-review-adapter.py \ + --input memory/agent-output-.json \ + --output memory/findings-.jsonl + The adapter is mandatory for these two agents; a non-zero exit halts + Phase 1b with a named error. + deep-code-reasoning + authorization-logic-review + recon-driven-scan: + These agents emit unified-finding-v1 directly; no adapter is required. + Their outputs are appended via jq after all five agents complete: + jq -c '.[]' memory/deep-reasoning-.json >> memory/findings-.jsonl + jq -c '.[]' memory/authz-review-.json >> memory/findings-.jsonl + jq -c '.[]' memory/recon-driven-.json >> memory/findings-.jsonl + Each must validate as an array (empty [] is valid; missing file is a Phase 1b failure). + recon-driven-scan emits [] when the RECON narrative is empty or generic; this + is normal and not a failure. 
Phase 1c: ACCEPTED-RISKS suppression (sequential gate, mandatory) procedure: scripts/apply-accepted-risks.sh parses the first fenced @@ -243,6 +251,9 @@ Every phase writes to `memory/-.` where `` is derived fro |---|---|---| | `recon-.json` | Phase 0 | Phase 1, 1b, 2, 3, 5 | | `findings-.jsonl` | Phase 1, Phase 1b | Phase 2, 3 | +| `deep-reasoning-.json` | Phase 1b (deep-code-reasoning) | Phase 1b append step | +| `authz-review-.json` | Phase 1b (authorization-logic-review) | Phase 1b append step | +| `recon-driven-.json` | Phase 1b (recon-driven-scan) | Phase 1b append step | | `disposition-.json` | Phase 2 | Phase 3, 5 | | `narratives-.md` | Phase 3 | Phase 5 | | `compliance-.json` | Phase 3 | Phase 5 | From 6296245eee29da567314fd627c928103a1bb823f Mon Sep 17 00:00:00 2001 From: Bryan Finster Date: Fri, 1 May 2026 13:31:32 -0500 Subject: [PATCH 3/3] chore(security-assessment): release 2.2.0 Manual changelog entry for the Phase 1b expansion. release-please will generate the canonical 2.2.0 entry from conventional commits when this lands on main; this entry serves as a working preview. --- .../.claude-plugin/plugin.json | 2 +- plugins/agentic-security-assessment/CHANGELOG.md | 13 +++++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/plugins/agentic-security-assessment/.claude-plugin/plugin.json b/plugins/agentic-security-assessment/.claude-plugin/plugin.json index 0526ea6..ad7bead 100644 --- a/plugins/agentic-security-assessment/.claude-plugin/plugin.json +++ b/plugins/agentic-security-assessment/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "agentic-security-assessment", - "version": "2.1.0", + "version": "2.2.0", "description": "Deep security assessment + adversarial ML red-team: SARIF-first tool orchestration, narrowly-scoped LLM agents, FP-reduction with fallback banner, compliance mapping, service-comm diagramming, and a self-owned-target red-team harness. 
Companion plugin to agentic-dev-team.", "author": { "name": "finsterb", diff --git a/plugins/agentic-security-assessment/CHANGELOG.md b/plugins/agentic-security-assessment/CHANGELOG.md index 493b508..8f03a67 100644 --- a/plugins/agentic-security-assessment/CHANGELOG.md +++ b/plugins/agentic-security-assessment/CHANGELOG.md @@ -1,5 +1,18 @@ # Changelog +## [2.2.0] (2026-05-01) + + +### Features + +* **security-assessment:** add `recon-driven-scan` agent, which bridges the Phase 0 RECON narrative to concrete `file:line` evidence. Reads RECON's human-language risk descriptions and validates that each described risk has matching code via targeted grep, finding patterns SAST cannot express (inverted-boolean TLS defaults, RCE shapes via expression libraries like Flee/Dynamic LINQ, header-driven SQL connection strings, body-trusted IDOR, masker exception PII fallback, format-preserving tokens). Includes a 26-pattern claim→search library covering unauth gRPC, TLS bypass, PII leak, crypto misuse, exception leak, SQL/code injection, SSRF, and DoS categories. Validated against the NextGen 2026-05-01 portfolio rerun: 12 repos previously scored zero-findings by SAST were re-scanned and produced 75 confirmed findings (8 CRITICAL, 17 HIGH) with zero false alarms. Notable additions the original SAST missed: 2 production SQL injections in `search-service`, RCE shape via Flee+Dynamic LINQ in `profile-custompipes`, inverted-boolean TLS bypass library-amplified across all consumer Lambdas in `notificationinfrastructure`, and expansion of the `Jupiter2020$` cross-repo credential reuse chain. +* **security-assessment:** Phase 1b is now a 5-agent parallel dispatch: `security-review` + `business-logic-domain-review` (via security-review-adapter) + `deep-code-reasoning` + `authorization-logic-review` + `recon-driven-scan` (latter three emit unified-finding-v1 directly, appended via `jq`). 
+ + +### Documentation + +* **security-assessment:** Phase 1b parallelization rule, artifacts table, and exec-report agent→phase mapping all updated. Plugin-level CLAUDE.md agent registry updated 11 → 12. + ## [2.1.0](https://github.com/bdfinst/agentic-dev-team/compare/agentic-security-assessment-v2.0.0...agentic-security-assessment-v2.1.0) (2026-04-27)