
feat(security-assessment): recalibrate CRITICAL threshold against opus_repo_scan_test reference (v2.3.0)#26

Merged
bdfinst merged 2 commits into main from feat/severity-recalibration on May 1, 2026

Conversation

Owner

@bdfinst bdfinst commented May 1, 2026

Summary

Recalibrates the CRITICAL severity threshold to align with the opus_repo_scan_test reference framework. The earlier mapping, score >= 7 → CRITICAL, produced an inverted severity pyramid (more CRITICAL than HIGH findings); the reference reserves CRITICAL for "exploitable immediately with no prerequisites; leads to data breach or fraud bypass." Tightening the threshold to score >= 9 restores the expected distribution.

Stacked on top of #25 (v2.2.0). Merge that one first.

Validation

Recalibrated against real portfolio data (NextGen + Walletron, ~97 repos):

| Portfolio | Pre-fix (v2.2.0) | Post-fix (v2.3.0) |
| --- | --- | --- |
| NextGen | 198C / 95H | 79C / 158H |
| Walletron | 307C / 10H | 57C / 251H |
| Combined | 505C / 105H (inverted) | 136C / 409H (proper pyramid) |

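The reference framework's published example output is 7C / 12H / 7M / 3L (HIGH ≈ 1.7× CRITICAL). Our recalibrated combined CRITICAL : HIGH ratio is about 1 : 3 (HIGH ≈ 3.0× CRITICAL), so both distributions have the same shape: more HIGH than CRITICAL findings.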
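The arithmetic behind the combined row and the two ratios can be checked with a short Python sketch. The counts are copied from the validation figures in this PR; the helper names `combined` and `high_to_critical` are made up for illustration and are not part of the plugin.

```python
# Counts from the validation data in this PR: (CRITICAL, HIGH) per portfolio.
pre = {"NextGen": (198, 95), "Walletron": (307, 10)}
post = {"NextGen": (79, 158), "Walletron": (57, 251)}


def combined(counts):
    """Sum CRITICAL and HIGH counts across all portfolios."""
    critical = sum(c for c, _ in counts.values())
    high = sum(h for _, h in counts.values())
    return critical, high


def high_to_critical(critical, high):
    """How many HIGH findings exist per CRITICAL finding."""
    return high / critical


pre_c, pre_h = combined(pre)
post_c, post_h = combined(post)

# Reference example output is 7C / 12H.
reference_ratio = high_to_critical(7, 12)
post_ratio = high_to_critical(post_c, post_h)

print(f"pre-fix:   {pre_c}C / {pre_h}H")
print(f"post-fix:  {post_c}C / {post_h}H")
print(f"reference HIGH/CRITICAL: {reference_ratio:.2f}")
print(f"post-fix  HIGH/CRITICAL: {post_ratio:.2f}")
```

A ratio above 1 means HIGH outnumbers CRITICAL, which is the property the recalibration is meant to restore.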

Changes

knowledge/severity-floors.json

  • New score_to_severity thresholds: 9→CRITICAL, 6→HIGH, 3→MEDIUM, 0→LOW. Each tier carries the reference's qualitative criteria.
  • New discriminator fields on hardcoded-creds (production-reachable vs dev-only-fallback) and unauth-admin-endpoint (direct-privilege-escalation vs info-disclosure-only) so context-dependent floors don't all collapse to the same value.
  • New explicit floor=9 classes: fail-open-scoring, emulation-bypass, client-controlled-aggregate — matching reference S03-FS-01/02/03/04.
  • Each class rationale cites its corresponding reference finding ID for audit traceability.
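For readers who want to see the new thresholds in action, here is a minimal Python sketch of the score-to-severity mapping. The cutoffs (9, 6, 3, 0) come from this PR; the list layout, the function signature, and the assumption that scores are non-negative integers are illustrative and may not match how knowledge/severity-floors.json is actually consumed.

```python
# Hypothetical sketch: thresholds are from this PR, the data layout is assumed.
# Tiers are ordered from highest minimum score to lowest.
THRESHOLDS = [
    (9, "CRITICAL"),
    (6, "HIGH"),
    (3, "MEDIUM"),
    (0, "LOW"),
]


def score_to_severity(score: int) -> str:
    """Return the first tier whose minimum score is met."""
    if score < 0:
        raise ValueError("score must be non-negative")
    for minimum, severity in THRESHOLDS:
        if score >= minimum:
            return severity
    # Not reachable: the 0 tier matches every non-negative score.
    raise AssertionError("no tier matched")
```

Under this mapping a score of 7, which was CRITICAL in v2.2.0, now lands in HIGH.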

agents/fp-reduction.md

  • Floor table reworked with reference citations (S01-FS-01 production keys, X-06 TLS-disabled, S07-FS-03 MD5, S02-FS-01/AG-01 unauth admin with privilege escalation).
  • Discriminator guidance for hardcoded-creds and unauth-admin-endpoint.
  • Calibration-reference paragraph explaining the 2026-05-01 change.

.claude-plugin/plugin.json + CHANGELOG.md

  • Version 2.2.0 → 2.3.0
  • Manual changelog entry; release-please will generate canonical entry on merge to main.

Calibration discriminator examples

Same finding type, different floors:

| Class | Reference example | Discriminator | Floor |
| --- | --- | --- | --- |
| hardcoded-creds | AWS production keys (S01-FS-01) | production-reachable | 9 (CRITICAL) |
| hardcoded-creds | "fallback-secret-for-dev" (S01-AG-03) | dev-only-fallback | 7 (HIGH) |
| unauth-admin-endpoint | /admin/reload-model (S02-FS-01) | direct-privilege-escalation | 9 (CRITICAL) |
| unauth-admin-endpoint | /actuator/heap (S02-FS-02) | info-disclosure-only | 7 (HIGH) |

The discriminator-aware floors will fully apply on the next fresh fp-reduction run. Existing dispositions don't have discriminator fields yet, but the score threshold change alone was enough to produce the proper pyramid.
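One way to picture a discriminator-aware floor is as a lookup keyed on class and discriminator, with the final score being the larger of the raw score and the floor. The sketch below is an assumption about the mechanics, not the fp-reduction agent's actual logic: the `FLOORS` dictionary and `apply_floor` function are hypothetical, and treating a missing discriminator as "no floor" is a simplification chosen only for this example.

```python
from typing import Optional

# Floor values from the discriminator examples in this PR.
# The lookup structure itself is an illustrative assumption.
FLOORS = {
    ("hardcoded-creds", "production-reachable"): 9,
    ("hardcoded-creds", "dev-only-fallback"): 7,
    ("unauth-admin-endpoint", "direct-privilege-escalation"): 9,
    ("unauth-admin-endpoint", "info-disclosure-only"): 7,
}


def apply_floor(raw_score: int, finding_class: str,
                discriminator: Optional[str]) -> int:
    """Raise raw_score to the matching floor; never lower it.

    Assumption: a finding with no known (class, discriminator) pair
    gets no floor and keeps its raw score.
    """
    floor = FLOORS.get((finding_class, discriminator), 0)
    return max(raw_score, floor)


# Same finding class, different floors depending on the discriminator.
prod = apply_floor(4, "hardcoded-creds", "production-reachable")
dev = apply_floor(4, "hardcoded-creds", "dev-only-fallback")
print(prod, dev)
```

With the v2.3.0 thresholds, the first finding would score in the CRITICAL tier and the second in HIGH, which matches the two hardcoded-creds cases in the test plan.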

Test plan

  • /agent-audit passes for agents/fp-reduction.md (structural compliance)
  • Run /security-assessment on a sample target with hardcoded production creds → confirm finding gets score=9, severity=CRITICAL
  • Run on a sample target with fallback-secret-for-dev → confirm finding gets score=7, severity=HIGH (with discriminator rationale)
  • Verify exec-report-generator renders the new severity distribution correctly

🤖 Generated with Claude Code

bdfinst added 2 commits May 1, 2026 16:31
…s_repo_scan_test reference

Earlier `score >= 7 → CRITICAL` combined with broad domain-class floors
at 7 produced an inverted CRITICAL/HIGH pyramid (NextGen 198C/95H,
Walletron 307C/10H). The reference framework
(opus_repo_scan_test analyze-11) reserves CRITICAL for findings
"exploitable immediately with no prerequisites; leads to data breach
or fraud bypass" — produces a proper HIGH > CRITICAL distribution.

Changes:

- knowledge/severity-floors.json:
  * Add `score_to_severity` thresholds: 9→CRITICAL, 6→HIGH, 3→MEDIUM,
    0→LOW. Each tier carries the reference's qualitative criteria.
  * Add `discriminator` fields to `hardcoded-creds` (production-reachable
    vs dev-only-fallback) and `unauth-admin-endpoint` (direct-privilege-
    escalation vs info-disclosure-only) so context-dependent floors
    don't all collapse to the same value.
  * New explicit floor=9 classes: `fail-open-scoring`, `emulation-bypass`,
    `client-controlled-aggregate` — matching reference S03-FS-01/02/03/04
    where these are CRITICAL.
  * Each class rationale cites its corresponding reference finding ID
    for audit traceability.

- agents/fp-reduction.md:
  * Floor table reworked with reference citations
    (S01-FS-01 production keys, X-06 TLS-disabled, S07-FS-03 MD5,
    S02-FS-01/AG-01 unauth admin with privilege escalation).
  * Discriminator guidance for hardcoded-creds and unauth-admin-endpoint.
  * Calibration-reference paragraph explaining the 2026-05-01 change
    and why earlier floors were too aggressive.

The recalibration produces:
  NextGen   198C/95H   →  79C/158H
  Walletron 307C/10H   →  57C/251H
  Combined  505C/105H  →  136C/409H (proper pyramid: HIGH > CRITICAL)

Manual changelog entry for the severity-recalibration release.
release-please will generate the canonical 2.3.0 entry from
conventional commits when this lands on main.
Base automatically changed from feat/phase-1b-expansion to main May 1, 2026 21:38
@bdfinst bdfinst merged commit 7a3c320 into main May 1, 2026
1 check passed
@bdfinst bdfinst deleted the feat/severity-recalibration branch May 1, 2026 21:38