feat(security-assessment): recalibrate CRITICAL threshold against opus_repo_scan_test reference (v2.3.0) by bdfinst · Pull Request #26 · bdfinst/agentic-dev-team

bdfinst · 2026-05-01T21:32:28Z

Summary

Recalibrates the CRITICAL severity threshold to align with the opus_repo_scan_test reference framework. The earlier rule, score >= 7 → CRITICAL, produced an inverted severity pyramid (more CRITICAL than HIGH findings). The reference reserves CRITICAL for findings that are "exploitable immediately with no prerequisites; leads to data breach or fraud bypass." Tightening the threshold to score >= 9 restores the expected distribution, with HIGH outnumbering CRITICAL.

Stacked on top of #25 (v2.2.0). Merge that one first.

Validation

Recalibrated against real portfolio data (NextGen + Walletron, ~97 repos):

| Portfolio | Pre-fix (v2.2.0) | Post-fix (v2.3.0) |
|---|---|---|
| NextGen | 198C / 95H | 79C / 158H ✓ |
| Walletron | 307C / 10H | 57C / 251H ✓ |
| Combined | 505C / 105H (inverted) | 136C / 409H (proper pyramid) |

The reference framework's published example output is 7C / 12H / 7M / 3L (HIGH ≈ 1.7× CRITICAL). Our recalibrated combined CRITICAL : HIGH ratio is about 1 : 3 (136C / 409H), so the shape matches: HIGH outnumbers CRITICAL.
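
The pyramid check is plain arithmetic on the validation counts. A minimal sketch, using only the numbers reported above; the function and variable names are illustrative and not part of the plugin:

```python
# Counts copied from the validation data: (CRITICAL, HIGH) per portfolio.
PRE_FIX = {"NextGen": (198, 95), "Walletron": (307, 10)}
POST_FIX = {"NextGen": (79, 158), "Walletron": (57, 251)}


def combined(counts):
    """Sum CRITICAL and HIGH counts across portfolios."""
    critical = sum(c for c, _ in counts.values())
    high = sum(h for _, h in counts.values())
    return critical, high


def is_proper_pyramid(critical, high):
    """Hypothetical check: a proper pyramid has more HIGH than CRITICAL findings."""
    return high > critical


for label, counts in (("pre-fix", PRE_FIX), ("post-fix", POST_FIX)):
    critical, high = combined(counts)
    ratio = round(high / critical, 2)  # HIGH per CRITICAL finding
    print(label, critical, high, ratio, is_proper_pyramid(critical, high))
```

For comparison, the reference example output of 7C / 12H gives 12 / 7 ≈ 1.71 HIGH per CRITICAL, which also passes this check.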

Changes

`knowledge/severity-floors.json`

New score_to_severity thresholds: 9→CRITICAL, 6→HIGH, 3→MEDIUM, 0→LOW. Each tier carries the reference's qualitative criteria.
New discriminator fields on hardcoded-creds (production-reachable vs dev-only-fallback) and unauth-admin-endpoint (direct-privilege-escalation vs info-disclosure-only) so context-dependent floors don't all collapse to the same value.
New explicit floor=9 classes: fail-open-scoring, emulation-bypass, client-controlled-aggregate — matching reference S03-FS-01/02/03/04.
Each class rationale cites its corresponding reference finding ID for audit traceability.
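
To make the new thresholds concrete, here is a minimal sketch of the score-to-severity mapping. The threshold values come from this PR; the list-of-tuples layout and the function names are assumptions, since the actual JSON schema is not shown here:

```python
# Thresholds as listed in this PR, highest first: (minimum score, severity).
# The layout is an assumption; the real severity-floors.json may differ.
SCORE_TO_SEVERITY = [(9, "CRITICAL"), (6, "HIGH"), (3, "MEDIUM"), (0, "LOW")]


def score_to_severity(score, thresholds=SCORE_TO_SEVERITY):
    """Return the first tier whose minimum score is met."""
    for minimum, severity in thresholds:
        if score >= minimum:
            return severity
    raise ValueError(f"score {score} is below every threshold")


def old_is_critical(score):
    """The pre-2.3.0 rule described in the summary: score >= 7 was CRITICAL."""
    return score >= 7


# A score of 7 was CRITICAL under the old rule and is HIGH under the new one.
print(old_is_critical(7), score_to_severity(7))
```

Under this sketch, findings scored 7 or 8 move from CRITICAL to HIGH, which is the shift that reshapes the distribution.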

`agents/fp-reduction.md`

Floor table reworked with reference citations (S01-FS-01 production keys, X-06 TLS-disabled, S07-FS-03 MD5, S02-FS-01/AG-01 unauth admin with privilege escalation).
Discriminator guidance for hardcoded-creds and unauth-admin-endpoint.
Calibration-reference paragraph explaining the 2026-05-01 change.

`.claude-plugin/plugin.json` + `CHANGELOG.md`

Version 2.2.0 → 2.3.0
Manual changelog entry; release-please will generate the canonical entry on merge to main.

Calibration discriminator examples

Same finding type, different floors:

| Class | Reference example | Discriminator | Floor |
|---|---|---|---|
| `hardcoded-creds` | AWS production keys (S01-FS-01) | `production-reachable` | 9 (CRITICAL) |
| `hardcoded-creds` | `"fallback-secret-for-dev"` (S01-AG-03) | `dev-only-fallback` | 7 (HIGH) |
| `unauth-admin-endpoint` | `/admin/reload-model` (S02-FS-01) | `direct-privilege-escalation` | 9 (CRITICAL) |
| `unauth-admin-endpoint` | `/actuator/heap` (S02-FS-02) | `info-disclosure-only` | 7 (HIGH) |

The discriminator-aware floors will fully apply on the next fresh fp-reduction run. Existing dispositions don't have discriminator fields yet, but the score threshold change alone was enough to produce the proper pyramid in the validation data.
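
A minimal sketch of how a discriminator-aware floor could be applied, using the four rows from the table above. The dict layout, the function names, treating a floor as a minimum score via `max`, and the no-floor fallback for findings without a discriminator are all assumptions, not the plugin's actual behavior:

```python
# Floors from the table above, keyed by (class, discriminator).
# The layout is hypothetical, not the real severity-floors.json schema.
FLOORS = {
    ("hardcoded-creds", "production-reachable"): 9,
    ("hardcoded-creds", "dev-only-fallback"): 7,
    ("unauth-admin-endpoint", "direct-privilege-escalation"): 9,
    ("unauth-admin-endpoint", "info-disclosure-only"): 7,
}


def apply_floor(finding_class, discriminator, raw_score):
    """Raise raw_score to the floor for this class/discriminator pair, if any."""
    floor = FLOORS.get((finding_class, discriminator))
    if floor is None:
        # Assumption: a finding without a known discriminator keeps its raw score.
        return raw_score
    return max(raw_score, floor)


def severity(score):
    """Thresholds from this PR: 9 CRITICAL, 6 HIGH, 3 MEDIUM, 0 LOW."""
    if score >= 9:
        return "CRITICAL"
    if score >= 6:
        return "HIGH"
    if score >= 3:
        return "MEDIUM"
    return "LOW"


# Same class, different discriminators, different outcomes.
prod = apply_floor("hardcoded-creds", "production-reachable", 4)
dev = apply_floor("hardcoded-creds", "dev-only-fallback", 4)
print(prod, severity(prod))
print(dev, severity(dev))
```

Keying the lookup on the pair rather than on the class alone is what keeps the two `hardcoded-creds` cases from collapsing to a single floor.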

Test plan

/agent-audit passes for agents/fp-reduction.md (structural compliance)
Run /security-assessment on a sample target with hardcoded production creds → confirm finding gets score=9, severity=CRITICAL
Run on a sample target with fallback-secret-for-dev → confirm finding gets score=7, severity=HIGH (with discriminator rationale)
Verify exec-report-generator renders the new severity distribution correctly

🤖 Generated with Claude Code

…s_repo_scan_test reference

Earlier `score >= 7 → CRITICAL` combined with broad domain-class floors at 7 produced an inverted CRITICAL/HIGH pyramid (NextGen 198C/95H, Walletron 307C/10H). The reference framework (opus_repo_scan_test analyze-11) reserves CRITICAL for findings "exploitable immediately with no prerequisites; leads to data breach or fraud bypass", which produces a proper HIGH > CRITICAL distribution.

Changes:

- knowledge/severity-floors.json:
  * Add `score_to_severity` thresholds: 9→CRITICAL, 6→HIGH, 3→MEDIUM, 0→LOW. Each tier carries the reference's qualitative criteria.
  * Add `discriminator` fields to `hardcoded-creds` (production-reachable vs dev-only-fallback) and `unauth-admin-endpoint` (direct-privilege-escalation vs info-disclosure-only) so context-dependent floors don't all collapse to the same value.
  * New explicit floor=9 classes: `fail-open-scoring`, `emulation-bypass`, `client-controlled-aggregate`, matching reference S03-FS-01/02/03/04 where these are CRITICAL.
  * Each class rationale cites its corresponding reference finding ID for audit traceability.
- agents/fp-reduction.md:
  * Floor table reworked with reference citations (S01-FS-01 production keys, X-06 TLS-disabled, S07-FS-03 MD5, S02-FS-01/AG-01 unauth admin with privilege escalation).
  * Discriminator guidance for hardcoded-creds and unauth-admin-endpoint.
  * Calibration-reference paragraph explaining the 2026-05-01 change and why earlier floors were too aggressive.

The recalibration produces:

  NextGen 198C/95H → 79C/158H
  Walletron 307C/10H → 57C/251H
  Combined 505C/105H → 136C/409H (proper pyramid: HIGH > CRITICAL)

bdfinst added 2 commits May 1, 2026 16:31

chore(security-assessment): release 2.3.0

1d61422

Manual changelog entry for the severity-recalibration release. release-please will generate the canonical 2.3.0 entry from conventional commits when this lands on main.

Base automatically changed from feat/phase-1b-expansion to main May 1, 2026 21:38

bdfinst merged commit 7a3c320 into main May 1, 2026
1 check passed

bdfinst deleted the feat/severity-recalibration branch May 1, 2026 21:38

bdfinst merged 2 commits into main from feat/severity-recalibration
