feat(security-assessment): recalibrate CRITICAL threshold against opus_repo_scan_test reference (v2.3.0) #26
Merged
…s_repo_scan_test reference
The earlier `score >= 7 → CRITICAL` rule, combined with broad domain-class
floors at 7, produced an inverted CRITICAL/HIGH pyramid (NextGen 198C/95H,
Walletron 307C/10H). The reference framework
(opus_repo_scan_test analyze-11) reserves CRITICAL for findings
"exploitable immediately with no prerequisites; leads to data breach
or fraud bypass", which produces a proper HIGH > CRITICAL distribution.
Changes:
- knowledge/severity-floors.json:
  * Add `score_to_severity` thresholds: 9→CRITICAL, 6→HIGH, 3→MEDIUM,
    0→LOW. Each tier carries the reference's qualitative criteria.
  * Add `discriminator` fields to `hardcoded-creds` (production-reachable
    vs dev-only-fallback) and `unauth-admin-endpoint`
    (direct-privilege-escalation vs info-disclosure-only) so
    context-dependent floors don't all collapse to the same value.
  * New explicit floor=9 classes: `fail-open-scoring`, `emulation-bypass`,
    `client-controlled-aggregate`, matching reference S03-FS-01/02/03/04
    where these are CRITICAL.
  * Each class rationale cites its corresponding reference finding ID
    for audit traceability.
- agents/fp-reduction.md:
  * Floor table reworked with reference citations
    (S01-FS-01 production keys, X-06 TLS-disabled, S07-FS-03 MD5,
    S02-FS-01/AG-01 unauth admin with privilege escalation).
  * Discriminator guidance for hardcoded-creds and unauth-admin-endpoint.
  * Calibration-reference paragraph explaining the 2026-05-01 change
    and why earlier floors were too aggressive.
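
For reviewers, the new `score_to_severity` thresholds can be read as a simple lookup. The sketch below is illustrative only: the threshold values come from this change, but the `SCORE_TO_SEVERITY` list and the function body are assumptions, not the actual contents of `knowledge/severity-floors.json` or the code that consumes it.

```python
# Illustrative sketch only: thresholds are from this change; the data
# layout and function are hypothetical, not the plugin's real code.
SCORE_TO_SEVERITY = [
    (9, "CRITICAL"),
    (6, "HIGH"),
    (3, "MEDIUM"),
    (0, "LOW"),
]


def score_to_severity(score: int) -> str:
    """Return the highest tier whose minimum score is <= score."""
    for minimum, severity in SCORE_TO_SEVERITY:
        if score >= minimum:
            return severity
    raise ValueError(f"score must be non-negative, got {score}")


if __name__ == "__main__":
    # A score of 7 was CRITICAL under the earlier `score >= 7` rule.
    print(score_to_severity(7))  # HIGH
    print(score_to_severity(9))  # CRITICAL
```

Keeping the tiers in descending order means the first match is always the strictest tier the score qualifies for.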
The recalibration produces:
NextGen 198C/95H → 79C/158H
Walletron 307C/10H → 57C/251H
Combined 505C/105H → 136C/409H (proper pyramid: HIGH > CRITICAL)
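
The combined totals and the pyramid condition above can be re-derived from the per-portfolio counts. This is a minimal check using only the numbers quoted in this message; the `Counts` type and helper names are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for CRITICAL/HIGH counts; values are the ones
# quoted in this commit message.
Counts = namedtuple("Counts", ["critical", "high"])

before = {"NextGen": Counts(198, 95), "Walletron": Counts(307, 10)}
after = {"NextGen": Counts(79, 158), "Walletron": Counts(57, 251)}


def combined(portfolios):
    """Sum CRITICAL and HIGH counts across all portfolios."""
    return Counts(
        sum(c.critical for c in portfolios.values()),
        sum(c.high for c in portfolios.values()),
    )


def is_proper_pyramid(counts):
    """A proper pyramid has more HIGH findings than CRITICAL ones."""
    return counts.high > counts.critical


if __name__ == "__main__":
    print(combined(before), is_proper_pyramid(combined(before)))
    print(combined(after), is_proper_pyramid(combined(after)))
```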
Manual changelog entry for the severity-recalibration release. release-please will generate the canonical 2.3.0 entry from conventional commits when this lands on main.
Summary
Recalibrates the CRITICAL severity threshold to align with the `opus_repo_scan_test` reference framework. The earlier `score >= 7 → CRITICAL` rule produced an inverted severity pyramid (CRITICAL > HIGH); the reference reserves CRITICAL for "exploitable immediately with no prerequisites; leads to data breach or fraud bypass." Tightening to `score >= 9` restores the proper distribution.

Stacked on top of #25 (v2.2.0). Merge that one first.
Validation
Recalibrated against real portfolio data (NextGen + Walletron, ~97 repos):
The reference framework's published example output is 7C / 12H / 7M / 3L (HIGH ≈ 1.7× CRITICAL). Our recalibrated combined ratio is 1 : 3 — same shape.
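
The two ratios quoted above follow directly from the counts. A small sketch of the arithmetic, where the function name is hypothetical and the inputs are the reference's 7C/12H and the recalibrated combined 136C/409H:

```python
def high_to_critical_ratio(critical: int, high: int) -> float:
    """Return how many HIGH findings there are per CRITICAL finding."""
    return high / critical


# Reference example output: 7C / 12H.
reference = high_to_critical_ratio(critical=7, high=12)
# Recalibrated combined portfolio: 136C / 409H.
recalibrated = high_to_critical_ratio(critical=136, high=409)

if __name__ == "__main__":
    print(f"reference:    {reference:.1f}x")
    print(f"recalibrated: {recalibrated:.1f}x")
```

Both ratios are above 1, so HIGH outnumbers CRITICAL in each case, although the recalibrated portfolio skews further toward HIGH than the reference example does.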
Changes
- `knowledge/severity-floors.json`
  * `score_to_severity` thresholds: `9→CRITICAL`, `6→HIGH`, `3→MEDIUM`, `0→LOW`. Each tier carries the reference's qualitative criteria.
  * `discriminator` fields on `hardcoded-creds` (production-reachable vs dev-only-fallback) and `unauth-admin-endpoint` (direct-privilege-escalation vs info-disclosure-only) so context-dependent floors don't all collapse to the same value.
  * `floor=9` classes: `fail-open-scoring`, `emulation-bypass`, `client-controlled-aggregate`, matching reference S03-FS-01/02/03/04.
- `agents/fp-reduction.md`
  * Discriminator guidance for `hardcoded-creds` and `unauth-admin-endpoint`.
- `.claude-plugin/plugin.json` + `CHANGELOG.md`

Calibration discriminator examples
Same finding type, different floors:

| Class | Example | Discriminator |
| --- | --- | --- |
| `hardcoded-creds` | | `production-reachable` |
| `hardcoded-creds` | `"fallback-secret-for-dev"` (S01-AG-03) | `dev-only-fallback` |
| `unauth-admin-endpoint` | `/admin/reload-model` (S02-FS-01) | `direct-privilege-escalation` |
| `unauth-admin-endpoint` | `/actuator/heap` (S02-FS-02) | `info-disclosure-only` |

The discriminator-aware floors will fully apply on the next fresh fp-reduction run; existing dispositions don't have discriminator fields yet, but the score threshold change alone produced the proper pyramid.
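
One way to picture a discriminator-aware floor is as a lookup keyed on class and discriminator that raises a finding's score to at least the floor. The sketch below is hypothetical throughout: the `FLOORS` layout and `apply_floor` are assumptions, the two `hardcoded-creds` values (9 and 7) are taken from the scores expected in the test plan, and treating a missing discriminator as "no floor" is a guess about how existing dispositions behave, not confirmed behavior.

```python
from typing import Optional

# Hypothetical floor table. The 9 and 7 mirror the scores expected in the
# test plan; the real schema in severity-floors.json may differ.
FLOORS = {
    ("hardcoded-creds", "production-reachable"): 9,
    ("hardcoded-creds", "dev-only-fallback"): 7,
}


def apply_floor(finding_class: str, discriminator: Optional[str], score: int) -> int:
    """Raise score to the floor for (class, discriminator), if one exists.

    Assumption: a finding without a discriminator gets no floor applied.
    """
    floor = FLOORS.get((finding_class, discriminator), 0)
    return max(score, floor)


if __name__ == "__main__":
    print(apply_floor("hardcoded-creds", "production-reachable", 5))  # 9
    print(apply_floor("hardcoded-creds", "dev-only-fallback", 5))     # 7
    print(apply_floor("hardcoded-creds", None, 5))                    # 5
```

A floor only ever raises a score, so a finding already scored above its floor is left unchanged.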
Test plan
- `/agent-audit` passes for `agents/fp-reduction.md` (structural compliance)
- `/security-assessment` on a sample target with hardcoded production creds → confirm finding gets `score=9`, severity=CRITICAL
- `fallback-secret-for-dev` → confirm finding gets `score=7`, severity=HIGH (with discriminator rationale)

🤖 Generated with Claude Code