Commit ce69bac

wrap up phase 2 hardening
1 parent 0a3bc43 commit ce69bac

10 files changed

Lines changed: 1324 additions & 4 deletions

.github/workflows/phase1-ci-and-release.yml

Lines changed: 6 additions & 1 deletion
@@ -27,7 +27,7 @@ jobs:

 - name: Install dependencies
   run: |
-    python -m pip install --upgrade pip pre-commit pytest
+    python -m pip install --upgrade pip pre-commit pytest bandit
     python -m pip install -e predicate_contracts -e predicate_authority

 - name: Verify package release order
@@ -36,6 +36,11 @@ jobs:
 - name: Run tests
   run: python -m pytest -q

+- name: Run auth module security checks
+  run: |
+    python -m bandit -q -r predicate_authority/bridge.py predicate_authority/daemon.py predicate_authority/control_plane.py
+    python scripts/check_no_plaintext_okta_secrets.py
+
 - name: Run pre-commit checks
   run: pre-commit run --all-files

.github/workflows/tests.yml

Lines changed: 6 additions & 1 deletion
@@ -25,8 +25,13 @@ jobs:

 - name: Install test dependencies
   run: |
-    python -m pip install --upgrade pip pytest
+    python -m pip install --upgrade pip pytest bandit
     python -m pip install -e predicate_contracts -e predicate_authority

 - name: Run tests
   run: python -m pytest -q
+
+- name: Run auth module security checks
+  run: |
+    python -m bandit -q -r predicate_authority/bridge.py predicate_authority/daemon.py predicate_authority/control_plane.py
+    python scripts/check_no_plaintext_okta_secrets.py

docs/authorityd-operations.md

Lines changed: 81 additions & 1 deletion
@@ -94,7 +94,7 @@ Use this section when validating enterprise IdP readiness for Phase 2.
 - [ ] Validate JWKS retrieval and cache behavior for normal operation.
 - [ ] Validate key rotation behavior (`kid` rollover) without service restart.
 - [ ] Validate fail-closed behavior for cold-start JWKS failure and stale key scenarios.
-- [ ] Validate redaction: no token/secret leakage in logs on failures/retries.
+- [x] Validate redaction: no token/secret leakage in logs on failures/retries.
 - [x] Validate startup diagnostics for missing/invalid auth configuration.
 - [ ] Validate revocation path behavior under Okta-backed principals.

@@ -116,6 +116,43 @@ Use this section when validating enterprise IdP readiness for Phase 2.
 | OKTA-12 | Principal/intent revocation during run | Subsequent action denied promptly |
 | OKTA-13 | Log redaction check | No raw tokens/secrets in logs |

+### Emergency JWKS key-rotation runbook (owner + on-call flow)
+
+Owner model:
+
+- Primary owner: Platform Identity On-call.
+- Secondary owner: Security On-call (approver for forced key disable).
+- Incident commander: Platform lead on duty.
+
+Trigger conditions:
+
+- compromised signing key suspected,
+- unexpected `kid` churn causing authorization failures,
+- emergency tenant request to invalidate active key material.
+
+Runbook steps:
+
+1. **Declare incident + freeze risky deploys**
+   - open incident channel and assign owner/approver,
+   - freeze policy/auth-related deploy pipelines until stabilized.
+2. **Rotate signing key in Okta**
+   - publish new signing key and ensure new `kid` appears in JWKS,
+   - stop issuing tokens from compromised/old key.
+3. **Force validation against refreshed JWKS**
+   - run targeted validation:
+     - `python3 -m pytest tests/test_identity_bridge_phase2.py -k "jwks_kid_rollover_refreshes_without_restart"`
+   - if runtime impact is active, temporarily reduce cache TTL and trigger sidecar restart waves.
+4. **Confirm deny behavior for old/unknown `kid`**
+   - run:
+     - `python3 -m pytest tests/test_identity_bridge_phase2.py -k "jwks_stale_cache_and_outage_fails_closed_with_diagnostics"`
+   - verify fail-closed behavior remains active.
+5. **Recovery validation**
+   - confirm healthy authorization path with new `kid`,
+   - confirm no broad deny regressions in tenant traffic.
+6. **Closeout**
+   - document timeline, affected tenants, and remediation actions,
+   - restore deploy pipeline and publish post-incident notes.
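
Steps 3 and 4 of the runbook depend on two properties: an unknown `kid` triggers a JWKS refresh without a restart, and validation fails closed when the refresh fails or the key is still absent. The following is a minimal sketch of that behavior, not the `predicate_authority` implementation. The names `JwksKeyCache`, `KeyLookupError`, and the injected `fetch_jwks` callable are hypothetical and exist only for illustration.

```python
import time
from typing import Callable, Dict, Optional


class KeyLookupError(Exception):
    """Raised when no trusted key can be resolved; callers must deny the request."""


class JwksKeyCache:
    """Hypothetical JWKS cache: refreshes on unknown kid, fails closed on errors."""

    def __init__(
        self,
        fetch_jwks: Callable[[], dict],
        ttl_s: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch_jwks = fetch_jwks  # assumed to return {"keys": [{"kid": ...}, ...]}
        self._ttl_s = ttl_s
        self._clock = clock
        self._keys: Dict[str, dict] = {}
        self._fetched_at: Optional[float] = None

    def _is_stale(self) -> bool:
        return self._fetched_at is None or self._clock() - self._fetched_at >= self._ttl_s

    def _refresh(self) -> None:
        try:
            document = self._fetch_jwks()
            keys = {key["kid"]: key for key in document["keys"]}
        except Exception as exc:
            # Fail closed: a stale or missing key set is never used as a fallback.
            raise KeyLookupError("jwks_refresh_failed") from exc
        self._keys = keys
        self._fetched_at = self._clock()

    def get_key(self, kid: str) -> dict:
        # Refresh when the cache is stale or the kid is unknown (rollover case).
        if self._is_stale() or kid not in self._keys:
            self._refresh()
        key = self._keys.get(kid)
        if key is None:
            # The kid is absent even after a refresh: deny.
            raise KeyLookupError("unknown_kid")
        return key
```

In this sketch a retired `kid` is denied as soon as a refreshed JWKS no longer lists it. A production version would also rate-limit refreshes triggered by unknown `kid` values so that forged tokens cannot cause a refresh storm.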
+
 ### Signoff evidence commands (deterministic integration tests)

 Run these from `AgentIdentity` repo root and attach output to signoff evidence.
@@ -143,6 +180,25 @@ Checkpoints:
 - post-restart `POST /ledger/flush-now` reports `sent_count >= 1`,
 - post-flush queue is empty (`GET /ledger/flush-queue` returns no items).

+3) Redaction and failure-reason validation:
+
+```bash
+python3 -m pytest tests/test_identity_bridge_phase2.py -k "reasonful_and_redacted"
+```
+
+Checkpoints:
+
+- validation error includes a reason category (e.g. issuer mismatch),
+- error text does not include raw token string or sensitive claim values.
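
The two checkpoints above can be written as assertions against a validation error. This is a rough sketch under stated assumptions: `ValidationFailure`, `validate_issuer`, and `assert_reasonful_and_redacted` are hypothetical names, and the `TokenValidationError` exported by `predicate_authority` may have a different shape.

```python
class ValidationFailure(Exception):
    """Hypothetical stand-in for a token validation error with a reason category."""

    def __init__(self, reason: str, detail: str) -> None:
        super().__init__(f"{reason}: {detail}")
        self.reason = reason


def validate_issuer(claims: dict, expected_issuer: str) -> None:
    """Reject a token whose `iss` claim differs from the configured issuer."""
    if claims.get("iss") != expected_issuer:
        # The message names the failure category but never echoes claim values.
        raise ValidationFailure(
            "issuer_mismatch", "token issuer is not the configured issuer"
        )


def assert_reasonful_and_redacted(
    raw_token: str, claims: dict, expected_issuer: str
) -> str:
    """Check both checkpoints and return the error text for inspection."""
    try:
        validate_issuer(claims, expected_issuer)
    except ValidationFailure as exc:
        message = str(exc)
        # Checkpoint 1: the error carries a reason category.
        assert exc.reason == "issuer_mismatch"
        # Checkpoint 2: neither the raw token nor any claim value is leaked.
        assert raw_token not in message
        assert all(str(value) not in message for value in claims.values())
        return message
    raise AssertionError("validation unexpectedly succeeded")
```

The same assertions apply to captured log output: search the log text for the raw token and for each claim value, and fail on any hit.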
+
+### Secret storage policy (Okta credentials)
+
+- never commit Okta client secrets/API tokens/private keys to repo files,
+- store Okta credentials in runtime secret manager and CI secret store only,
+- CI enforcement:
+  - `scripts/check_no_plaintext_okta_secrets.py` scans for plaintext Okta secrets,
+  - auth module security checks run Bandit for `predicate_authority` auth paths.
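
The contents of `scripts/check_no_plaintext_okta_secrets.py` are not shown in this commit view. As an illustration of what such a scan might look like, the sketch below flags lines that assign a literal value to an Okta secret variable, carry an `SSWS` API token, or contain a PEM private-key header. The patterns and the function names `scan_text` and `scan_paths` are assumptions, not the script's actual logic.

```python
import re
from pathlib import Path
from typing import Iterable, List, Tuple

# Illustrative patterns only; the real script may use different rules.
PATTERNS = [
    # Literal value assigned to an Okta client secret or API token variable.
    re.compile(
        r"(?i)okta[_-]?(?:client[_-]?secret|api[_-]?token)\b\s*[:=]\s*[\"']?[A-Za-z0-9_\-]{16,}"
    ),
    # Okta API tokens are sent with the SSWS authorization scheme.
    re.compile(r"\bSSWS\s+[A-Za-z0-9_\-]{16,}"),
    # PEM private-key header.
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]


def scan_text(text: str) -> List[int]:
    """Return 1-based line numbers that look like plaintext secrets."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if any(pattern.search(line) for pattern in PATTERNS)
    ]


def scan_paths(paths: Iterable[Path]) -> List[Tuple[Path, int]]:
    """Scan files and return (path, line number) pairs for every finding."""
    findings: List[Tuple[Path, int]] = []
    for path in paths:
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # skip unreadable or binary files
        findings.extend((path, number) for number in scan_text(text))
    return findings
```

References to environment variables or CI secrets, such as `"$OKTA_CLIENT_SECRET"`, do not match because the value must start with a literal token character. A CI wrapper would call `scan_paths` over the tracked files and exit non-zero when the result is non-empty.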
+
 When enabled, daemon bootstrap auto-attaches `ControlPlaneTraceEmitter` so each
 authority decision pushes:

@@ -170,9 +226,33 @@ PYTHONPATH=. predicate-authorityd \
   --okta-issuer "$OKTA_ISSUER" \
   --okta-client-id "$OKTA_CLIENT_ID" \
   --okta-audience "$OKTA_AUDIENCE" \
+  --okta-required-claims "sub,tenant_id" \
+  --okta-required-scopes "authority:check" \
+  --okta-required-roles "authority-operator" \
+  --okta-allowed-tenants "tenant-a" \
+  --idp-token-ttl-s 300 \
+  --mandate-ttl-s 300 \
   --policy-file examples/authorityd/policy.json
 ```

+Safety gate note:
+
+- in `cloud_connected` mode, `identity-mode local` or `identity-mode local-idp` now requires explicit `--allow-local-fallback`,
+- this prevents accidental implicit downgrade to local identity behavior.
+
+TTL alignment note:
+
+- startup enforces `idp-token-ttl-s >= mandate-ttl-s` to avoid mandates outliving identity session controls.
+
+### Emergency rollback route (Okta integration)
+
+If Okta integration causes broad auth failures, use this rollback sequence:
+
+1. disable the affected Okta app integration for the impacted environment,
+2. rotate signing keys and invalidate compromised sessions in Okta,
+3. switch sidecar traffic to a known-good identity config (or controlled local fallback with explicit `--allow-local-fallback`),
+4. verify deny behavior + recovery through signoff evidence commands before restoring normal traffic.
+
 ## 3b) Optional local identity registry (ephemeral task identities)

 Enable local identity support:

predicate_authority/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -9,7 +9,9 @@
     OIDCIdentityBridge,
     OktaBridgeConfig,
     OktaIdentityBridge,
+    OktaTokenClaims,
     TokenExchangeResult,
+    TokenValidationError,
 )
 from predicate_authority.client import AuthorityClient, LocalAuthorizationContext
 from predicate_authority.control_plane import (
@@ -75,6 +77,7 @@
     "OIDCIdentityBridge",
     "OktaBridgeConfig",
     "OktaIdentityBridge",
+    "OktaTokenClaims",
     "OpenTelemetryTraceEmitter",
     "PolicyEngine",
     "PolicyFileSource",
@@ -89,5 +92,6 @@
     "CompositeTraceEmitter",
     "LedgerQueueItem",
     "TaskIdentityRecord",
+    "TokenValidationError",
     "UsageCreditRecord",
 ]
