@@ -81,12 +81,217 @@ PYTHONPATH=. predicate-authorityd \
8181 --mandate-signing-key-env PREDICATE_AUTHORITY_SIGNING_KEY
8282```
8383
84+ ## 2b) Okta production hardening checklist + staging matrix
85+
86+ Use this section when validating enterprise IdP readiness for Phase 2.
87+
88+ ### Checklist
89+
90+ - [ ] Configure dedicated Okta OIDC app integration per environment (staging/prod split).
91+ - [ ] Verify configured `issuer` and `audience` are exact matches to the target environment.
92+ - [ ] Verify required claims/scopes/groups mapping used by authority role/tenant checks.
93+ - [ ] Enforce strict JWT checks (`iss`, `aud`, `exp`, `nbf`, `iat`, required claims, alg allowlist).
94+ - [ ] Validate JWKS retrieval and cache behavior for normal operation.
95+ - [ ] Validate key rotation behavior (`kid` rollover) without service restart.
96+ - [ ] Validate fail-closed behavior for cold-start JWKS failure and stale key scenarios.
97+ - [x] Validate redaction: no token/secret leakage in logs on failures/retries.
98+ - [x] Validate startup diagnostics for missing/invalid auth configuration.
99+ - [ ] Validate revocation path behavior under Okta-backed principals.
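
The strict JWT checks in the checklist can be pictured with a minimal sketch like the one below. It is not the `predicate_authority` implementation: the function name `rejection_reason`, the reason strings, the 30-second leeway, and the `RS256`-only allowlist are all assumptions made for illustration, and signature verification against JWKS is deliberately out of scope here.

```python
import time

ALLOWED_ALGS = frozenset({"RS256"})  # assumption: illustrative allowlist


def _is_number(value):
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def rejection_reason(header, claims, *, issuer, audience,
                     required_claims=("sub",), leeway_s=30, now=None):
    """Return a reason category, or None when every check passes.

    Hypothetical helper: operates on an already-decoded header and claim set
    and does NOT verify the token signature.
    """
    now = time.time() if now is None else now
    if header.get("alg") not in ALLOWED_ALGS:
        return "unsupported_algorithm"
    if claims.get("iss") != issuer:
        return "issuer_mismatch"
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        return "audience_mismatch"
    if any(name not in claims for name in required_claims):
        return "missing_required_claim"
    exp = claims.get("exp")
    if not _is_number(exp):
        return "missing_or_invalid_exp"
    if now >= exp + leeway_s:
        return "token_expired"
    # nbf and iat are checked only when present in this sketch.
    for name, reason in (("nbf", "token_not_yet_valid"), ("iat", "issued_in_future")):
        value = claims.get(name)
        if value is None:
            continue
        if not _is_number(value) or value > now + leeway_s:
            return reason
    return None
```

Returning a reason category rather than echoing token contents keeps the deny path compatible with the redaction item in the same checklist.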
100+
101+ ### Staging test matrix
102+
103+ | Test ID | Scenario | Expected Result |
104+ | --- | --- | --- |
105+ | OKTA-01 | Valid token (correct issuer/audience/scope) | Request authorized and audit emitted |
106+ | OKTA-02 | Wrong issuer | Denied with issuer mismatch reason |
107+ | OKTA-03 | Wrong audience | Denied with audience mismatch reason |
108+ | OKTA-04 | Missing required scope | Denied fail-closed before action |
109+ | OKTA-05 | Expired token | Denied with expiration reason |
110+ | OKTA-06 | Future `nbf` beyond leeway | Denied with temporal validation reason |
111+ | OKTA-07 | Unsupported signing algorithm | Denied before trust decision |
112+ | OKTA-08 | JWKS rotation (`kid` changes) | Validation recovers without restart |
113+ | OKTA-09 | JWKS outage with warm cache | Existing key path continues until cache boundary |
114+ | OKTA-10 | JWKS outage with cold cache | Startup/auth fails closed with actionable diagnostics |
115+ | OKTA-11 | Tenant outside allow-list | Denied with tenant policy reason |
116+ | OKTA-12 | Principal/intent revocation during run | Subsequent action denied promptly |
117+ | OKTA-13 | Log redaction check | No raw tokens/secrets in logs |
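
Rows OKTA-08 through OKTA-10 describe cache behavior that is easier to reason about with a small model. The sketch below is an assumption-laden illustration, not the daemon's cache: `JwksCache`, `KeyResolutionError`, the injected `fetch` callable, and the TTL default are hypothetical names and values. It refreshes on an unknown `kid`, keeps serving known keys during an outage until the TTL boundary, and fails closed otherwise.

```python
import time


class KeyResolutionError(Exception):
    """Raised when no trusted key can be resolved (fail closed)."""

    def __init__(self, reason):
        super().__init__(reason)
        self.reason = reason


class JwksCache:
    """Hypothetical in-memory JWKS cache keyed by `kid`."""

    def __init__(self, fetch, ttl_s=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable returning {kid: key_material}
        self._ttl_s = ttl_s
        self._clock = clock
        self._keys = {}
        self._fetched_at = None  # None means cold cache

    def _fresh(self):
        return (self._fetched_at is not None
                and self._clock() - self._fetched_at < self._ttl_s)

    def _refresh(self):
        try:
            keys = self._fetch()
        except Exception:
            return False         # outage: keep whatever is cached
        self._keys = dict(keys)
        self._fetched_at = self._clock()
        return True

    def get_key(self, kid):
        # Warm path: a known kid inside the TTL needs no network call.
        if self._fresh() and kid in self._keys:
            return self._keys[kid]
        # Unknown kid, cold cache, or expired cache: refresh once.
        if not self._refresh():
            raise KeyResolutionError("jwks_unavailable")
        if kid not in self._keys:
            raise KeyResolutionError("unknown_kid")
        return self._keys[kid]
```

A production cache would likely also rate-limit refreshes triggered by unknown `kid` values so that forged headers cannot force a fetch per request; that is omitted here for brevity.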
118+
119+ ### Emergency JWKS key-rotation runbook (owner + on-call flow)
120+
121+ Owner model:
122+
123+ - Primary owner: Platform Identity On-call.
124+ - Secondary owner: Security On-call (approver for forced key disable).
125+ - Incident commander: Platform lead on duty.
126+
127+ Trigger conditions:
128+
129+ - compromised signing key suspected,
130+ - unexpected `kid` churn causing authorization failures,
131+ - emergency tenant request to invalidate active key material.
132+
133+ Runbook steps:
134+
135+ 1. **Declare incident + freeze risky deploys**
136+    - open incident channel and assign owner/approver,
137+    - freeze policy/auth-related deploy pipelines until stabilized.
138+ 2. **Rotate signing key in Okta**
139+    - publish new signing key and ensure new `kid` appears in JWKS,
140+    - stop issuing tokens from compromised/old key.
141+ 3. **Force validation against refreshed JWKS**
142+    - run targeted validation:
143+      - `python3 -m pytest tests/test_identity_bridge_phase2.py -k "jwks_kid_rollover_refreshes_without_restart"`
144+    - if runtime impact is active, temporarily reduce cache TTL and trigger sidecar restart waves.
145+ 4. **Confirm deny behavior for old/unknown `kid`**
146+    - run:
147+      - `python3 -m pytest tests/test_identity_bridge_phase2.py -k "jwks_stale_cache_and_outage_fails_closed_with_diagnostics"`
148+    - verify fail-closed behavior remains active.
149+ 5. **Recovery validation**
150+    - confirm healthy authorization path with new `kid`,
151+    - confirm no broad deny regressions in tenant traffic.
152+ 6. **Closeout**
153+    - document timeline, affected tenants, and remediation actions,
154+    - restore deploy pipeline and publish post-incident notes.
155+
156+ ### Signoff evidence commands (deterministic integration tests)
157+
158+ Run these from `AgentIdentity` repo root and attach output to signoff evidence.
159+
160+ 1) Network partition fail-closed behavior:
161+
162+ ```bash
163+ python3 -m pytest tests/test_daemon_phase2.py -k "network_partition_fail_closed_raises_and_tracks_failure"
164+ ```
165+
166+ Checkpoints:
167+
168+ - pass result proves fail-closed error path is enforced when control-plane is partitioned and `fail_open=False`,
169+ - `/status` payload includes incremented control-plane failure counters.
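
To make the two checkpoints concrete, here is a minimal sketch of fail-closed handling with a failure counter. `ControlPlaneGuard`, `ControlPlaneUnavailable`, and the `control_plane_failures` field are hypothetical names chosen for this example; the real daemon's `/status` payload and exception types may differ.

```python
from dataclasses import dataclass


class ControlPlaneUnavailable(Exception):
    """Raised when the control plane is unreachable and fail_open is False."""


@dataclass
class ControlPlaneGuard:
    """Hypothetical wrapper around control-plane calls."""

    fail_open: bool = False
    failure_count: int = 0

    def call(self, send):
        try:
            return send()
        except OSError as exc:  # ConnectionError and TimeoutError subclass OSError
            self.failure_count += 1
            if self.fail_open:
                return None      # explicit opt-in: swallow the failure
            raise ControlPlaneUnavailable("control plane unreachable") from exc

    def status(self):
        # Shape of a status fragment; the field name is an assumption.
        return {"control_plane_failures": self.failure_count}
```

The counter is incremented before the fail-open branch so that both modes leave the same evidence in the status output.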
170+
171+ 2) Restart recovery with persisted queue:
172+
173+ ```bash
174+ python3 -m pytest tests/test_daemon_phase2.py -k "restart_recovers_queue_after_partition"
175+ ```
176+
177+ Checkpoints:
178+
179+ - pre-restart flush queue has pending event(s),
180+ - post-restart `POST /ledger/flush-now` reports `sent_count >= 1`,
181+ - post-flush queue is empty (`GET /ledger/flush-queue` returns no items).
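
The recovery behavior these checkpoints describe can be modeled with a tiny file-backed queue. This is only a sketch under assumptions: `PersistedFlushQueue`, its JSON file layout, and the injected `send` callable are invented for illustration and are not the daemon's storage format; only the `sent_count` field name comes from the checkpoint above.

```python
import json
from pathlib import Path


class PersistedFlushQueue:
    """Hypothetical queue that survives restarts by storing events on disk."""

    def __init__(self, path):
        self._path = Path(path)

    def _load(self):
        if not self._path.exists():
            return []
        return json.loads(self._path.read_text(encoding="utf-8"))

    def _save(self, items):
        self._path.write_text(json.dumps(items), encoding="utf-8")

    def enqueue(self, event):
        items = self._load()
        items.append(event)
        self._save(items)

    def pending(self):
        return self._load()

    def flush_now(self, send):
        """Send events in order; stop at the first network failure."""
        items = self._load()
        sent = 0
        for event in items:
            try:
                send(event)
            except OSError:
                break
            sent += 1
        # Only events that were actually sent leave the queue.
        self._save(items[sent:])
        return {"sent_count": sent}
```

Creating a second `PersistedFlushQueue` on the same path stands in for a process restart: the pending events are still there and a later flush drains them.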
182+
183+ 3) Redaction and failure-reason validation:
184+
185+ ```bash
186+ python3 -m pytest tests/test_identity_bridge_phase2.py -k "reasonful_and_redacted"
187+ ```
188+
189+ Checkpoints:
190+
191+ - validation error includes a reason category (e.g. issuer mismatch),
192+ - error text does not include raw token string or sensitive claim values.
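
As an illustration of what "reasonful and redacted" can mean in practice, the sketch below masks JWT-shaped strings before they reach an error message. The helper names and the `[REDACTED_TOKEN]` marker are assumptions for this example, and the pattern relies on compact JWTs typically starting with `eyJ` (the base64url encoding of a JSON object header); it does not cover opaque tokens or client secrets.

```python
import re

# Three base64url segments separated by dots, the first starting with "eyJ".
_JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]*")


def redact(text):
    """Replace anything that looks like a compact JWT with a fixed marker."""
    return _JWT_RE.sub("[REDACTED_TOKEN]", text)


def validation_error(reason, detail):
    """Build an error string with a reason category and redacted detail."""
    return f"{reason}: {redact(detail)}"
```

For example, `validation_error("issuer_mismatch", f"rejected token {raw_token}")` keeps the reason category readable while the token itself never appears in the message.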
193+
194+ 4) Okta token exchange/OBO compatibility (tenant capability-gated):
195+
196+ ```bash
197+ # If tenant supports token exchange:
198+ export OKTA_OBO_COMPAT_CHECK_ENABLED=1
199+ export OKTA_SUPPORTS_TOKEN_EXCHANGE=true
200+ python3 -m pytest tests/test_okta_obo_compatibility.py -k "live_check_when_enabled"
201+
202+ # If tenant does NOT support token exchange:
203+ export OKTA_OBO_COMPAT_CHECK_ENABLED=1
204+ export OKTA_SUPPORTS_TOKEN_EXCHANGE=false
205+ python3 -m pytest tests/test_okta_obo_compatibility.py -k "live_check_when_enabled"
206+ ```
207+
208+ Checkpoints:
209+
210+ - `client_credentials_ok` must pass in both modes,
211+ - when `OKTA_SUPPORTS_TOKEN_EXCHANGE=true`, token exchange must succeed,
212+ - when `OKTA_SUPPORTS_TOKEN_EXCHANGE=false`, token exchange path is explicitly gated as tenant-disabled (no false failure).
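
The three checkpoints amount to a small decision table. A sketch of that logic follows; `evaluate_obo_check`, `env_flag`, and the outcome strings are hypothetical, and how the real test parses `OKTA_SUPPORTS_TOKEN_EXCHANGE` is an assumption (here only the literal `true`, case-insensitive, enables it).

```python
import os


def env_flag(name, default="false"):
    """Assumed parsing: only the string "true" (any case) counts as enabled."""
    return os.environ.get(name, default).strip().lower() == "true"


def evaluate_obo_check(supports_token_exchange, client_credentials_ok,
                       token_exchange_ok=None):
    """Return an outcome string for the capability-gated compatibility check."""
    # client_credentials must pass regardless of tenant capability.
    if not client_credentials_ok:
        return "fail: client_credentials"
    # Tenant without token exchange: gated, not a failure.
    if not supports_token_exchange:
        return "pass: token_exchange gated (tenant-disabled)"
    if token_exchange_ok:
        return "pass"
    return "fail: token_exchange"
```

The ordering matters: the tenant-disabled branch is evaluated before the token exchange result, which is what prevents a false failure on tenants that never attempt the exchange.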
213+
214+ ### Example demo script: Okta delegation compatibility
215+
216+ Run example from repo root:
217+
218+ ```bash
219+ python3 examples/delegation/okta_obo_compat_demo.py \
220+   --issuer "$OKTA_ISSUER" \
221+   --client-id "$OKTA_CLIENT_ID" \
222+   --client-secret "$OKTA_CLIENT_SECRET" \
223+   --audience "$OKTA_AUDIENCE" \
224+   --scope "${OKTA_SCOPE:-authority:check}" \
225+   --supports-token-exchange
226+ ```
227+
228+ Notes:
229+
230+ - omit `--supports-token-exchange` for tenants that do not support OBO/token exchange,
231+ - script reports whether delegation path should use IdP token exchange or authority mandate delegation.
232+
233+ ### Secret storage policy (Okta credentials)
234+
235+ - never commit Okta client secrets/API tokens/private keys to repo files,
236+ - store Okta credentials in runtime secret manager and CI secret store only,
237+ - CI enforcement:
238+   - `scripts/check_no_plaintext_okta_secrets.py` scans for plaintext Okta secrets,
239+   - auth module security checks run Bandit for `predicate_authority` auth paths.
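
For readers unfamiliar with this kind of CI gate, the sketch below shows one way a plaintext-secret scan could work. It is not the logic of `scripts/check_no_plaintext_okta_secrets.py`, which is not shown in this document; the variable-name pattern, the `find_plaintext_secrets` helper, and the rule that `$VAR` references and `<placeholder>` values are acceptable are all assumptions.

```python
import re

# Assumed rule: names starting with OKTA_ and ending in SECRET, TOKEN, or
# PRIVATE_KEY must not be assigned a literal value in tracked files.
_ASSIGN_RE = re.compile(
    r"\b(OKTA_[A-Z0-9_]*(?:SECRET|TOKEN|PRIVATE_KEY))\b\s*[:=]\s*[\"']?([^\"'\s]+)",
    re.IGNORECASE,
)


def find_plaintext_secrets(text):
    """Return (line_number, variable_name) pairs that look like literals."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = _ASSIGN_RE.search(line)
        if match is None:
            continue
        value = match.group(2)
        # Environment references and doc placeholders are not secrets.
        if value.startswith(("$", "<")):
            continue
        findings.append((lineno, match.group(1)))
    return findings
```

A name-based scan like this catches careless assignments but not secrets stored under unrelated names, so it complements rather than replaces keeping credentials in a secret manager.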
240+
84241When enabled, daemon bootstrap auto-attaches `ControlPlaneTraceEmitter` so each
85242authority decision pushes:
86243
87244- audit events -> `/v1/audit/events:batch`
88245- usage credits -> `/v1/metering/usage:batch`
89246
247+ ### Optional: use Okta identity mode
248+
249+ Provide Okta OIDC values via env vars:
250+
251+ ```bash
252+ export OKTA_ISSUER="https://<org>.okta.com/oauth2/default"
253+ export OKTA_CLIENT_ID="<okta-client-id>"
254+ export OKTA_AUDIENCE="api://predicate-authority"
255+ ```
256+
257+ Start daemon in Okta mode:
258+
259+ ```bash
260+ PYTHONPATH=. predicate-authorityd \
261+   --host 127.0.0.1 \
262+   --port 8787 \
263+   --mode cloud_connected \
264+   --identity-mode okta \
265+   --okta-issuer "$OKTA_ISSUER" \
266+   --okta-client-id "$OKTA_CLIENT_ID" \
267+   --okta-audience "$OKTA_AUDIENCE" \
268+   --okta-required-claims "sub,tenant_id" \
269+   --okta-required-scopes "authority:check" \
270+   --okta-required-roles "authority-operator" \
271+   --okta-allowed-tenants "tenant-a" \
272+   --idp-token-ttl-s 300 \
273+   --mandate-ttl-s 300 \
274+   --policy-file examples/authorityd/policy.json
275+ ```
276+
277+ Safety gate note:
278+
279+ - in `cloud_connected` mode, `identity-mode local` or `identity-mode local-idp` now requires explicit `--allow-local-fallback`,
280+ - this prevents accidental implicit downgrade to local identity behavior.
281+
282+ TTL alignment note:
283+
284+ - startup enforces `idp-token-ttl-s >= mandate-ttl-s` to avoid mandates outliving identity session controls.
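
The safety gate and the TTL alignment rule are both startup-time checks, which can be sketched together as below. This is an illustration only: `AuthStartupConfig` and `collect_startup_errors` are hypothetical names, the error wording is invented, and the actual daemon may structure and report these checks differently.

```python
from dataclasses import dataclass

LOCAL_IDENTITY_MODES = frozenset({"local", "local-idp"})


@dataclass(frozen=True)
class AuthStartupConfig:
    """Hypothetical subset of daemon options relevant to the two notes."""

    mode: str
    identity_mode: str
    idp_token_ttl_s: int
    mandate_ttl_s: int
    allow_local_fallback: bool = False


def collect_startup_errors(config):
    """Return human-readable problems; an empty list means startup may proceed."""
    errors = []
    # Safety gate: no implicit downgrade to local identity in cloud mode.
    if (config.mode == "cloud_connected"
            and config.identity_mode in LOCAL_IDENTITY_MODES
            and not config.allow_local_fallback):
        errors.append(
            "identity-mode %s in cloud_connected mode requires --allow-local-fallback"
            % config.identity_mode
        )
    # TTL alignment: a mandate must not outlive the identity token.
    if config.idp_token_ttl_s < config.mandate_ttl_s:
        errors.append("idp-token-ttl-s must be >= mandate-ttl-s")
    return errors
```

With the values from the command above (`--idp-token-ttl-s 300`, `--mandate-ttl-s 300`, `--identity-mode okta`) this sketch reports no errors, since equal TTLs satisfy the `>=` rule.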
285+
286+ ### Emergency rollback route (Okta integration)
287+
288+ If Okta integration causes broad auth failures, use this rollback sequence:
289+
290+ 1. disable the affected Okta app integration for the impacted environment,
291+ 2. rotate signing keys and invalidate compromised sessions in Okta,
292+ 3. switch sidecar traffic to a known-good identity config (or controlled local fallback with explicit `--allow-local-fallback`),
293+ 4. verify deny behavior + recovery through signoff evidence commands before restoring normal traffic.
294+
90295## 3b) Optional local identity registry (ephemeral task identities)
91296
92297Enable local identity support: