feat(rbac): resolver + RequirePermission middleware (PR 2a of 4)#301
Conversation
In-memory permission set the per-request resolver materializes from the user_organization_role × role × role_permission join. Pure data structure — DB integration lands in the next commit (PermissionResolver). Has(key, ResourceContext) implements the plan's narrowing rule: - Org-scope assignment grants the action everywhere within the org. - Site-scope assignment overrides the org grant at that site only (site-scope shadow), so an admin can add a smaller site role to narrow a user without revoking their org-wide access. - Org-scoped actions (no site context in the request) are NEVER shadowed by site-scope grants — there is no site key to narrow on, so only org-scope assignments can satisfy user:manage and friends. FlatKeys() projects the union for UserInfo.permissions (client-side coarse gate; the server still enforces scope via Has() on every call). Test-first per the plan's execution note. Ten tests cover: org-scope allows everywhere, site-scope only at that site, org-scoped action needs org grant, narrowing site-overrides-org at the matched site but org still applies elsewhere, narrowing doesn't shadow org-scoped actions, multi-site union, empty deny, unknown-key deny, FlatKeys determinism, and the FIELD_TECH AE (blink yes, reboot no).
Has() never reads role identity (id/name/builtin_key) — it only consults permission_key + scope. AssignmentRole was a speculative field; remove it to keep the resolver's input shape minimal and avoid a population requirement that the ListEffectivePermissionsForUser query doesn't even satisfy.
…ions LoadEffective(userID, orgID) runs a single query against user_organization_role × role × role_permission via the ListEffectivePermissionsForUser sqlc binding, groups the flat (assignment_id, scope, permission_key) rows into one Assignment per assignment_id, and materializes the slice into an EffectivePermissions value the middleware can query via Has(). Soft-deleted assignments and roles are excluded by the underlying SQL. A user with no live assignments returns a non-nil empty value — the fail-closed default for a deactivated user or a user who was never in this org. Integration tests (DB-backed via testutil.GetTestDB) cover: - SUPER_ADMIN at org-scope gets the full catalog. - FIELD_TECH at site-scope is bounded at that site. - Narrowing end-to-end with two assignments (ADMIN @ org + FIELD_TECH @ site). - Soft-deleted assignment is ignored on the next resolver call. - No-assignments returns empty deny-all (non-nil).
… denial payload
Runtime counterpart of EffectivePermissions.Has() — the call site each
handler uses in place of RequireAdmin (the swap lands in PR 2b).
Behavior:
- Unauthenticated request (no session.Info on context) → Connect
Unauthenticated.
- Internal actor (session.Info.Actor != "") short-circuits to ALLOW.
Scheduler and curtailment-reconciler synthesize sessions without a
real user; they have no rows in user_organization_role and are
trusted by virtue of running in-process.
- Authenticated user, EffectivePermissions missing on context →
Connect Internal. Fail-closed: the gate is the security boundary
and the missing value indicates a wiring bug, not "allow."
- Authenticated user, eff.Has(key, rc) false → Connect
PermissionDenied with structured JSON body
`{"required": "<key>", "scope": {"site_id": <N>}}`. Scope echoes
the caller's ResourceContext only — never server-side assignment
ids, role names, or the effective-permission list.
WithEffectivePermissions stashes the resolver's per-request output on
the context under a private key so handler code can't accidentally
read or overwrite it.
Eight unit tests cover: allow path, denial payload shape (both
org-scoped and site-scoped variants), unauthenticated, fail-closed on
missing EffectivePermissions, ActorScheduler short-circuit,
ActorCurtailment short-circuit, narrowing semantics threading
through the full middleware path (site-1 denies, site-2 allows under
the same caller).
Every authenticated request now ends with a per-request EffectivePermissions stashed on the context, regardless of whether the caller authenticated by session cookie or API key. RequirePermission (committed in the previous change) consults that stashed value to gate handler entry. Implementation: - AuthInterceptor gains a *authz.PermissionResolver field; both authenticateWithSession and authenticateWithApiKey end with a shared loadEffectivePermissions helper that calls resolver.LoadEffective and wraps the result with middleware.WithEffectivePermissions before returning the context. - API-key callers inherit the user's CURRENT effective set on every request (live inheritance). Revocation propagates on next request. Snapshot-at-issue is out of scope for this PR and noted in the resolver godoc. - DB or wiring errors from LoadEffective are surfaced via the same classifyLookupError path as the existing user/role lookups — authenticated requests with a broken resolver path fail closed. - main.go constructs the resolver after migrations finish (Reconcile runs first so per-org built-in rows exist) and threads it into NewAuthInterceptor. - testutil.ServiceProvider gains a PermissionResolver field; all NewAuthInterceptor callsites updated. Two unit tests that passed nil for unused dependencies (config_test, authentication_apikey_error_test) add a nil for the resolver too — those paths exit before the resolver runs and don't need a real one.
🔐 Codex Security Review
Review SummaryOverall Risk: MEDIUM Findings[MEDIUM] Permission Rechecks Reuse Stale Authorization Snapshot
NotesNo SQL injection, command injection, pool hijacking, protobuf wire-format, plugin-boundary, or frontend security issues were evident in the reviewed diff. Generated by Codex Security Review | |
There was a problem hiding this comment.
Pull request overview
Adds the new granular RBAC “effective permissions” snapshot + resolver, wires it into the auth interceptor, and introduces a RequirePermission middleware that will become the new handler gate (without switching any call sites yet).
Changes:
- Introduces
authz.EffectivePermissionswith narrowing semantics and a DB-backedPermissionResolver. - Adds
handlers/middleware.RequirePermissionand context plumbing for per-request effective permissions. - Wires resolver loading into the auth interceptor and updates server/test wiring to pass the resolver.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| server/internal/testutil/service_provider.go | Adds PermissionResolver to test service provider wiring. |
| server/internal/testutil/infrastructure_provider.go | Threads the resolver into NewAuthInterceptor in test infra. |
| server/internal/handlers/middleware/permission.go | New RequirePermission middleware + context stash/accessors and structured deny payload. |
| server/internal/handlers/middleware/permission_test.go | Unit tests for allow/deny, payload shape, fail-closed behavior, and actor short-circuit. |
| server/internal/handlers/interceptors/config_test.go | Updates interceptor construction signature in tests. |
| server/internal/handlers/interceptors/authentication.go | Loads/stashes effective permissions after auth (session + API key). |
| server/internal/handlers/interceptors/authentication_test.go | Updates interceptor construction to pass resolver in integration tests. |
| server/internal/handlers/interceptors/authentication_apikey_error_test.go | Updates interceptor construction signature. |
| server/internal/domain/authz/effective.go | New in-memory effective-permissions decision structure + Has/FlatKeys. |
| server/internal/domain/authz/effective_test.go | Unit tests for narrowing semantics, unions, and flat projection. |
| server/internal/domain/authz/resolver.go | New resolver that materializes EffectivePermissions from one SQL query. |
| server/internal/domain/authz/resolver_test.go | DB-backed integration tests for org/site scope and narrowing behavior. |
| server/cmd/fleetd/main.go | Constructs and injects the resolver into the auth interceptor at boot. |
…-closed Round-1 review on #301 caught three real bugs. **Zero-permission site assignments fail open (Codex HIGH + thread 2):** ListEffectivePermissionsForUser was INNER JOINing role_permission, so a site-scope role with zero permissions produced no rows. The resolver never recorded bySite[siteID] for that site and narrowing collapsed back to the user's org-scope grant — an empty "lockdown" role intended to revoke broad access at a single site would have silently failed and allowed miner:reboot / miner:update_pools / etc. through. Switched to LEFT JOIN role_permission and permission; permission_key is now nullable. assignmentsFromRows skips null keys but still creates the Assignment row so the bySite[] entry exists. Two regression tests added: pure unit test (in-memory empty assignment narrows correctly) plus DB-backed test exercising the LEFT JOIN path with a real empty custom role. **Any non-empty Actor bypasses RBAC (Codex MEDIUM):** RequirePermission short-circuited on `info.Actor != ""`, which would accept any future or mistyped Actor value as fully trusted. Switched to an explicit allowlist: ActorScheduler and ActorCurtailment short-circuit to ALLOW; any other non-empty Actor returns Internal. Regression test proves an unknown Actor surfaces as Internal, not ALLOW. **Nil PermissionResolver panic (thread 1):** loadEffectivePermissions dereferenced i.permissionResolver without a nil check. A miswired test or boot path constructing the interceptor with nil (a few unit tests pass nil for unused dependencies) would have panicked the server instead of failing closed. Added the nil check — returns Internal so the wiring bug surfaces immediately rather than as a crash.
Codex Security Review (round 1 on #301) — responseBoth findings addressed in HIGH — Zero-permission site assignments fail open to org-scope grants
You're right — the INNER JOIN dropped empty-role assignments and narrowing collapsed. Fixed by switching Two regression tests added:
MEDIUM — Any non-empty internal actor bypasses all permission checks
Done. Replaced the if info.Actor != "" {
switch info.Actor {
case session.ActorScheduler, session.ActorCurtailment:
return info, nil
default:
return nil, fleeterror.NewInternalErrorf(
"authz: unknown internal actor %q; refusing to short-circuit RBAC",
info.Actor,
)
}
}
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7f762ab8c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
… succeeds Codex P2 caught: authenticateWithApiKey was calling RecordSuccessfulUse before loadEffectivePermissions, so a resolver failure (DB error, nil wiring) would reject the request while still crediting last_used_at as a successful authentication. Audit and usage telemetry would report failed auths as successful. Moved RecordSuccessfulUse below the loadEffectivePermissions call so it only fires when the full authentication chain succeeds. Behavior on success is unchanged.
…Manage
CI caught that TestResolver_ZeroPermissionSiteAssignmentStillNarrows
asserted ADMIN holds user:manage at org scope. ADMIN's seed formula is
AllPermissions() minus {user:read, user:manage, role:manage} — the
user:manage exclusion is the whole point of the SUPER_ADMIN-only
floor. Swapped the org-scope assertion to site:manage, which IS in
ADMIN's seed and exercises the same property (org-scoped action,
satisfied by org-scope grant despite the empty site-scope assignment).
…d scope helper Review-driven simplification pass over PR 2a (no behavior change): **resolver.go assignmentsFromRows**: dropped the `started bool` + `flush()` closure pattern. Pre-seed the slice from `rows[0]` then iterate uniformly; the same code path now handles first-row and same-assignment-row cases. Extracted `newAssignmentFromRow` for the scope-only seed. Net: -8 lines and clearer flow. **effective.go Has()**: tightened the inline narration. The type-level doc already explains narrowing in full; the function body's three repeat-explanations were noise. Code is short enough to read directly. **resolver_test.go assignAssignment**: helper now takes `authz.ScopeType` instead of raw `string`, and call sites use `authz.ScopeOrg` / `authz.ScopeSite` constants instead of literal `"org"` / `"site"`. Removes seven stringly-typed call sites. **resolver_test.go / backfill_test.go**: moved `insertTestSite` next to its sibling `insertTestOrganization` and `insertTestUser` helpers in backfill_test.go — all three are package-local DB-seeding helpers that belong together. **permission.go**: dropped the "in U7" task-tracker reference from the RequirePermission godoc; replaced with "as call sites migrate to the permission model" so the doc outlives the migration. **resolver.go NewPermissionResolver**: trimmed the constructor godoc to the WHY (per-request read query, no surrounding tx) and dropped the sentence that just narrated the signature. Build, vet, golangci-lint, and non-DB authz/middleware tests all green.
First half of PR 2 of the four-part granular RBAC plan (the plan PR). Building on the foundation in #283.
This PR introduces the permission resolver and the
RequirePermissionmiddleware — the security gate the rest of the codebase will route through. Critically, no callsites are swapped yet: every existingRequireAdmincheck still gates handler entry, this PR is purely additive. The gate exists, but nothing reads it. PR 2b will flip the switch (swap callers and neutralize the legacyuser_organization.role_idcolumn).Splitting PR 2 in half keeps the "live security gate flip" change small and reviewable — see #283's review history for context (five rounds, many design-level pivots). The resolver-and-middleware piece doesn't change behavior, so it can settle in review without holding up the swap.
What landed
domain/authz/effective.go— in-memory decision typeEffectivePermissionsis the per-request immutable snapshot of one user's authorization state within one organization.Has(key, ResourceContext)implements narrowing semantics from the plan:FlatKeys()projects the union forUserInfo.permissions(client-side coarse gate; the server still enforces scope viaHas()on every call).domain/authz/resolver.go—PermissionResolverLoadEffective(ctx, userID, orgID)runs one query (ListEffectivePermissionsForUser) againstuser_organization_role × role × role_permissionand materializes the result into anEffectivePermissions.(assignment_id, scope, permission_key)rows by assignment id so multi-permission roles fold into oneAssignmentvalue.testutil.GetTestDB) cover SUPER_ADMIN-at-org-scope, FIELD_TECH-at-site-scope, narrowing end-to-end with two assignments, soft-delete is ignored, and empty deny-all.handlers/middleware/permission.go—RequirePermissionThe runtime call site each handler will use in place of
RequireAdmin. Returns thesession.Infofor handlers that need the caller's identity, or one of:session.Infoon context.EffectivePermissionsmissing from context (the interceptor failed to populate it). ALLOW is never the fail-closed default.{"required": "<permission_key>", "scope": {"site_id": <N>}}ResourceContextonly — never server-side assignment IDs, role names, or the caller's effective permission list.Internal-actor short-circuit: when
session.Info.Actoris non-empty (scheduler, curtailment-reconciler, etc.),RequirePermissionshort-circuits to ALLOW without consultingEffectivePermissions. Synthesized sessions have no real user; they're trusted by virtue of running in-process.WithEffectivePermissions(ctx, eff)is the public stash function used by the interceptor; the read path is private to enforce that handlers only consume viaRequirePermission.8 unit tests cover allow, deny (both org-scoped and site-scoped payload shape), unauthenticated, fail-closed-on-missing, scheduler short-circuit, curtailment-reconciler short-circuit, and the narrowing matrix.
handlers/interceptors/authentication.go— wiringBoth
authenticateWithSessionandauthenticateWithApiKeyend with a sharedloadEffectivePermissions(ctx, info)helper that callsresolver.LoadEffectiveand stashes the result viamiddleware.WithEffectivePermissionsbefore returning the request context.LoadEffectiveflow through the sameclassifyLookupErrorpath as user/role lookups — broken resolver path fails closed.main.goconstructs the resolver afterReconcile(so per-org built-in rows exist) and threads it intoNewAuthInterceptor.testutil.ServiceProvidergains aPermissionResolverfield; allNewAuthInterceptorcallsites updated.What this PR does NOT do
RequireAdmincallers — every handler still uses the old gate. PR 2b.rpc_permissions.godeclaration map or the contract test — PR 2b.user_organization.role_idor add the raising trigger — PR 2b.handlers/middleware/admin.go— kept until callers move in PR 2b.Test plan
go build ./...cleango vet ./...cleanlefthook run pre-push(golangci-lint) cleanlefthook run pre-pushcovers the typecheck for the interceptor threading the new resolver inRollback
Reverting this PR removes the resolver call from the auth interceptor, the new files, and the
PermissionResolverfield onAuthInterceptor. No schema change, no data migration, no live behavior change to reverse.