feat(rbac): resolver + RequirePermission middleware (PR 2a of 4) by mcharles-square · Pull Request #301 · block/proto-fleet

mcharles-square · 2026-05-22T16:41:40Z

First half of PR 2 of the four-part granular RBAC plan (the plan PR). Building on the foundation in #283.

This PR introduces the permission resolver and the RequirePermission middleware — the security gate the rest of the codebase will route through. Critically, no callsites are swapped yet: every existing RequireAdmin check still gates handler entry, this PR is purely additive. The gate exists, but nothing reads it. PR 2b will flip the switch (swap callers and neutralize the legacy user_organization.role_id column).

Splitting PR 2 in half keeps the "live security gate flip" change small and reviewable — see #283's review history for context (five rounds, many design-level pivots). The resolver-and-middleware piece doesn't change behavior, so it can settle in review without holding up the swap.

What landed

`domain/authz/effective.go` — in-memory decision type

EffectivePermissions is the per-request immutable snapshot of one user's authorization state within one organization.

Has(key, ResourceContext) implements narrowing semantics from the plan:
- Org-scope assignment grants the action everywhere within the org.
- Site-scope assignment overrides the org grant at that site only, so admins can add a smaller site-scoped role to narrow a user without revoking their org-wide access.
- Org-scoped actions (no site context in the request) are never shadowed by site-scope grants — there's no site key to narrow on.
FlatKeys() projects the union for UserInfo.permissions (client-side coarse gate; the server still enforces scope via Has() on every call).
Pure data structure with no I/O; built test-first with 10 unit tests covering the narrowing matrix, multi-site union, empty deny, unknown-key deny, and the FIELD_TECH AE.

`domain/authz/resolver.go` — `PermissionResolver`

LoadEffective(ctx, userID, orgID) runs one query (ListEffectivePermissionsForUser) against user_organization_role × role × role_permission and materializes the result into an EffectivePermissions.
Groups flat (assignment_id, scope, permission_key) rows by assignment id so multi-permission roles fold into one Assignment value.
A user with no live assignments gets a non-nil empty value — the fail-closed default for a deactivated user or a user who was never in this org.
5 DB-backed integration tests (via testutil.GetTestDB) cover SUPER_ADMIN-at-org-scope, FIELD_TECH-at-site-scope, narrowing end-to-end with two assignments, soft-delete is ignored, and empty deny-all.

`handlers/middleware/permission.go` — `RequirePermission`

The runtime call site each handler will use in place of RequireAdmin. Returns the session.Info for handlers that need the caller's identity, or one of:

Unauthenticated — no session.Info on context.
Internal — fail-closed; EffectivePermissions missing from context (the interceptor failed to populate it). ALLOW is never the fail-closed default.
PermissionDenied with structured JSON body:
```
{"required": "<permission_key>", "scope": {"site_id": <N>}}
```
Scope echoes the caller's ResourceContext only — never server-side assignment IDs, role names, or the caller's effective permission list.

Internal-actor short-circuit: when session.Info.Actor is non-empty (scheduler, curtailment-reconciler, etc.), RequirePermission short-circuits to ALLOW without consulting EffectivePermissions. Synthesized sessions have no real user; they're trusted by virtue of running in-process.

WithEffectivePermissions(ctx, eff) is the public stash function used by the interceptor; the read path is private to enforce that handlers only consume via RequirePermission.

8 unit tests cover allow, deny (both org-scoped and site-scoped payload shape), unauthenticated, fail-closed-on-missing, scheduler short-circuit, curtailment-reconciler short-circuit, and the narrowing matrix.

`handlers/interceptors/authentication.go` — wiring

Both authenticateWithSession and authenticateWithApiKey end with a shared loadEffectivePermissions(ctx, info) helper that calls resolver.LoadEffective and stashes the result via middleware.WithEffectivePermissions before returning the request context.

API-key callers inherit the user's current effective set on every request (live inheritance). Revocation propagates on next request. Snapshot-at-issue is out of scope (noted in resolver godoc).
DB or wiring errors from LoadEffective flow through the same classifyLookupError path as user/role lookups — broken resolver path fails closed.
main.go constructs the resolver after Reconcile (so per-org built-in rows exist) and threads it into NewAuthInterceptor.
testutil.ServiceProvider gains a PermissionResolver field; all NewAuthInterceptor callsites updated.

What this PR does NOT do

Does not swap any RequireAdmin callers — every handler still uses the old gate. PR 2b.
Does not introduce the rpc_permissions.go declaration map or the contract test — PR 2b.
Does not rename user_organization.role_id or add the raising trigger — PR 2b.
Does not delete handlers/middleware/admin.go — kept until callers move in PR 2b.

Test plan

go build ./... clean
go vet ./... clean
lefthook run pre-push (golangci-lint) clean
Non-DB tests pass: 10 EffectivePermissions tests + 8 RequirePermission middleware tests, all green
DB-dependent resolver integration tests (5 functions) — need Docker for the full reconcile+assign+resolve loop
Behavior smoke: existing handler gates still work because no callsite was swapped; lefthook run pre-push covers the typecheck for the interceptor threading the new resolver in

Rollback

Reverting this PR removes the resolver call from the auth interceptor, the new files, and the PermissionResolver field on AuthInterceptor. No schema change, no data migration, no live behavior change to reverse.

In-memory permission set the per-request resolver materializes from the user_organization_role × role × role_permission join. Pure data structure — DB integration lands in the next commit (PermissionResolver). Has(key, ResourceContext) implements the plan's narrowing rule: - Org-scope assignment grants the action everywhere within the org. - Site-scope assignment overrides the org grant at that site only (site-scope shadow), so an admin can add a smaller site role to narrow a user without revoking their org-wide access. - Org-scoped actions (no site context in the request) are NEVER shadowed by site-scope grants — there is no site key to narrow on, so only org-scope assignments can satisfy user:manage and friends. FlatKeys() projects the union for UserInfo.permissions (client-side coarse gate; the server still enforces scope via Has() on every call). Test-first per the plan's execution note. Ten tests cover: org-scope allows everywhere, site-scope only at that site, org-scoped action needs org grant, narrowing site-overrides-org at the matched site but org still applies elsewhere, narrowing doesn't shadow org-scoped actions, multi-site union, empty deny, unknown-key deny, FlatKeys determinism, and the FIELD_TECH AE (blink yes, reboot no).

Has() never reads role identity (id/name/builtin_key) — it only consults permission_key + scope. AssignmentRole was a speculative field; remove it to keep the resolver's input shape minimal and avoid a population requirement that the ListEffectivePermissionsForUser query doesn't even satisfy.

…ions LoadEffective(userID, orgID) runs a single query against user_organization_role × role × role_permission via the ListEffectivePermissionsForUser sqlc binding, groups the flat (assignment_id, scope, permission_key) rows into one Assignment per assignment_id, and materializes the slice into an EffectivePermissions value the middleware can query via Has(). Soft-deleted assignments and roles are excluded by the underlying SQL. A user with no live assignments returns a non-nil empty value — the fail-closed default for a deactivated user or a user who was never in this org. Integration tests (DB-backed via testutil.GetTestDB) cover: - SUPER_ADMIN at org-scope gets the full catalog. - FIELD_TECH at site-scope is bounded at that site. - Narrowing end-to-end with two assignments (ADMIN @ org + FIELD_TECH @ site). - Soft-deleted assignment is ignored on the next resolver call. - No-assignments returns empty deny-all (non-nil).

… denial payload Runtime counterpart of EffectivePermissions.Has() — the call site each handler uses in place of RequireAdmin (the swap lands in PR 2b). Behavior: - Unauthenticated request (no session.Info on context) → Connect Unauthenticated. - Internal actor (session.Info.Actor != "") short-circuits to ALLOW. Scheduler and curtailment-reconciler synthesize sessions without a real user; they have no rows in user_organization_role and are trusted by virtue of running in-process. - Authenticated user, EffectivePermissions missing on context → Connect Internal. Fail-closed: the gate is the security boundary and the missing value indicates a wiring bug, not "allow." - Authenticated user, eff.Has(key, rc) false → Connect PermissionDenied with structured JSON body `{"required": "<key>", "scope": {"site_id": <N>}}`. Scope echoes the caller's ResourceContext only — never server-side assignment ids, role names, or the effective-permission list. WithEffectivePermissions stashes the resolver's per-request output on the context under a private key so handler code can't accidentally read or overwrite it. Eight unit tests cover: allow path, denial payload shape (both org-scoped and site-scoped variants), unauthenticated, fail-closed on missing EffectivePermissions, ActorScheduler short-circuit, ActorCurtailment short-circuit, narrowing semantics threading through the full middleware path (site-1 denies, site-2 allows under the same caller).

Every authenticated request now ends with a per-request EffectivePermissions stashed on the context, regardless of whether the caller authenticated by session cookie or API key. RequirePermission (committed in the previous change) consults that stashed value to gate handler entry. Implementation: - AuthInterceptor gains a *authz.PermissionResolver field; both authenticateWithSession and authenticateWithApiKey end with a shared loadEffectivePermissions helper that calls resolver.LoadEffective and wraps the result with middleware.WithEffectivePermissions before returning the context. - API-key callers inherit the user's CURRENT effective set on every request (live inheritance). Revocation propagates on next request. Snapshot-at-issue is out of scope for this PR and noted in the resolver godoc. - DB or wiring errors from LoadEffective are surfaced via the same classifyLookupError path as the existing user/role lookups — authenticated requests with a broken resolver path fail closed. - main.go constructs the resolver after migrations finish (Reconcile runs first so per-org built-in rows exist) and threads it into NewAuthInterceptor. - testutil.ServiceProvider gains a PermissionResolver field; all NewAuthInterceptor callsites updated. Two unit tests that passed nil for unused dependencies (config_test, authentication_apikey_error_test) add a nil for the resolver too — those paths exit before the resolver runs and don't need a real one.

github-actions · 2026-05-22T16:44:32Z

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

Reviewed pull request diff only (3f157ca2f68465471f88a1af8397104bae72d239...7a202c74f2fc9f198c8cc7066a75131df2079684, exact PR three-dot diff)

Model: gpt-5.5

💡 Click "edited" above to see previous reviews for this PR.

Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] Permission Rechecks Reuse Stale Authorization Snapshot

Category: Auth
Location: server/internal/handlers/middleware/permission.go:102
Description: RequirePermission only reads the EffectivePermissions value cached on the request context during authentication. The new comment says long-running RPCs can re-invoke RequirePermission so revocation propagates within the same streaming session, but re-invocation uses the same cached snapshot and never reloads from the resolver.
Impact: If a user or API key loses a role/permission while a long-running RPC is active, later side effects in that same RPC can continue under the permissions loaded at request start. This matters for firmware updates, log downloads, and streaming control paths where a request can outlive an operator’s revocation action.
Recommendation: Either document that permissions are strictly request-start snapshots, or add a fresh-check path for long-running handlers that reloads permissions from PermissionResolver before each significant side effect.

Notes

No SQL injection, command injection, pool hijacking, protobuf wire-format, plugin-boundary, or frontend security issues were evident in the reviewed diff.

_{Generated by Codex Security Review |

Triggered by: @mcharles-square |

Review workflow run}

Copilot

Pull request overview

Adds the new granular RBAC “effective permissions” snapshot + resolver, wires it into the auth interceptor, and introduces a RequirePermission middleware that will become the new handler gate (without switching any call sites yet).

Changes:

Introduces authz.EffectivePermissions with narrowing semantics and a DB-backed PermissionResolver.
Adds handlers/middleware.RequirePermission and context plumbing for per-request effective permissions.
Wires resolver loading into the auth interceptor and updates server/test wiring to pass the resolver.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
server/internal/testutil/service_provider.go	Adds `PermissionResolver` to test service provider wiring.
server/internal/testutil/infrastructure_provider.go	Threads the resolver into `NewAuthInterceptor` in test infra.
server/internal/handlers/middleware/permission.go	New `RequirePermission` middleware + context stash/accessors and structured deny payload.
server/internal/handlers/middleware/permission_test.go	Unit tests for allow/deny, payload shape, fail-closed behavior, and actor short-circuit.
server/internal/handlers/interceptors/config_test.go	Updates interceptor construction signature in tests.
server/internal/handlers/interceptors/authentication.go	Loads/stashes effective permissions after auth (session + API key).
server/internal/handlers/interceptors/authentication_test.go	Updates interceptor construction to pass resolver in integration tests.
server/internal/handlers/interceptors/authentication_apikey_error_test.go	Updates interceptor construction signature.
server/internal/domain/authz/effective.go	New in-memory effective-permissions decision structure + `Has`/`FlatKeys`.
server/internal/domain/authz/effective_test.go	Unit tests for narrowing semantics, unions, and flat projection.
server/internal/domain/authz/resolver.go	New resolver that materializes `EffectivePermissions` from one SQL query.
server/internal/domain/authz/resolver_test.go	DB-backed integration tests for org/site scope and narrowing behavior.
server/cmd/fleetd/main.go	Constructs and injects the resolver into the auth interceptor at boot.

…-closed Round-1 review on #301 caught three real bugs. **Zero-permission site assignments fail open (Codex HIGH + thread 2):** ListEffectivePermissionsForUser was INNER JOINing role_permission, so a site-scope role with zero permissions produced no rows. The resolver never recorded bySite[siteID] for that site and narrowing collapsed back to the user's org-scope grant — an empty "lockdown" role intended to revoke broad access at a single site would have silently failed and allowed miner:reboot / miner:update_pools / etc. through. Switched to LEFT JOIN role_permission and permission; permission_key is now nullable. assignmentsFromRows skips null keys but still creates the Assignment row so the bySite[] entry exists. Two regression tests added: pure unit test (in-memory empty assignment narrows correctly) plus DB-backed test exercising the LEFT JOIN path with a real empty custom role. **Any non-empty Actor bypasses RBAC (Codex MEDIUM):** RequirePermission short-circuited on `info.Actor != ""`, which would accept any future or mistyped Actor value as fully trusted. Switched to an explicit allowlist: ActorScheduler and ActorCurtailment short-circuit to ALLOW; any other non-empty Actor returns Internal. Regression test proves an unknown Actor surfaces as Internal, not ALLOW. **Nil PermissionResolver panic (thread 1):** loadEffectivePermissions dereferenced i.permissionResolver without a nil check. A miswired test or boot path constructing the interceptor with nil (a few unit tests pass nil for unused dependencies) would have panicked the server instead of failing closed. Added the nil check — returns Internal so the wiring bug surfaces immediately rather than as a crash.

mcharles-square · 2026-05-22T17:03:23Z

Codex Security Review (round 1 on #301) — response

Both findings addressed in e7f762a.

HIGH — Zero-permission site assignments fail open to org-scope grants

A site-scoped role with zero permissions produces no permission rows, so the resolver never records bySite[siteID]; Has then falls back to the user's org-scope grant.

You're right — the INNER JOIN dropped empty-role assignments and narrowing collapsed. Fixed by switching ListEffectivePermissionsForUser to LEFT JOIN role_permission and LEFT JOIN permission, with permission_key now sql.NullString. The resolver creates the Assignment (and therefore the bySite[siteID] marker) on first sight of the assignment id, and only appends a permission key when the row carries a non-null one. An assignment with zero permissions still triggers narrowing.

Two regression tests added:

TestEffective_ZeroPermissionSiteAssignmentStillNarrows — pure unit test against NewEffectivePermissions directly.
TestResolver_ZeroPermissionSiteAssignmentStillNarrows — DB-backed, creates an empty custom role via UpsertCustomRoleForOrg and confirms an org-scope ADMIN + zero-permission site role denies miner:reboot at the targeted site, but still allows it at sites with no narrower assignment.

MEDIUM — Any non-empty internal actor bypasses all permission checks

Switch on explicit allowed constants (session.ActorScheduler, session.ActorCurtailment) and fail closed for unknown non-empty actors. Add a test proving an unknown actor does not bypass EffectivePermissions.

Done. Replaced the info.Actor != "" short-circuit with an explicit allowlist:

if info.Actor != "" {
    switch info.Actor {
    case session.ActorScheduler, session.ActorCurtailment:
        return info, nil
    default:
        return nil, fleeterror.NewInternalErrorf(
            "authz: unknown internal actor %q; refusing to short-circuit RBAC",
            info.Actor,
        )
    }
}

TestRequirePermission_UnknownActorDoesNotBypass mounts a context with Actor: session.Actor("future-orchestrator-typo") and no EffectivePermissions; the call returns Internal (not ALLOW) — proves the typo path does not silently grant.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e7f762ab8c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

… succeeds Codex P2 caught: authenticateWithApiKey was calling RecordSuccessfulUse before loadEffectivePermissions, so a resolver failure (DB error, nil wiring) would reject the request while still crediting last_used_at as a successful authentication. Audit and usage telemetry would report failed auths as successful. Moved RecordSuccessfulUse below the loadEffectivePermissions call so it only fires when the full authentication chain succeeds. Behavior on success is unchanged.

…Manage CI caught that TestResolver_ZeroPermissionSiteAssignmentStillNarrows asserted ADMIN holds user:manage at org scope. ADMIN's seed formula is AllPermissions() minus {user:read, user:manage, role:manage} — the user:manage exclusion is the whole point of the SUPER_ADMIN-only floor. Swapped the org-scope assertion to site:manage, which IS in ADMIN's seed and exercises the same property (org-scoped action, satisfied by org-scope grant despite the empty site-scope assignment).

…d scope helper Review-driven simplification pass over PR 2a (no behavior change): **resolver.go assignmentsFromRows**: dropped the `started bool` + `flush()` closure pattern. Pre-seed the slice from `rows[0]` then iterate uniformly; the same code path now handles first-row and same-assignment-row cases. Extracted `newAssignmentFromRow` for the scope-only seed. Net: -8 lines and clearer flow. **effective.go Has()**: tightened the inline narration. The type-level doc already explains narrowing in full; the function body's three repeat-explanations were noise. Code is short enough to read directly. **resolver_test.go assignAssignment**: helper now takes `authz.ScopeType` instead of raw `string`, and call sites use `authz.ScopeOrg` / `authz.ScopeSite` constants instead of literal `"org"` / `"site"`. Removes seven stringly-typed call sites. **resolver_test.go / backfill_test.go**: moved `insertTestSite` next to its sibling `insertTestOrganization` and `insertTestUser` helpers in backfill_test.go — all three are package-local DB-seeding helpers that belong together. **permission.go**: dropped the "in U7" task-tracker reference from the RequirePermission godoc; replaced with "as call sites migrate to the permission model" so the doc outlives the migration. **resolver.go NewPermissionResolver**: trimmed the constructor godoc to the WHY (per-request read query, no surrounding tx) and dropped the sentence that just narrated the signature. Build, vet, golangci-lint, and non-DB authz/middleware tests all green.

ankitgoswami

looks good

mcharles-square added 5 commits May 22, 2026 19:22

mcharles-square requested a review from a team as a code owner May 22, 2026 16:41

Copilot AI review requested due to automatic review settings May 22, 2026 16:41

github-actions Bot assigned mcharles-square May 22, 2026

github-actions Bot added the server label May 22, 2026

Copilot started reviewing on behalf of mcharles-square May 22, 2026 16:42 View session

Copilot AI reviewed May 22, 2026

View reviewed changes

Comment thread server/internal/handlers/interceptors/authentication.go

Comment thread server/internal/domain/authz/resolver.go

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

Comment thread server/internal/handlers/interceptors/authentication.go Outdated

mcharles-square added 3 commits May 22, 2026 20:16

mcharles-square mentioned this pull request May 22, 2026

feat(rbac): RPC permission declaration map + contract test (PR 2b of 4) #303

Merged

6 tasks

ankitgoswami approved these changes May 22, 2026

View reviewed changes

mcharles-square merged commit cf6ff01 into main May 22, 2026
68 checks passed

mcharles-square deleted the feat/rbac-pr2a-resolver branch May 22, 2026 19:35

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(rbac): resolver + RequirePermission middleware (PR 2a of 4)#301

feat(rbac): resolver + RequirePermission middleware (PR 2a of 4)#301
mcharles-square merged 9 commits into
mainfrom
feat/rbac-pr2a-resolver

mcharles-square commented May 22, 2026

Uh oh!

github-actions Bot commented May 22, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

mcharles-square commented May 22, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

Uh oh!

ankitgoswami left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mcharles-square commented May 22, 2026

What landed

domain/authz/effective.go — in-memory decision type

domain/authz/resolver.go — PermissionResolver

handlers/middleware/permission.go — RequirePermission

handlers/interceptors/authentication.go — wiring

What this PR does NOT do

Test plan

Rollback

Uh oh!

github-actions Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔐 Codex Security Review

Review Summary

Findings

[MEDIUM] Permission Rechecks Reuse Stale Authorization Snapshot

Notes

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

mcharles-square commented May 22, 2026

Codex Security Review (round 1 on #301) — response

HIGH — Zero-permission site assignments fail open to org-scope grants

MEDIUM — Any non-empty internal actor bypasses all permission checks

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

ankitgoswami left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

`domain/authz/effective.go` — in-memory decision type

`domain/authz/resolver.go` — `PermissionResolver`

`handlers/middleware/permission.go` — `RequirePermission`

`handlers/interceptors/authentication.go` — wiring

github-actions Bot commented May 22, 2026 •

edited

Loading