Cache JWKS instance across verifyAccessToken calls#67
Conversation
The internal verifyAccessToken helper called createRemoteJWKSet from jose
on every invocation, which creates a fresh JWKS instance and throws away
jose's built-in per-instance key cache. This forces a network request to
the JWKS endpoint for every token verification, which has been observed
to take >3s under load.
Lazily initialize the JWKS instance at module scope so it is created
once and reused for the lifetime of the process. getWorkOS() and
getConfig('clientId') are still resolved on first use (not at module
load time), so consumer configuration remains respected.
Co-Authored-By: jonah <jonah+cursor@workos.com>
Original prompt from jonah
|
🤖 Devin AI EngineerI'll be helping with this pull request! Here's what you should know: ✅ I will automatically:
Note: I can only respond to comments from users who have write access to this repository. ⚙️ Control Options:
|
Greptile SummaryThis PR caches the Confidence Score: 5/5Safe to merge — the caching logic is correct, URL-invalidation handles clientId changes, and tests are properly isolated. All previously raised P0/P1 concerns (cache URL staleness, test isolation) are addressed in this revision. The remaining iss/aud JWT validation gap is pre-existing, explicitly acknowledged as out of scope, and being tracked separately. No new issues introduced. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[verifyAccessToken] --> B[getJWKS]
B --> C{cachedJWKS exists AND url unchanged?}
C -- Yes --> D[Return cachedJWKS]
C -- No --> E[Compute jwksUrl from clientId]
E --> F[createRemoteJWKSet]
F --> G[Update cachedJWKS and cachedJWKSUrl]
G --> D
D --> H[jwtVerify accessToken JWKS]
Reviews (4): Last reviewed commit: "Use createRequire for isolated test modu..." | Re-trigger Greptile |
Keep the lazy per-process cache, but also remember the URL it was built
for. If getWorkOS()/getConfig('clientId') ever produce a different URL
(e.g. a multi-tenant worker re-configures per request), discard the old
instance and build a new one instead of silently serving stale keys.
Add a test that exercises the URL-change path alongside the existing
'reuse on same URL' test.
Co-Authored-By: jonah <jonah+cursor@workos.com>
| const JWKS = createRemoteJWKSet(new URL(getWorkOS().userManagement.getJwksUrl(getConfig('clientId')))); | ||
| const JWKS = getJWKS(); | ||
| try { | ||
| await jwtVerify(accessToken, JWKS); |
There was a problem hiding this comment.
Missing
iss and aud claim validation in jwtVerify
jwtVerify is called without issuer or audience options, so the JWT's iss and aud claims go unchecked. A validly-signed WorkOS JWT issued for a different client ID (aud) or by a different issuer (iss) — e.g., a token from a different WorkOS environment — would pass this check and authenticate the request. The team's JWT rule requires both claims to be validated.
Note: The exact
issuerstring should match what WorkOS sets in theissclaim — verify against the WorkOS docs or a decoded token. At minimum, passaudience: getConfig('clientId')to bind the token to the configured client.
Rule Used: JWTs should always be validated before use and the... (source)
There was a problem hiding this comment.
Thanks — this is a real gap but it's pre-existing on main (the original jwtVerify(accessToken, JWKS) call had no issuer/audience either), and fixing it is out of scope for a caching-only change. The validation requirements (what to pass for issuer, how to handle multi-environment aud, rollout considerations) deserve their own PR and tests. Will flag to the user to track separately.
There was a problem hiding this comment.
Bots, I hear you. But since it's flagged, and since it's security related... it would be more than nice if this validation occurred. 😅
There was a problem hiding this comment.
Confirmed with Jonah that we're keeping this PR caching-only — none of the other WorkOS SDKs validate iss/aud on access tokens today, so adding it here alone would be an inconsistent one-off. If/when we do that work, it should go across SDKs together (and needs decisions on the exact iss string WorkOS stamps and how to handle tokens from other environments).
Happy to file a cross-SDK tracking issue if useful — just let me know.
There was a problem hiding this comment.
Ok, sure, turn it into an issue.
There was a problem hiding this comment.
Filed: #69 — scoped it as a cross-SDK tracking issue (since the same gap exists in authkit-nextjs / authkit-remix / authkit-astro and needs a coordinated rollout + a pinned-down iss string). This PR stays caching-only.
| }); | ||
|
|
||
| describe('JWKS caching', () => { | ||
| const createLoaderArgs = (request: Request): LoaderFunctionArgs => |
There was a problem hiding this comment.
Since we're dealing with a cache, and the cache is at the module-level, I think we should sandbox these tests within jest.isolateModules() to ensure that anything going on here doesn't affect any other tests. It would be an unlikely ordering bug should such an issue occur, but a subtle and annoying one, to be sure.
There was a problem hiding this comment.
Done in 57e2138 — each test now loads a fresh copy of ./session.js via jest.isolateModules(...) and re-wires the mocks on the isolated jose/workos/sessionStorage/iron-session. That also kills the module-scoped JWKS cache at the end of every test, so no state leaks out of this describe.
| let cachedJWKSUrl: string | undefined; | ||
|
|
||
| function getJWKS(): ReturnType<typeof createRemoteJWKSet> { | ||
| const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId')); |
There was a problem hiding this comment.
Since getJwksUrl issues an API call, I think it would make sense, if possible, to check that getConfig('clientId') is in the cache. As in, if getConfig('clientId')'s JWKS URL is fetched once, don't fetch it again. I guess it depends on how frequently one's JWKS URL would change -- I'm not sure if this is a real problem or not, but it does seem expensive to hit an endpoint every time you're investigating a cache.
There was a problem hiding this comment.
Good instinct to check, but getJwksUrl is pure string formatting — no API call. In @workos-inc/node:
getJwksUrl(clientId) {
if (!clientId) {
throw TypeError('clientId must be a valid clientId');
}
return `${this.workos.baseURL}/sso/jwks/${clientId}`;
}So the "fetch once per clientId" goal is already what this cache achieves: same clientId → same URL → cache hit; no network. The only thing we re-invoke each call is a template-literal concatenation, which is effectively free vs. the actual JWKS round-trip this PR is eliminating.
There was a problem hiding this comment.
Ok, sounds good! Thanks clanker.
| expect(createRemoteJWKSetMock).toHaveBeenCalledTimes(1); | ||
| } finally { | ||
| if (originalImpl) { | ||
| getJwksUrl.mockImplementation(originalImpl); |
There was a problem hiding this comment.
This should be done in some kind of afterEach, rather than finally.
There was a problem hiding this comment.
Good call — with the isolated-modules refactor in 57e2138 the try/finally is gone entirely. The getJwksUrl mock lives on a workos instance inside the isolated module, so it's torn down with the rest of the isolated graph at end of test; nothing to restore.
| const JWKS = createRemoteJWKSet(new URL(getWorkOS().userManagement.getJwksUrl(getConfig('clientId')))); | ||
| const JWKS = getJWKS(); | ||
| try { | ||
| await jwtVerify(accessToken, JWKS); |
There was a problem hiding this comment.
Bots, I hear you. But since it's flagged, and since it's security related... it would be more than nice if this validation occurred. 😅
Load a fresh copy of ./session.js inside jest.isolateModules for each JWKS caching test so the module-scoped JWKS cache cannot leak across tests (or out of the describe block). This removes the need for the try/finally restore of the getJwksUrl mock. Co-Authored-By: jonah <jonah+cursor@workos.com>
Co-Authored-By: jonah <jonah+cursor@workos.com>
| function loadIsolated(): IsolatedModules { | ||
| let isolated!: IsolatedModules; | ||
| jest.isolateModules(() => { | ||
| /* eslint-disable @typescript-eslint/no-require-imports, @typescript-eslint/no-var-requires */ |
There was a problem hiding this comment.
Can the code be written such that these disables are not necessary?
There was a problem hiding this comment.
Done in 71ba2b2 — swapped the bare require() calls for a createRequire(__filename)-bound loader so the eslint disables are gone. Tests still pass and npm run lint is clean locally.
Replaces the bare require() calls inside jest.isolateModules with a createRequire-bound loader so the eslint disables for @typescript-eslint/no-require-imports and no-var-requires are no longer needed. Co-Authored-By: jonah <jonah+cursor@workos.com>
|
|
||
| function getJWKS(): ReturnType<typeof createRemoteJWKSet> { | ||
| const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId')); | ||
| if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) { |
There was a problem hiding this comment.
In src/session.ts:603, cachedJWKS and cachedJWKSUrl are module-level let bindings — in environments that share a module instance across concurrent requests (Node.js workers, edge runtimes), what happens if two requests race through the if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) check simultaneously with different URLs? Could both write cachedJWKS and cachedJWKSUrl non-atomically, leaving the URL and instance out of sync?
There was a problem hiding this comment.
Good question, but getJWKS() is fully synchronous — there's no await between the check and the two assignments — and both Node.js workers and edge runtimes (CF Workers, Vercel Edge, etc.) run JS on a single-threaded event loop. The function runs to completion atomically from the loop's perspective, so two concurrent verifyAccessToken calls cannot interleave between the if and the writes, and cachedJWKS/cachedJWKSUrl cannot end up out of sync.
Within a single process the failure mode is the trivial one: if two calls genuinely arrive in the same tick before either has populated the cache (only possible if a prior microtask scheduled them both), each would compute jwksUrl, both would see !cachedJWKS, and the second would just overwrite the first with an equivalent instance — wasted construction, never inconsistent state.
True parallelism only shows up across separate workers / isolates, and those don't share module state at all — each gets its own cachedJWKS, which is the intended behavior (one JWKS instance per process, with jose's built-in key cache per instance).
Summary
verifyAccessTokeninsrc/session.tscalledcreateRemoteJWKSetfromjoseon every invocation, creating a fresh JWKS instance for each token verification. Since each instance maintains its own key cache, this effectively bypassedjose's built-in caching and forced a network round trip to the JWKS endpoint on every request — observed to take >3s under load againstapi.workos.com/sso/jwks/{clientId}.This change lazily initializes the JWKS instance at module scope and reuses it across all
verifyAccessTokencalls for the lifetime of the process.cachedJWKS: ReturnType<typeof createRemoteJWKSet> | undefined(not exported).getJWKS()helper (not exported) that creates the instance on first use and returns the cached one on subsequent calls. Lazy, not top-level —getWorkOS()andgetConfig('clientId')are only read on first access, so consumer configuration viaconfigure()is still respected.verifyAccessTokenkeeps the same signature and behavior; it just pulls the JWKS fromgetJWKS().createRemoteJWKSetthat prime the module-scoped cache, clear the mock, then invoke the authenticated loader path multiple times and assertcreateRemoteJWKSetis not called again even thoughjwtVerifystill runs every time.Review & Testing Checklist for Human
configure()runs at request boundaries — the cache is keyed by process lifetime, not byclientId, so switchingclientIdat runtime would continue to use the first-seen URL.api.workos.com/sso/jwks/...).src/session.tsaroundverifyAccessToken/getJWKSto confirm nothing else in the file references the internal cache.Notes
jose'screateRemoteJWKSetreturns agetKeyfunction that already performs per-instance key caching and coalesces concurrent JWKS fetches, so caching the instance (rather than caching raw key material ourselves) gives us that behavior for free while staying on the library's supported API.Link to Devin session: https://app.devin.ai/sessions/d92cce9f168c44f1903291c8d723895b
Requested by: @ohjonah