
Cache JWKS instance across verifyAccessToken calls #67

Open
ohjonah wants to merge 5 commits into main from devin/1776709961-cache-jwks

Conversation

@ohjonah ohjonah commented Apr 20, 2026

Summary

verifyAccessToken in src/session.ts called createRemoteJWKSet from jose on every invocation, creating a fresh JWKS instance for each token verification. Since each instance maintains its own key cache, this effectively bypassed jose's built-in caching and forced a network round trip to the JWKS endpoint on every request — observed to take >3s under load against api.workos.com/sso/jwks/{clientId}.

This change lazily initializes the JWKS instance at module scope and reuses it across all verifyAccessToken calls for the lifetime of the process.

  • New module-level cachedJWKS: ReturnType<typeof createRemoteJWKSet> | undefined (not exported).
  • New private getJWKS() helper (not exported) that creates the instance on first use and returns the cached one on subsequent calls. Lazy, not top-level — getWorkOS() and getConfig('clientId') are only read on first access, so consumer configuration via configure() is still respected.
  • verifyAccessToken keeps the same signature and behavior; it just pulls the JWKS from getJWKS().
  • Added tests spying on createRemoteJWKSet that prime the module-scoped cache, clear the mock, then invoke the authenticated loader path multiple times and assert createRemoteJWKSet is not called again even though jwtVerify still runs every time.
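The lazy module-scope pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in src/session.ts: createKeySetStub is a hypothetical stand-in for jose's createRemoteJWKSet, and configureClientId, the example URL, and the call counter are assumptions added so the sketch runs without any dependencies.

```typescript
// Hypothetical stand-in for a JWKS resolver; the real one resolves signing keys.
type KeyResolver = () => string;

// Counts factory invocations so reuse can be observed.
let factoryCalls = 0;
function createKeySetStub(url: string): KeyResolver {
  factoryCalls += 1;
  return () => `keys from ${url}`;
}

// Hypothetical consumer configuration, read only when the cache is first filled.
let clientId = 'client_default';
function configureClientId(id: string): void {
  clientId = id;
}

// Module-scoped cache: stays undefined until the first call to getKeySet().
let cachedKeySet: KeyResolver | undefined;

function getKeySet(): KeyResolver {
  if (!cachedKeySet) {
    // Configuration is resolved here, on first use, not at module load time.
    cachedKeySet = createKeySetStub(`https://api.example.test/sso/jwks/${clientId}`);
  }
  return cachedKeySet;
}

function factoryCallCount(): number {
  return factoryCalls;
}
```

Calling configureClientId before the first getKeySet() is enough for the configured value to be picked up; every later call returns the same resolver and leaves the factory count at 1.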

Review & Testing Checklist for Human

  • Confirm the lazy-init approach is acceptable for SSR/worker runtimes where configure() runs at request boundaries — the cache is keyed by process lifetime, not by clientId, so switching clientId at runtime would continue to use the first-seen URL.
  • Verify in a running app that token verification no longer hits the JWKS endpoint on every request (e.g. via request logs / traces to api.workos.com/sso/jwks/...).
  • Skim src/session.ts around verifyAccessToken / getJWKS to confirm nothing else in the file references the internal cache.

Notes

  • The test file for session contains 88 tests total — 87 previously passing plus the new JWKS caching test. All pass locally.
  • jose's createRemoteJWKSet returns a getKey function that already performs per-instance key caching and coalesces concurrent JWKS fetches, so caching the instance (rather than caching raw key material ourselves) gives us that behavior for free while staying on the library's supported API.

Link to Devin session: https://app.devin.ai/sessions/d92cce9f168c44f1903291c8d723895b
Requested by: @ohjonah



The internal verifyAccessToken helper called createRemoteJWKSet from jose
on every invocation, which creates a fresh JWKS instance and throws away
jose's built-in per-instance key cache. This forces a network request to
the JWKS endpoint for every token verification, which has been observed
to take >3s under load.

Lazily initialize the JWKS instance at module scope so it is created
once and reused for the lifetime of the process. getWorkOS() and
getConfig('clientId') are still resolved on first use (not at module
load time), so consumer configuration remains respected.

Co-Authored-By: jonah <jonah+cursor@workos.com>
@devin-ai-integration
Contributor

Original prompt from jonah

SYSTEM:
=== BEGIN THREAD HISTORY ===
Devin (U08G1NMS6AV): @Jonah Oh (U06D60MDWPJ) <https://work-os.slack.com/archives/D0ATK5DEY8K/p1776470286311259?thread_ts=1776468637.767169&cid=D0ATK5DEY8K|asked>:
>Referencing <https://work-os.slack.com/archives/C01DSKPKFMF/p1776435724387069|this thread>, according to the WorkOS main monorepo, why do we create groups based on the "Report_Entry" field, and is there a way we could screen whether the object is empty before trying to create a directory group?

Devin (U08G1NMS6AV): Open web app button 💡 Tip: Use !ask for follow-up questions, or !deep for deeper research.

Devin (U08G1NMS6AV): Why Groups Are Created Based on Report_Entry You're asking about the Workday adapter's group creation logic — specifically why Report_Entry is used as the basis for creating directory groups, and whether there's a way to screen for empty objects before attempting group creation. Why Report_Entry Drives Group Creation In the Workday integration, the adapter fetches a "report" from Workday's RaaS (Report-as-a-Service) API. The response is a WorkdayReport object whose primary data field is Report_Entry — an array of row objects, each representing one employee/record from the Workday report.
Groups are derived from this field in two places:

  1. From the group report (processGroupReport): Each entry in Report_Entry is parsed into a DirectoryGroup via DirectoryAttributeMapTranslator.parseDirectoryGroup. <https://github.com/workos/workos/blob/5ba872c1/packages/api/src/directory-sync/adapters/workday/workday-adapter.ts#L457-L494|workos/workos:workday-adapter.ts:457-494>
  2. From the user report (processUserReport): Each user entry in Report_Entry also carries a group field (e.g., department), so groups are also derived inline from user rows. <https://github.com/workos/workos/blob/5ba872c1/packages/api/src/directory-sync/adapters/work... (9115 chars truncated...)

@greptile-apps
Contributor

greptile-apps Bot commented Apr 20, 2026

Greptile Summary

This PR caches the createRemoteJWKSet instance at module scope in src/session.ts, preventing a redundant JWKS network round-trip on every verifyAccessToken call. Previous concerns about clientId switching and test isolation have been addressed: getJWKS() now tracks cachedJWKSUrl and rebuilds on URL changes, and tests use jest.isolateModules + createRequire to avoid cross-test cache leakage.

Confidence Score: 5/5

Safe to merge — the caching logic is correct, URL-invalidation handles clientId changes, and tests are properly isolated.

All previously raised P0/P1 concerns (cache URL staleness, test isolation) are addressed in this revision. The remaining iss/aud JWT validation gap is pre-existing, explicitly acknowledged as out of scope, and being tracked separately. No new issues introduced.

No files require special attention.

Important Files Changed

Filename: Overview

  • src/session.ts: Adds cachedJWKS/cachedJWKSUrl module-level state and a getJWKS() helper that lazily initialises and URL-invalidates the JWKS instance; verifyAccessToken now delegates to getJWKS().
  • src/session.spec.ts: Adds two tests under JWKS caching using jest.isolateModules+createRequire for per-test module isolation, covering cache reuse and cache invalidation on URL change.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[verifyAccessToken] --> B[getJWKS]
    B --> C{cachedJWKS exists AND url unchanged?}
    C -- Yes --> D[Return cachedJWKS]
    C -- No --> E[Compute jwksUrl from clientId]
    E --> F[createRemoteJWKSet]
    F --> G[Update cachedJWKS and cachedJWKSUrl]
    G --> D
    D --> H[jwtVerify accessToken JWKS]

Reviews (4): Last reviewed commit: "Use createRequire for isolated test modu..."

greptile-apps[bot]

This comment was marked as resolved.

Keep the lazy per-process cache, but also remember the URL it was built
for. If getWorkOS()/getConfig('clientId') ever produce a different URL
(e.g. a multi-tenant worker re-configures per request), discard the old
instance and build a new one instead of silently serving stale keys.

Add a test that exercises the URL-change path alongside the existing
'reuse on same URL' test.

Co-Authored-By: jonah <jonah+cursor@workos.com>
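
The URL-tracking variant described in this commit message can be sketched as below. It is an illustration under stated assumptions: buildKeySet is a hypothetical stand-in for jose's createRemoteJWKSet, and the URL is passed in as a plain string argument here, whereas the PR computes it from getWorkOS() and getConfig('clientId') inside the helper.

```typescript
// Hypothetical stand-in for a JWKS resolver bound to one URL.
type UrlBoundResolver = () => string;

let buildCalls = 0;
function buildKeySet(url: string): UrlBoundResolver {
  buildCalls += 1;
  return () => url;
}

// The cached instance and the URL it was built for are stored side by side.
let cachedResolver: UrlBoundResolver | undefined;
let cachedResolverUrl: string | undefined;

function getKeySetFor(currentUrl: string): UrlBoundResolver {
  // Rebuild when nothing is cached yet or the URL has changed since the last build.
  if (!cachedResolver || cachedResolverUrl !== currentUrl) {
    cachedResolver = buildKeySet(currentUrl);
    cachedResolverUrl = currentUrl;
  }
  return cachedResolver;
}

function buildCount(): number {
  return buildCalls;
}
```

Repeated calls with the same URL reuse one instance; a call with a different URL discards it and builds a new one, so stale keys are never served for the new URL.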
Comment thread src/session.ts
- const JWKS = createRemoteJWKSet(new URL(getWorkOS().userManagement.getJwksUrl(getConfig('clientId'))));
+ const JWKS = getJWKS();
  try {
    await jwtVerify(accessToken, JWKS);
Contributor

P1 security Missing iss and aud claim validation in jwtVerify

jwtVerify is called without issuer or audience options, so the JWT's iss and aud claims go unchecked. A validly-signed WorkOS JWT issued for a different client ID (aud) or by a different issuer (iss) — e.g., a token from a different WorkOS environment — would pass this check and authenticate the request. The team's JWT rule requires both claims to be validated.

Note: The exact issuer string should match what WorkOS sets in the iss claim — verify against the WorkOS docs or a decoded token. At minimum, pass audience: getConfig('clientId') to bind the token to the configured client.

Rule Used: JWTs should always be validated before use and the... (source)
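
For illustration only, the check being asked for amounts to comparing the iss and aud claims against expected values after the signature is verified. jose's jwtVerify accepts issuer and audience options for this purpose; the dependency-free sketch below shows the equivalent comparison on an already-decoded payload. The issuer and client ID strings are placeholders, since the exact iss value WorkOS stamps is not established in this thread.

```typescript
// Minimal shape of the two claims under discussion.
interface TokenClaims {
  iss?: string;
  aud?: string | string[];
}

// Returns true only when the issuer matches exactly and the expected
// audience appears in the aud claim (which may be a string or an array).
function claimsMatch(claims: TokenClaims, expectedIssuer: string, expectedAudience: string): boolean {
  let audiences: string[] = [];
  if (Array.isArray(claims.aud)) {
    audiences = claims.aud;
  } else if (claims.aud !== undefined) {
    audiences = [claims.aud];
  }
  return claims.iss === expectedIssuer && audiences.indexOf(expectedAudience) !== -1;
}

// Placeholder values, assumed for the example.
const EXAMPLE_ISSUER = 'https://issuer.example.test';
const EXAMPLE_CLIENT_ID = 'client_123';
```

A token signed by the right key but minted for another client ID would fail this comparison, which is the gap the comment describes.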

Contributor

Thanks — this is a real gap but it's pre-existing on main (the original jwtVerify(accessToken, JWKS) call had no issuer/audience either), and fixing it is out of scope for a caching-only change. The validation requirements (what to pass for issuer, how to handle multi-environment aud, rollout considerations) deserve their own PR and tests. Will flag to the user to track separately.

Contributor

Bots, I hear you. But since it's flagged, and since it's security related... it would be more than nice if this validation occurred. 😅

Contributor

Confirmed with Jonah that we're keeping this PR caching-only — none of the other WorkOS SDKs validate iss/aud on access tokens today, so adding it here alone would be an inconsistent one-off. If/when we do that work, it should go across SDKs together (and needs decisions on the exact iss string WorkOS stamps and how to handle tokens from other environments).

Happy to file a cross-SDK tracking issue if useful — just let me know.

Contributor

Ok, sure, turn it into an issue.

Contributor

Filed: #69 — scoped it as a cross-SDK tracking issue (since the same gap exists in authkit-nextjs / authkit-remix / authkit-astro and needs a coordinated rollout + a pinned-down iss string). This PR stays caching-only.

Contributor

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.


@ohjonah ohjonah requested review from gjtorikian and nicknisi April 21, 2026 23:34
Comment thread src/session.spec.ts
});

describe('JWKS caching', () => {
const createLoaderArgs = (request: Request): LoaderFunctionArgs =>
Contributor

Since we're dealing with a cache, and the cache is at the module-level, I think we should sandbox these tests within jest.isolateModules() to ensure that anything going on here doesn't affect any other tests. It would be an unlikely ordering bug should such an issue occur, but a subtle and annoying one, to be sure.

Contributor

Done in 57e2138 — each test now loads a fresh copy of ./session.js via jest.isolateModules(...) and re-wires the mocks on the isolated jose/workos/sessionStorage/iron-session. That also kills the module-scoped JWKS cache at the end of every test, so no state leaks out of this describe.

Comment thread src/session.ts
let cachedJWKSUrl: string | undefined;

function getJWKS(): ReturnType<typeof createRemoteJWKSet> {
const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId'));
Contributor

Since getJwksUrl issues an API call, I think it would make sense, if possible, to check that getConfig('clientId') is in the cache. As in, if getConfig('clientId')'s JWKS URL is fetched once, don't fetch it again. I guess it depends on how frequently one's JWKS URL would change -- I'm not sure if this is a real problem or not, but it does seem expensive to hit an endpoint every time you're investigating a cache.

Contributor

Good instinct to check, but getJwksUrl is pure string formatting — no API call. In @workos-inc/node:

getJwksUrl(clientId) {
  if (!clientId) {
    throw TypeError('clientId must be a valid clientId');
  }
  return `${this.workos.baseURL}/sso/jwks/${clientId}`;
}

So the "fetch once per clientId" goal is already what this cache achieves: same clientId → same URL → cache hit; no network. The only thing we re-invoke each call is a template-literal concatenation, which is effectively free vs. the actual JWKS round-trip this PR is eliminating.

Contributor

Ok, sounds good! Thanks clanker.

Comment thread src/session.spec.ts Outdated
expect(createRemoteJWKSetMock).toHaveBeenCalledTimes(1);
} finally {
if (originalImpl) {
getJwksUrl.mockImplementation(originalImpl);
Contributor

This should be done in some kind of afterEach, rather than finally.

Contributor

Good call — with the isolated-modules refactor in 57e2138 the try/finally is gone entirely. The getJwksUrl mock lives on a workos instance inside the isolated module, so it's torn down with the rest of the isolated graph at end of test; nothing to restore.

devin-ai-integration Bot and others added 2 commits April 22, 2026 00:47
Load a fresh copy of ./session.js inside jest.isolateModules for each
JWKS caching test so the module-scoped JWKS cache cannot leak across
tests (or out of the describe block). This removes the need for the
try/finally restore of the getJwksUrl mock.

Co-Authored-By: jonah <jonah+cursor@workos.com>
Co-Authored-By: jonah <jonah+cursor@workos.com>
Comment thread src/session.spec.ts Outdated
function loadIsolated(): IsolatedModules {
let isolated!: IsolatedModules;
jest.isolateModules(() => {
/* eslint-disable @typescript-eslint/no-require-imports, @typescript-eslint/no-var-requires */
Contributor

Can the code be written such that these disables are not necessary?

Contributor

Done in 71ba2b2 — swapped the bare require() calls for a createRequire(__filename)-bound loader so the eslint disables are gone. Tests still pass and npm run lint is clean locally.

Replaces the bare require() calls inside jest.isolateModules with a
createRequire-bound loader so the eslint disables for
@typescript-eslint/no-require-imports and no-var-requires are no
longer needed.

Co-Authored-By: jonah <jonah+cursor@workos.com>
Comment thread src/session.ts

function getJWKS(): ReturnType<typeof createRemoteJWKSet> {
const jwksUrl = getWorkOS().userManagement.getJwksUrl(getConfig('clientId'));
if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) {
Member

In src/session.ts:603, cachedJWKS and cachedJWKSUrl are module-level let bindings — in environments that share a module instance across concurrent requests (Node.js workers, edge runtimes), what happens if two requests race through the if (!cachedJWKS || cachedJWKSUrl !== jwksUrl) check simultaneously with different URLs? Could both write cachedJWKS and cachedJWKSUrl non-atomically, leaving the URL and instance out of sync?

Contributor

Good question, but getJWKS() is fully synchronous — there's no await between the check and the two assignments — and both Node.js workers and edge runtimes (CF Workers, Vercel Edge, etc.) run JS on a single-threaded event loop. The function runs to completion atomically from the loop's perspective, so two concurrent verifyAccessToken calls cannot interleave between the if and the writes, and cachedJWKS/cachedJWKSUrl cannot end up out of sync.

Within a single process, even two calls scheduled in the same tick run one after the other. The first call populates the cache; the second then sees the populated cache and either reuses the instance (same URL) or replaces both cachedJWKS and cachedJWKSUrl together (different URL). The worst case is repeated construction when callers alternate between URLs, never inconsistent state.

True parallelism only shows up across separate workers / isolates, and those don't share module state at all — each gets its own cachedJWKS, which is the intended behavior (one JWKS instance per process, with jose's built-in key cache per instance).
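
The run-to-completion argument above can be demonstrated with a small sketch. All names here (getResolver, verifyWith, raceCallers, the example URLs) are hypothetical; the resolver object stands in for a JWKS instance, and the awaited Promise.resolve() stands in for the asynchronous verification work that follows the synchronous cache lookup.

```typescript
// Hypothetical cached instance that remembers which URL it was built for.
type Resolver = { url: string };

let cached: Resolver | undefined;
let cachedFor: string | undefined;

// Fully synchronous: no await between the check and the two assignments.
function getResolver(url: string): Resolver {
  if (!cached || cachedFor !== url) {
    cached = { url };
    cachedFor = url;
  }
  return cached;
}

// Invariant: the cached instance and the recorded URL always agree.
function cacheIsConsistent(): boolean {
  return cached === undefined || cached.url === cachedFor;
}

async function verifyWith(url: string): Promise<boolean> {
  const resolver = getResolver(url);
  // Checked before yielding: this caller got an instance built for its own URL.
  const sawMatchingInstance = resolver.url === url && cacheIsConsistent();
  await Promise.resolve(); // yield to other callers, as real verification would
  return sawMatchingInstance;
}

// Starts three overlapping callers that alternate between two URLs.
function raceCallers(): Promise<boolean[]> {
  const a = 'https://a.example.test/jwks';
  const b = 'https://b.example.test/jwks';
  return Promise.all([verifyWith(a), verifyWith(b), verifyWith(a)]);
}
```

Each caller finishes its lookup before the next one starts, so every caller receives an instance built for its own URL even though the three calls overlap in time; the cost of alternating URLs is extra construction, not a mismatched pair.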
