Shared Nexus endpoint lookup cache#10204

Open
stephanos wants to merge 1 commit into main from stephanos/sano-fix-flaky-endpoint

@stephanos stephanos commented May 8, 2026

What changed?

Adds a shared cache for Nexus endpoint name lookups, used when history and matching run in the same process.
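
As a rough illustration of the idea (not the code in this PR), a minimal sketch of a name-keyed cache that two in-process components hold a pointer to could look like the following. All names here (`SharedEndpointCache`, `Endpoint`, `writer`, `reader`) are hypothetical, and the sketch assumes one side writes endpoints and the other only looks them up by name.

```go
package main

import (
	"fmt"
	"sync"
)

// Endpoint is a hypothetical stand-in for a persisted Nexus endpoint entry.
type Endpoint struct {
	ID   string
	Name string
}

// SharedEndpointCache is a hypothetical name-to-endpoint cache that several
// components in the same process can share by holding the same pointer.
type SharedEndpointCache struct {
	mu     sync.RWMutex
	byName map[string]Endpoint
}

func NewSharedEndpointCache() *SharedEndpointCache {
	return &SharedEndpointCache{byName: make(map[string]Endpoint)}
}

// Put stores or overwrites an endpoint under its name.
func (c *SharedEndpointCache) Put(e Endpoint) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byName[e.Name] = e
}

// GetByName returns the endpoint and whether it was present.
func (c *SharedEndpointCache) GetByName(name string) (Endpoint, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	e, ok := c.byName[name]
	return e, ok
}

// writer and reader are hypothetical stand-ins for the two sides that share
// the cache; which service plays which role is an assumption of this sketch.
type writer struct{ cache *SharedEndpointCache }
type reader struct{ cache *SharedEndpointCache }

func (w writer) CreateEndpoint(id, name string) {
	w.cache.Put(Endpoint{ID: id, Name: name})
}

func (r reader) Lookup(name string) (Endpoint, bool) {
	return r.cache.GetByName(name)
}

func main() {
	cache := NewSharedEndpointCache()
	w := writer{cache: cache}
	r := reader{cache: cache}

	// A write through one component is visible to the other right away,
	// because both read and write the same map under the same lock.
	w.CreateEndpoint("id-1", "my-endpoint")
	e, ok := r.Lookup("my-endpoint")
	fmt.Println(e.ID, ok)
}
```

Because both sides go through the same map and lock, a lookup issued after a create returns does not need a retry loop in this sketch. That property only holds in-process; separate processes would still need some propagation mechanism.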

Why?

During testing, such as functional server tests or SDK tests against the CLI, the Nexus endpoint is not always immediately available after creation. This creates friction because callers need extra retries, and it causes flakiness when they do not guard against it. This change eliminates the need for those retries.

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Risk

Significantly more complex change than #10208

@stephanos stephanos marked this pull request as ready for review May 8, 2026 16:38
@stephanos stephanos requested review from a team as code owners May 8, 2026 16:38
@stephanos stephanos changed the title Fix ensureNexusEndpoint Fix ensureNexusEndpoint waits for endpoint to be created May 8, 2026
@stephanos stephanos requested a review from S15 May 8, 2026 16:38
S15
S15 previously approved these changes May 8, 2026
@S15 S15 dismissed their stale review May 8, 2026 22:42

Test failures

@S15 S15 self-requested a review May 8, 2026 22:42
@stephanos stephanos changed the title Fix ensureNexusEndpoint waits for endpoint to be created Make Nexus endpoints strongly consistent in single-process moe May 8, 2026
@stephanos stephanos changed the title Make Nexus endpoints strongly consistent in single-process moe Make Nexus endpoints strongly consistent in single-process mode May 8, 2026
@stephanos stephanos requested a review from a team as a code owner May 8, 2026 23:02
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 3 times, most recently from ab4d940 to 65c65a9 Compare May 8, 2026 23:20
@stephanos stephanos changed the title Make Nexus endpoints strongly consistent in single-process mode Strongly consistent Nexus endpoints in single-process mode May 8, 2026
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 2 times, most recently from 975897c to 8ef607e Compare May 8, 2026 23:29
@stephanos stephanos marked this pull request as draft May 8, 2026 23:30
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 6 times, most recently from b16743a to 00f7ea4 Compare May 8, 2026 23:37
@stephanos stephanos changed the title Strongly consistent Nexus endpoints in single-process mode Strongly consistent Nexus endpoint lookups in single-process mode May 8, 2026
@stephanos stephanos changed the title Strongly consistent Nexus endpoint lookups in single-process mode Strongly consistent Nexus endpoint lookup in single-process mode May 8, 2026
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch from 00f7ea4 to 818e3e6 Compare May 13, 2026 00:30
Comment thread tests/nexus_test_base.go
}
return true
}, 10*time.Second, 100*time.Millisecond, "endpoint should become visible")
}
Review comment from the PR author (Contributor):

obsolete now

@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 7 times, most recently from 6535d73 to b704a69 Compare May 13, 2026 16:18
}
}

func (r *EndpointRegistryImpl) refreshEndpointsLoop(ctx context.Context, dataReady *dataReady) error {
Review comment from the PR author (Contributor):

Yes, it's redundant locally now, but I opted to leave this in place when running in-process so the behavior does not diverge from prod any further.

@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 7 times, most recently from bced3f9 to c1d9baf Compare May 13, 2026 20:35
@stephanos stephanos changed the title Strongly consistent Nexus endpoint lookup in single-process mode Shared Nexus endpoint lookup cache May 13, 2026
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 4 times, most recently from b94441a to d169a8b Compare May 13, 2026 22:38
endpointsByName map[string]*persistencespb.NexusEndpointEntry
tableVersionChanged chan struct{}
sync.RWMutex // protects endpointEntries
endpointEntries []*persistencespb.NexusEndpointEntry // sorted by ID to support pagination during ListNexusEndpoints
Review comment from the PR author (Contributor):

endpointEntries could also live in the cache, but then history would allocate it needlessly.

@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch 2 times, most recently from e28b958 to 2932125 Compare May 13, 2026 23:51
}

// reset cached view since we will be paging from the start
m.resetCacheStateLocked()
Review comment from the PR author (Contributor):

Not clearing the cache here, as that would also clear the history cache. The view on the matching side will still be consistent because this happens under a lock.
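
To make the reasoning above concrete, here is a minimal sketch, under stated assumptions, of resetting only a private view while leaving a shared cache untouched. The types `sharedNames` and `manager` and their methods are hypothetical and are not taken from this PR; the sketch also ignores deletions, so a name removed from the source would linger in the shared map.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// sharedNames is a hypothetical cache shared with another in-process component.
type sharedNames struct {
	mu     sync.RWMutex
	byName map[string]string // endpoint name -> endpoint ID
}

func (s *sharedNames) put(name, id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.byName[name] = id
}

func (s *sharedNames) get(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.byName[name]
	return id, ok
}

// manager is a hypothetical owner-side component with a private, sorted view
// used for listing, plus a pointer to the shared cache.
type manager struct {
	mu      sync.RWMutex // protects entries
	entries []string     // endpoint IDs, sorted to support paging
	shared  *sharedNames
}

// resetViewLocked clears only the private view. The caller must hold m.mu.
// The shared cache is deliberately left alone so other readers keep their data.
func (m *manager) resetViewLocked() {
	m.entries = nil
}

// reload rebuilds the private view from the start while holding the write
// lock, so readers of list never observe a half-built view.
func (m *manager) reload(source map[string]string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.resetViewLocked()
	for name, id := range source {
		m.entries = append(m.entries, id)
		m.shared.put(name, id)
	}
	sort.Strings(m.entries)
}

// list returns a copy of the private view.
func (m *manager) list() []string {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return append([]string(nil), m.entries...)
}

func main() {
	shared := &sharedNames{byName: map[string]string{}}
	shared.put("existing", "0") // written before the reload

	m := &manager{shared: shared}
	m.reload(map[string]string{"b": "2", "a": "1"})

	_, stillThere := shared.get("existing")
	fmt.Println(m.list(), stillThere)
}
```

The design point is that `resetViewLocked` touches only state guarded by the manager's own lock, so a reload cannot cause a lookup miss for another component that reads the shared map.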

@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch from 2932125 to bf69331 Compare May 13, 2026 23:54
@stephanos stephanos force-pushed the stephanos/sano-fix-flaky-endpoint branch from bf69331 to db55bb8 Compare May 13, 2026 23:56
@stephanos stephanos marked this pull request as ready for review May 14, 2026 00:21
@stephanos stephanos requested a review from a team as a code owner May 14, 2026 00:21
@stephanos stephanos requested a review from bergundy May 14, 2026 00:21

2 participants