feat(metadata): strengthen service-app mapping consistency, retry and…#3373
Open
NeverENG wants to merge 3 commits into
467a8d8 to
1ad53bd
Compare
… dedup (apache#3354)

Make interface-to-app mapping registration safe under concurrent providers and give it a proper retry policy.

- Optimistic concurrency across all backends so concurrent appends no longer clobber each other: etcd (GetValAndRev + UpdateWithRev), zookeeper (versioned SetContent), nacos (CasMd5). Each backend wraps its native conflict (ErrCompareFail / ErrBadVersion / ErrNodeExists / nacos publish failure) into the shared report.ErrMappingCASConflict sentinel via %w.
- Graded retry: registerWithRetry retries only CAS conflicts (errors.Is) with exponential backoff + jitter, and returns permanent errors immediately instead of burning the whole retry budget.
- Extract shared logic: report.MergeServiceAppMapping (whole-element dedup, fixing the strings.Contains substring false positive and the leading-comma bug on empty values) and report.DecodeServiceAppNames (skips empty elements).
- Listener cleanup: zookeeper removal via CacheListener.RemoveKeyListeners; etcd documents the listener as unsupported instead of silently succeeding.
- Tests: helper unit tests plus a concurrency test that reproduces the lost-update bug and proves CAS preserves every writer (200 writers / 20 readers, passes under -race).

Known nacos-only limitation (documented in code): CasMd5 is an optimistic UPDATE and cannot guard the first INSERT, so the initial concurrent registration of a brand-new interface can still race. etcd and zookeeper are not affected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
1ad53bd to
56e0b44
Compare
Contributor
Let's check first: this doesn't duplicate #3371, does it?
Alanxtl
reviewed
Jun 7, 2026
…#3354)

- nacos: stop swallowing the getConfig read error. On a failed read the old value was treated as empty, so registration would publish only the current app and overwrite an existing set (e.g. appA,appB -> appC). Return the error instead so an existing mapping is never clobbered. A genuinely absent config still returns ("", nil) and takes the first-write path.
- zookeeper: CacheListener.DataChange now builds the set via report.DecodeServiceAppNames, so mapping change events no longer surface empty app names from legacy/malformed comma-separated values (",app", "app,,other"). Added a listener test covering this.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…g registration

The previous commit returned any getConfig error from RegisterServiceAppMapping. Nacos signals a never-written key with a "config data not exist" error (not an empty value), so the first registration of a fresh interface failed and the provider panicked on service export (broke the registry/nacos integration test).

Only treat genuine read failures (network/auth/server) as errors; the not-found signal is handled as an empty old value so the first write can create the key. Detection mirrors config_center/nacos's isConfigNotExistErr.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


Description
Fixes # issue3354
What this PR does
Fixes #3354. Hardens application-level service-app mapping (interface -> app names) registration so it is correct under concurrent providers, and gives it a proper retry policy.
Improves write consistency, retry, and deduplication for application-level service-app mapping.
Background
The mapping value is a comma-separated set of application names stored under a single
interface key, shared by all providers of that interface. Registration is therefore a
read-modify-write, and the previous implementations had several reliability gaps.
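The read-modify-write hazard described above can be illustrated with a toy in-memory store. This is only a sketch under stated assumptions: `versionedStore`, `compareAndSet`, `appendApp`, and `errConflict` are hypothetical names invented for illustration and are not the PR's code or any backend's API. Each writer re-reads and retries when the revision it read is stale, so no append is lost.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// errConflict is a hypothetical stand-in for a backend's native CAS conflict.
var errConflict = errors.New("revision conflict")

// versionedStore is a toy single-key store: one value plus a revision counter.
type versionedStore struct {
	mu    sync.Mutex
	value string
	rev   int64
}

// get returns the current value together with the revision it was read at.
func (s *versionedStore) get() (string, int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.value, s.rev
}

// compareAndSet writes newValue only if nobody has written since expectedRev.
func (s *versionedStore) compareAndSet(newValue string, expectedRev int64) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.rev != expectedRev {
		return errConflict
	}
	s.value = newValue
	s.rev++
	return nil
}

// appendApp is the read-modify-write: read, append, conditionally write,
// and start over if another writer won the race in between.
func appendApp(s *versionedStore, app string) {
	for {
		old, rev := s.get()
		next := app
		if old != "" {
			next = old + "," + app
		}
		if err := s.compareAndSet(next, rev); err == nil {
			return
		}
	}
}

// registerAll runs n concurrent writers (n >= 1) with distinct app names
// and returns how many names ended up in the stored value.
func registerAll(n int) int {
	s := &versionedStore{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			appendApp(s, fmt.Sprintf("app-%d", i))
		}(i)
	}
	wg.Wait()
	v, _ := s.get()
	return len(strings.Split(v, ","))
}

func main() {
	fmt.Println(registerAll(50)) // prints 50: every writer's append survives
}
```

With an unconditional write in place of `compareAndSet`, two writers that read the same old value would each publish their own append and one would be overwritten. The tight retry loop here is kept deliberately simple; it has no backoff.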
Changes
1. Optimistic concurrency across all backends (no more lost updates)
Concurrent appends no longer clobber each other:
- etcd: Get + Put → GetValAndRev + UpdateWithRev (CAS on ModRevision), Create for first write.
- zookeeper: versioned SetContent, now surfaces version conflicts instead of swallowing them.
- nacos: CasMd5 optimistic lock.
Each backend wraps its native conflict (ErrCompareFail / ErrBadVersion / ErrNodeExists / nacos publish failure) into a shared report.ErrMappingCASConflict sentinel via %w.
2. Graded retry (was: fixed loop, no backoff)
registerWithRetry retries only CAS conflicts (errors.Is) with exponential backoff + jitter, and returns permanent errors (network/auth) immediately instead of burning the whole retry budget.
Previously every error was retried 10 times in a tight loop with no sleep; retries are now graded by error type.
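A graded retry of this kind might look roughly like the sketch below. It is not the PR's registerWithRetry: `retryOnConflict`, `errCASConflict`, the attempt limit, and the base delay are all assumptions made for illustration. The only behaviours taken from the description are that conflicts are detected with errors.Is on a sentinel wrapped via %w, that conflicts back off exponentially with jitter, and that any other error returns immediately.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// errCASConflict is a hypothetical stand-in for report.ErrMappingCASConflict.
var errCASConflict = errors.New("mapping CAS conflict")

// retryOnConflict calls op up to maxAttempts times. Only errors that wrap
// errCASConflict are retried; anything else is treated as permanent.
// It returns the number of attempts made and the last error.
func retryOnConflict(maxAttempts int, base time.Duration, op func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errCASConflict) {
			return attempt, err // permanent error: do not spend more attempts
		}
		if attempt == maxAttempts {
			break
		}
		backoff := base << (attempt - 1)                         // base, 2*base, 4*base, ...
		jitter := time.Duration(rand.Int63n(int64(backoff) + 1)) // 0..backoff
		time.Sleep(backoff + jitter)
	}
	return maxAttempts, err
}

// demoTransient fails twice with a wrapped conflict, then succeeds.
func demoTransient() (int, error) {
	calls := 0
	return retryOnConflict(5, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("backend write: %w", errCASConflict)
		}
		return nil
	})
}

// demoPermanent always fails with a non-conflict error.
func demoPermanent() (int, error) {
	return retryOnConflict(5, time.Millisecond, func() error {
		return errors.New("auth failed")
	})
}

func main() {
	fmt.Println(demoTransient()) // prints: 3 <nil>
	fmt.Println(demoPermanent()) // prints: 1 auth failed
}
```

The jitter spreads out writers that conflicted at the same moment, so they are less likely to collide again on the next attempt.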
3. Extract shared logic + fix two hidden bugs
- report.MergeServiceAppMapping: whole-element dedup. Fixes the strings.Contains substring false positive (registering order was wrongly treated as present when order-service existed) and the leading-comma bug ("" + "," + app → ",app").
- report.DecodeServiceAppNames: parse into a set, skipping empty elements.
4. Listener cleanup
- zookeeper: removal via CacheListener.RemoveKeyListeners (was a silent return nil that leaked listeners).
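The two helper behaviours listed under item 3 (whole-element dedup and skipping empty elements) can be sketched as follows. The function names and signatures here (`splitNonEmpty`, `mergeAppName`, `decodeAppNames`) are hypothetical; the actual signatures of report.MergeServiceAppMapping and report.DecodeServiceAppNames are not shown in this description.

```go
package main

import (
	"fmt"
	"strings"
)

// splitNonEmpty splits a comma-separated value and drops empty elements,
// so legacy values such as ",app" or "app,,other" decode cleanly.
func splitNonEmpty(value string) []string {
	var out []string
	for _, e := range strings.Split(value, ",") {
		if e != "" {
			out = append(out, e)
		}
	}
	return out
}

// mergeAppName adds app to the comma-separated set unless an element equal
// to app already exists. Comparison is per element, not per substring.
func mergeAppName(oldValue, app string) string {
	elems := splitNonEmpty(oldValue)
	for _, e := range elems {
		if e == app {
			return strings.Join(elems, ",")
		}
	}
	return strings.Join(append(elems, app), ",")
}

// decodeAppNames parses the value into a set of app names.
func decodeAppNames(value string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, e := range splitNonEmpty(value) {
		set[e] = struct{}{}
	}
	return set
}

func main() {
	fmt.Println(mergeAppName("order-service", "order")) // order-service,order
	fmt.Println(mergeAppName("", "app"))                // app
	fmt.Println(mergeAppName("a,b", "a"))               // a,b
	fmt.Println(len(decodeAppNames("app,,other")))      // 2
}
```

A strings.Contains check on the first input would report order as already registered because it is a substring of order-service; comparing whole elements avoids that, and building the result with strings.Join avoids the leading comma on an empty old value.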
5. Tests
Helper unit tests plus a concurrency test that reproduces the lost-update bug and proves CAS preserves every writer (200 writers / 20 concurrent readers). Passes under -race.
Known limitation (documented in code)
Nacos CasMd5 is an optimistic UPDATE and cannot guard the first INSERT (Nacos has no create-if-absent primitive), so the initial concurrent registration of a brand-new interface can still race. etcd and zookeeper are not affected. Left as a documented limitation; can be revisited if Nacos exposes a SETNX-style primitive.
Test
go test -race ./metadata/report/... ./metadata/mapping/...
Checklist
develop