feat(metadata): Clarify metadata report selection for multi-registry and multi-instance scenarios#3369
feat(metadata): Clarify metadata report selection for multi-registry and multi-instance scenarios#3369XnLemon wants to merge 16 commits into
Conversation
…hread registryId through createInstance
…gistryId instead of silently falling back to wrong registry When a caller provides a specific non-empty registryId that has no registered report, the previous implementation silently fell back to GetMetadataReport() and returned an arbitrary report from a different registry. This undermined the per-registry scoping added in the earlier commits: metadata for registry A could be fetched from registry B's report without any indication. Fix: return nil (with a Warnf log) when a specific registryId is not found. The existing nil-guard in GetMetadataFromMetadataReport (client.go) already surfaces this as an error to the caller, which is the correct behaviour. The empty-string path is unchanged: it still delegates to GetMetadataReport() because callers that pass "" have no registry context and should receive the stable default. Also expose ClearMetadataReportInstances() to allow cross-package test isolation of the package-level instances map.
…of returning only the last one The previous implementation used a lastErr variable that was overwritten on each iteration. If the first report failed and the second succeeded, the error was silently discarded and the caller saw no error despite a partial failure — leaving a stale mapping in the first registry. Fix: collect all errors with errors.Join so the caller receives the full failure picture. The loop continues past individual failures (best-effort fan-out) so a transient error in one registry does not prevent removal from the others. Add a comment explaining the intentional behavioural difference from Map(), which is fail-fast: Map stops at the first report failure, while Remove continues and collects all errors so every reachable registry gets the removal attempt.
Three gaps identified in PR review, all addressed:
1. metadata/client_test.go — rewrite TestGetMetadataFromMetadataReport
- Add sub-test: specific registryId routes to its own report (not default)
- Add sub-test: unknown registryId returns error, not silent wrong-registry result
- Each sub-test now resets instances independently to prevent state leakage
2. registry/servicediscovery/customizer/service_revision_customizer_test.go
- Add TestExportedRevisionMissingRegistryIdYieldsZero: instance with no
RegistryIdKey gets revision "0" and does not borrow another registry's
service list
- Add TestSubscribedRevisionMissingRegistryIdYieldsZero: same for subscribed
- Add t.Cleanup(ClearMetadataReportInstances) + unique registry-id prefixes
to all four tests to prevent cross-test global map pollution
3. registry/servicediscovery/service_instances_changed_listener_impl_test.go
- Add TestListenerUsesRegistryIdToFetchRemoteMetadata: creates a listener
with registryId="remote-reg-test", registers a mock MetadataReport under
that id, triggers OnEvent with a RemoteMetadataStorageType instance, and
asserts via mock.AssertExpectations that the correct per-registry report
was called — end-to-end proof that registryId threads from
NewServiceInstancesChangedListener through to GetMetadataFromMetadataReport
- metadata/report_instance_test.go: translate inline comments and testify assertion messages in TestGetMetadataReportIsDeterministic from Chinese to English, consistent with the rest of the file - registry/servicediscovery/customizer/service_revision_customizer.go: translate Customize doc-comments for both exported and subscribed customizers from Chinese to English - registry/servicediscovery/service_instances_changed_listener_impl_test.go: strip the UTF-8 BOM (EF BB BF) that was introduced at line 1; Go toolchain accepts BOMs but they are not idiomatic and cause spurious diffs in some editors and CI tools
… accurate consequence The previous Warnf said 'revision will be empty', which is wrong on two counts: the actual computed revision is "0" (not empty), and revision "0" causes OnEvent to silently skip the instance entirely, making it permanently invisible to all consumers. Upgrade to Errorf and describe the actual outcome so operators can diagnose misconfigurations immediately. Applies to both exported and subscribed revision customizers.
…nt cross-registry metadata poisoning The cache in GetMetadataInfo was keyed by revision string only. In a multi-registry setup, if two registries happen to produce the same revision value (e.g. identical service sets yielding the same CRC, or test fixtures using literal revision strings), the second registry's listener would get a cache hit and return MetadataInfo fetched via a different registry's report — silently serving wrong endpoints with no error surfaced. Fix: use registryId+":"+revision as the cache key for both Get and Set, so each registry's metadata is cached independently. Update all test helpers and direct metaCache.Set/Delete calls to use the same composite key (with constant.DefaultKey as the registryId for all existing single-registry test listeners).
…alignment The previous commit removed the TODO without fixing the underlying issue. GetMetadataService() still returns a global singleton, so all instances across registries receive the same metadata service URL regardless of which registry they belong to. Restore the TODO with a more detailed explanation so the gap is visible to future contributors. The fix requires per-registry MetadataService support and is out of scope for this PR.
…ice URL alignment" This reverts commit 5d7bcc6.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #3369 +/- ##
===========================================
+ Coverage 46.76% 52.64% +5.88%
===========================================
Files 295 492 +197
Lines 17172 37896 +20724
===========================================
+ Hits 8031 19952 +11921
- Misses 8287 16333 +8046
- Partials 854 1611 +757 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
看下没和#3362重复吧 |
Alanxtl
left a comment
There was a problem hiding this comment.
Findings
-
[P1] Standalone
metadata-reportconfigs can no longer be used by named registries.
In PR head, GetMetadataReportByRegistry returnsnilfor any non-empty unknownregistryId. But standalonemetadata-reportconfig is still registered underdefaultin config/metadata_config.go, while registry URLs get ids likenacos/zk. That meansserviceDiscoveryRegistry.metadataReportbecomesnilin service_discovery_registry.go, and remote metadata publish/fetch fails for a common config shape: named registry plus separate metadata-report. Need either preservedefaultas an explicit global fallback or add config/API support to bind standalone metadata-report to registry ids. -
[P2] #3352’s mapping-read problem is still not fixed.
ServiceNameMapping.Get still reads fromGetMetadataReport()only, so in multi-report setups it now deterministically picksdefaultor lexicographically first. That makes the old random behavior stable, but it still does not use the subscribing registry’s report, nor does it fan out consistently withMap/Remove. The issue explicitly calls out “mapping writes all reports, reads one report”; this PR only changesRemove.
on is resolved.
| metadataReports := metadata.GetMetadataReports() | ||
| if len(metadataReports) == 0 { |
There was a problem hiding this comment.
| metadataReports := metadata.GetMetadataReports() | |
| if len(metadataReports) == 0 { | |
| metadataReports := metadata.GetMetadataReports() | |
| if metadataReport == nil || len(metadataReports) == 0 { |
nil不需要处理了吗
There was a problem hiding this comment.
GetMetadataReports() 这个方法返回的是一个切片 里面有make([]report.MetadataReport, len(instances)) 所以不返回nil 顶多可能抛个空切片而已
之前的GetMetadataReport()全局 instances map 是空的话就会返回nil 所以一开始才兜了 给它加上也行
| // ClearMetadataReportInstances resets the package-level instances map. | ||
| // Intended for test isolation only; do not call in production code. | ||
| func ClearMetadataReportInstances() { | ||
| instances = make(map[string]report.MetadataReport) | ||
| } |
There was a problem hiding this comment.
因为新增了跨包测试文件 举个例子 比如 TestListenerUsesRegistryIdToFetchRemoteMetadata 要验证 registryId 从 NewServiceInstancesChangedListener → OnEvent → GetMetadataInfo → GetMetadataFromMetadataReport → 最终调到正确的那个 report 实例。这条链路跨了 3 个 package和 5 个函数 中间每一步都读全局 instances
需要覆盖的话就得加上 不覆盖的话测对应每一层不代表连起来就能成 所以还是加上了(
您怎么看owo (
第一个修了 目前可以正常fallback |
|



Description
Fixes #3352
Background / 背景
The #2534 refactor centralized metadata logic and settled on an application-level metadata path where
MetadataReporthandles app-levelMetadataInfoand service-app mapping. That refactor was correct in intent, but it left the selection semantics ofMetadataReportunder-specified for multi-registry and multi-instance deployments.Concretely, when a process connects to more than one registry (e.g. Nacos + Zookeeper), each registry gets its own
MetadataReportinstance, keyed byregistryIdin theinstancesmap. The pre-existing code had four independent correctness gaps in how it chose which report to use.#2534 重构将 metadata 逻辑集中化,以应用级
MetadataInfo和 service-app mapping 为核心路径。重构方向正确,但在多注册中心和多实例部署场景下,MetadataReport的选择语义存在空白。具体来说,当进程同时连接多个注册中心(如 Nacos + Zookeeper)时,每个注册中心会在
instancesmap 中保有独立的MetadataReport实例。原有代码在选择使用哪个 report 时存在以下四个独立的正确性问题。Problems Fixed / 修复的问题
1.
GetMetadataReport()returned a non-deterministic report / 返回结果不确定Go map iteration order is randomized.
GetMetadataReport()returned whichever entry happened to come first — a different report on every call in some runs, making any downstream behavior non-reproducible.Fix: prefer the
"default"key; fall back to the lexicographically firstregistryIdso the result is always stable.Go map 迭代顺序随机,
GetMetadataReport()每次可能返回不同的 report。修复:优先返回"default"键;不存在时退回到字典序最小的registryId,确保结果稳定。2.
GetMetadataReportByRegistry()silently fell back to the wrong registry's report / 静默返回错误注册中心的 reportWhen a caller passed a specific
registryIdthat was not registered, the function silently fell back toGetMetadataReport()and returned an unrelated registry's report. InGetMetadataFromMetadataReport, this meant remote metadata could be fetched from the wrong backend with no error surfaced.Fix: return
nil+Warnfwhen a specific registryId is not found. The existing nil-guard inGetMetadataFromMetadataReportalready surfaces this as an actionable error.调用方传入未注册的
registryId时,函数静默退回到其他注册中心的 report,且无任何报错。修复:找不到时返回nil+Warnf,由调用方的 nil 检查给出明确错误。3.
ServiceNameMapping.Remove()only returned the last error / 只保留最后一个错误Remove()fanned out to all reports but used alastErraccumulator overwritten on each iteration. If the first report failed and the second succeeded, the error was silently discarded and the caller sawnil, leaving a stale mapping in the first registry.Fix: collect all errors with
errors.Joinso the caller receives the full failure picture.Remove()向所有 report 扇出,但lastErr在每次循环中被覆盖。若第一个 report 失败、第二个成功,错误被静默丢弃。修复:使用errors.Join收集全部错误。4. Revision calculation and remote metadata fetch used cross-registry data / Revision 计算和远端 metadata 拉取使用了跨注册中心数据
The revision customizers called the global
GetMetadataService()which merges services across all registries, so two instances on different registries could compute the same revision even though their actual service sets differed — breaking consumer-side change detection.The listener cache in
GetMetadataInfo()was keyed byrevisionalone. Two registries producing the same revision string would collide, and the second registry's listener would silently receive the first registry's cached metadata.Fix: thread
registryIdthroughcreateInstance(), both revision customizers,GetMetadataFromMetadataReport(),NewServiceInstancesChangedListener(), andGetMetadataInfo(). Change the cache key toregistryId + ":" + revision.Revision customizer 通过全局
GetMetadataService()获取服务列表,合并了所有注册中心的数据,导致不同注册中心的实例可能计算出相同的 revision。监听器的 metadata 缓存仅以revision为键,多注册中心场景下会发生缓存污染。修复:将registryId贯穿传递到相关链路,缓存键改为registryId + ":" + revision。Changes / 改动内容
metadata/report_instance.goGetMetadataReport()deterministic;GetMetadataReportByRegistry()returns nil for unknown id; addClearMetadataReportInstances()for test isolationmetadata/client.goGetMetadataFromMetadataReport()takesregistryId, routes to correct per-registry reportmetadata/mapping/metadata/service_name_mapping.goRemove()collects all errors witherrors.Joinregistry/servicediscovery/service_instances_changed_listener_impl.goGetMetadataInfo()andNewServiceInstancesChangedListener()takeregistryId; cache key scoped toregistryId + ":" + revisionregistry/servicediscovery/service_discovery_registry.gocreateInstance()injectsregistryIdinto instance metadata;SubscribeURLpassesregistryIdto listenerregistry/servicediscovery/customizer/service_revision_customizer.goMetadataInfo; upgrade missing-registryId log toErrorfwith accurate consequence descriptionregistry/servicediscovery/customizer/metadata_service_url_params_customizer.goTests Added / 新增测试
TestGetMetadataReportIsDeterministic— 20 iterations verify stability under Go's randomized map iterationTestGetMetadataReportByRegistry/TestGetMetadataReportByRegistryFallsBackDeterministically— hit, miss, and empty-string pathsTestServiceNameMappingRemoveFansOutToAllReports— both reports called on successTestServiceNameMappingRemoveCollectsAllErrors— both errors visible viaerrors.Isin joined returnTestServiceNameMappingRemoveContinuesAfterPartialFailure— loop does not short-circuit on first failureTestExportedRevisionIsRegistryScoped/TestSubscribedRevisionIsRegistryScoped— different registries produce different revisionsTestExportedRevisionMissingRegistryIdYieldsZero/TestSubscribedRevisionMissingRegistryIdYieldsZero— missingRegistryIdKeyyields revision"0", does not borrow another registry's dataTestGetMetadataFromMetadataReport(rewritten) — specific registryId routes to correct report; unknown registryId returns error, not wrong reportTestListenerUsesRegistryIdToFetchRemoteMetadata— end-to-end: registryId flows fromNewServiceInstancesChangedListenerthroughOnEvent→GetMetadataInfo→GetMetadataFromMetadataReport→ correct mock report called#### TODO / 本 PR 还未做完Multi-instance metadata service URL alignment (metadata_service_url_params_customizer.go):GetMetadataService()still returns a global singleton so all instances across registries receive the same metadata service URL.多实例 metadata service URL 对齐:GetMetadataService()仍返回全局单例,所有注册中心实例获得同一 metadata service URL。Checklist
develop