feat(metadata): add concurrency safety for application-level metadata state (#3353) #3367

Open
jieguo-coder wants to merge 9 commits into apache:main from jieguo-coder:fix/issue-3353-metadata-concurrency
@jieguo-coder

Description

Summary

Fixes concurrency safety issues in the application-level metadata path by adding proper synchronization protection for multiple global maps and shared states.

Background

After the #2534 metadata refactor, application-level metadata is maintained through shared states such as local MetadataInfo, metadata report instances, and MetadataService. Currently, multiple core states are maps without clear synchronization protection. Data races, stale reads, or fatal error: concurrent map writes might occur when service registration, unregistration, subscription, unsubscription, instance changes, and metadata service queries happen concurrently.

Related Issue: Fixes #3353

Changes

  1. Internal Lock Protection for MetadataInfo (metadata/info/metadata_info.go)
  • Added a sync.RWMutex field to the MetadataInfo struct (with json:"-" and hessian:"-" tags to skip serialization) to protect the three internal maps: Services, exportedServiceURLs, and subscribedServiceURLs.
  • AddService / RemoveService / AddSubscribeURL / RemoveSubscribeURL now acquire the write lock.
  • GetExportedServiceURLs / GetSubscribedURLs now acquire the read lock.
  • Added a new GetServices() method that returns a snapshot of Services under a read lock for safe external iteration.
  2. Global registryMetadataInfo Lock Protection (metadata/metadata.go)
  • Added a sync.RWMutex for the global map[string]*MetadataInfo.
  • The get-or-create phase in AddService / AddSubscribeURL is now executed atomically under a write lock to prevent race conditions.
  • GetMetadataInfo is protected by a read lock.
  3. Global instances Lock Protection (metadata/report_instance.go)
  • Added a sync.RWMutex for the global map[string]MetadataReport.
  • Extracted an internal helper function getMetadataReportUnsafe to avoid deadlocks caused by the non-reentrant nature of Go's RWMutex during the fallback path of GetMetadataReportByRegistry.
  4. Concurrency Safety for Listeners (registry/.../service_instances_changed_listener_impl.go)
  • Added missing mutex protection for AddListenerAndNotify and RemoveListener to prevent concurrent reads/writes on listeners and serviceUrls against OnEvent.
  5. Safe Access to Services Field
  • Replaced direct accesses to metadataInfo.Services in OnEvent and convertV2 with the new safe method GetServices().
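
A minimal sketch of the per-struct locking described in item 1, under stated assumptions: ServiceInfo and MetadataInfo below are simplified stand-ins (one map instead of three), and AddService takes a *ServiceInfo rather than a URL, so this illustrates the pattern and is not the actual dubbo-go code.

```go
package main

import (
	"fmt"
	"sync"
)

// ServiceInfo is a simplified stand-in for the real type.
type ServiceInfo struct {
	Name string
}

// MetadataInfo is a simplified stand-in with a single protected map.
type MetadataInfo struct {
	App      string
	Services map[string]*ServiceInfo

	// The tags mirror the PR description, which uses them to keep the
	// lock out of serialization.
	mu sync.RWMutex `json:"-" hessian:"-"`
}

// AddService mutates the map, so it takes the write lock.
func (m *MetadataInfo) AddService(s *ServiceInfo) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.Services == nil {
		m.Services = make(map[string]*ServiceInfo)
	}
	m.Services[s.Name] = s
}

// GetServices copies the map under the read lock so callers can iterate
// the result without holding any lock.
func (m *MetadataInfo) GetServices() map[string]*ServiceInfo {
	m.mu.RLock()
	defer m.mu.RUnlock()
	cp := make(map[string]*ServiceInfo, len(m.Services))
	for k, v := range m.Services {
		cp[k] = v
	}
	return cp
}

func main() {
	m := &MetadataInfo{App: "demo"}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			m.AddService(&ServiceInfo{Name: fmt.Sprintf("svc-%d", i)})
			for range m.GetServices() {
				// Iterating the snapshot, never the live map.
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(len(m.GetServices())) // prints 50
}
```

The snapshot is shallow: the returned map is new, but its values are the same *ServiceInfo pointers held by the original, so callers should treat those values as read-only.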

Test Plan

  • go vet ./metadata/... ./registry/... : Zero warnings.
  • golangci-lint run ./metadata/... ./registry/servicediscovery/... : Zero issues.
  • go test ./metadata/... ./registry/... : All 21 packages passed.
  • go build ./... : Full project compilation passed successfully.

…rialization (apache#3353)

Add sync.RWMutex to MetadataInfo struct with json:"-" / hessian:"-"
tags to skip serialization. All mutating methods (AddService, RemoveService,
AddSubscribeURL, RemoveSubscribeURL) acquire the write lock, and read
methods (GetExportedServiceURLs, GetSubscribedURLs, GetServices) acquire
the read lock. The new GetServices method returns a snapshot copy.

Signed-off-by: jieguo-coder <1193249232@qq.com>
…apache#3353)

Add sync.RWMutex to protect registryMetadataInfo in metadata.go and
instances in report_instance.go. Extract getMetadataReportUnsafe helper
to avoid reentrant RLock deadlock in GetMetadataReportByRegistry fallback.
Fix nacos report_test to use pointer to MetadataInfo for json.Marshal.

Signed-off-by: jieguo-coder <1193249232@qq.com>
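
The reentrant RLock problem mentioned in this commit message can be sketched as follows. The lookup logic is an assumption made for illustration (an exact match, then a fallback to any registered report), and MetadataReport and namedReport are hypothetical stand-in types. The point is only that the fallback path calls a lock-free helper instead of re-entering a function that takes RLock again.

```go
package main

import (
	"fmt"
	"sync"
)

// MetadataReport is a stand-in for the real interface.
type MetadataReport interface{ Name() string }

// namedReport is a hypothetical implementation used only in this sketch.
type namedReport string

func (r namedReport) Name() string { return string(r) }

var (
	instancesMu sync.RWMutex
	instances   = map[string]MetadataReport{}
)

// getMetadataReportUnsafe returns any registered report.
// The caller must already hold instancesMu (read or write).
func getMetadataReportUnsafe() MetadataReport {
	for _, r := range instances {
		return r
	}
	return nil
}

// GetMetadataReport is the locked public entry point.
func GetMetadataReport() MetadataReport {
	instancesMu.RLock()
	defer instancesMu.RUnlock()
	return getMetadataReportUnsafe()
}

// GetMetadataReportByRegistry looks up one registry and falls back to
// any report. Calling GetMetadataReport here would take RLock a second
// time; if a writer queues up between the two RLock calls, the second
// RLock blocks behind the writer and the writer blocks behind the first
// RLock, so neither makes progress.
func GetMetadataReportByRegistry(registry string) MetadataReport {
	instancesMu.RLock()
	defer instancesMu.RUnlock()
	if r, ok := instances[registry]; ok {
		return r
	}
	return getMetadataReportUnsafe()
}

func main() {
	instancesMu.Lock()
	instances["nacos"] = namedReport("nacos-report")
	instancesMu.Unlock()

	fmt.Println(GetMetadataReportByRegistry("nacos").Name())
	fmt.Println(GetMetadataReportByRegistry("missing").Name())
}
```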
…rnal calls (apache#3353)

Add mutex locking to AddListenerAndNotify and RemoveListener to protect
shared fields listeners and serviceUrls. Replace direct access to
MetadataInfo.Services with safe GetServices method in OnEvent and
convertV2 to prevent unprotected map reads.

Signed-off-by: jieguo-coder <1193249232@qq.com>
@Alanxtl
Contributor

Alanxtl commented Jun 3, 2026

Our project uses imports-formatter to format import blocks, which is why your CI fails. You should:

  1. run go install github.com/dubbogo/tools/cmd/imports-formatter@latest
  2. cd to the root dir of dubbo-go
  3. run imports-formatter

@Alanxtl added the ✏️ Feature and 3.3.2 labels on Jun 3, 2026
Signed-off-by: jieguo-coder <1193249232@qq.com>
Signed-off-by: jieguo-coder <1193249232@qq.com>
@jieguo-coder
Author

Thanks for the guidance! @Alanxtl
I have formatted the import blocks using imports-formatter and pushed the updates. The CI should be happy now. 😊

@codecov-commenter

codecov-commenter commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 98.07692% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 52.46%. Comparing base (e6e14fd) to head (a7bad03).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| metadata/metadata_service.go | 85.71% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3367      +/-   ##
==========================================
+ Coverage   52.40%   52.46%   +0.06%     
==========================================
  Files         492      492              
  Lines       37785    37832      +47     
==========================================
+ Hits        19800    19849      +49     
+ Misses      16380    16379       -1     
+ Partials     1605     1604       -1     

Comment on lines 94 to 105
func (mts *DefaultMetadataService) GetMetadataInfo(revision string) (*info.MetadataInfo, error) {
	if revision == "" {
		return nil, nil
	}
	for _, metadataInfo := range mts.metadataMap {
		if metadataInfo.Revision == revision {
			return metadataInfo, nil
		}
	}
	logger.Warnf("metadata not found for revision: %s", revision)
	return nil, nil
}
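DefaultMetadataService still races on registryMetadataInfo
metadata_service.go:98, L110, L128

The PR adds registryMetadataLock for registryMetadataInfo, but metadataService still holds the same map and iterates it without a lock in GetMetadataInfo / GetExportedServiceURLs / GetSubscribedURLs. When registration or subscription writes to the map concurrently, metadata service queries still race, which is exactly the scenario issue #3353 is meant to fix.

I reproduced this with a temporary race test:
metadata.AddService() writes the map at metadata.go:50 while DefaultMetadataService.GetExportedServiceURLs() iterates the map at metadata_service.go:110, and the race detector reports WARNING: DATA RACE.

Suggested fix: do not let DefaultMetadataService iterate the bare map directly. Either add a lock-protected snapshot helper, or have DefaultMetadataService share the same lock and hold RLock for the duration of the iteration.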
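
The first option in the suggested fix above, a lock-protected snapshot helper, might look roughly like this. snapshotMetadataInfos is a hypothetical name, MetadataInfo is reduced to a single Revision field, and the lookup is written as a free function rather than a DefaultMetadataService method, all for the sake of a self-contained sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// MetadataInfo is a simplified stand-in with only the field the lookup needs.
type MetadataInfo struct {
	Revision string
}

var (
	registryMetadataLock sync.RWMutex
	registryMetadataInfo = map[string]*MetadataInfo{}
)

// snapshotMetadataInfos (hypothetical helper) copies the map values into
// a slice while holding the read lock, so callers never range over the
// live map.
func snapshotMetadataInfos() []*MetadataInfo {
	registryMetadataLock.RLock()
	defer registryMetadataLock.RUnlock()
	out := make([]*MetadataInfo, 0, len(registryMetadataInfo))
	for _, mi := range registryMetadataInfo {
		out = append(out, mi)
	}
	return out
}

// GetMetadataInfo iterates the snapshot instead of the shared map.
func GetMetadataInfo(revision string) *MetadataInfo {
	if revision == "" {
		return nil
	}
	for _, mi := range snapshotMetadataInfos() {
		if mi.Revision == revision {
			return mi
		}
	}
	return nil
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(2)
		go func(i int) { // writer: simulates registration
			defer wg.Done()
			registryMetadataLock.Lock()
			registryMetadataInfo[fmt.Sprintf("reg-%d", i)] = &MetadataInfo{Revision: fmt.Sprintf("rev-%d", i)}
			registryMetadataLock.Unlock()
		}(i)
		go func(i int) { // reader: simulates a metadata service query
			defer wg.Done()
			_ = GetMetadataInfo(fmt.Sprintf("rev-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(GetMetadataInfo("rev-7") != nil) // prints true
}
```

Running a program like this with the race detector enabled (go run -race) is one way to check that readers and writers no longer touch the map concurrently.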

Comment thread metadata/metadata.go
Comment on lines 60 to 71
func AddSubscribeURL(registryId string, url *common.URL) {
	registryMetadataLock.Lock()
	if _, exist := registryMetadataInfo[registryId]; !exist {
		registryMetadataInfo[registryId] = info.NewMetadataInfo(
			url.GetParam(constant.ApplicationKey, ""),
			url.GetParam(constant.ApplicationTagKey, ""),
		)
	}
	registryMetadataLock.Unlock()

	registryMetadataInfo[registryId].AddSubscribeURL(url)
}
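AddService / AddSubscribeURL read the global map without a lock after unlocking
metadata.go:55-L57, L68-L70

Both functions wrap only the get-or-create step in the write lock, then unlock and call registryMetadataInfo[registryId].AddService(url) / AddSubscribeURL(url). That is still an unlocked read of the global map; as soon as another goroutine is inserting an entry for a different registryId, a read/write race can occur.

Suggested fix: fetch metadataInfo := registryMetadataInfo[registryId] while holding the lock, then after unlocking operate only on that pointer and do not touch the global map again.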
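
A sketch of the suggested fix above, with assumptions: getOrCreate is a hypothetical helper name, URLs are plain strings, and MetadataInfo is a stand-in that only records subscribed URLs. The pointer is fetched while the global lock is held, and everything after the unlock goes through that pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// MetadataInfo is a simplified stand-in that guards its own state.
type MetadataInfo struct {
	mu   sync.RWMutex
	urls []string
}

// AddSubscribeURL appends under the per-struct write lock.
func (m *MetadataInfo) AddSubscribeURL(u string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.urls = append(m.urls, u)
}

// count reports how many URLs were recorded.
func (m *MetadataInfo) count() int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.urls)
}

var (
	registryMetadataLock sync.RWMutex
	registryMetadataInfo = map[string]*MetadataInfo{}
)

// getOrCreate (hypothetical helper) is the only place that touches the
// global map, and it does so entirely under the write lock.
func getOrCreate(registryId string) *MetadataInfo {
	registryMetadataLock.Lock()
	defer registryMetadataLock.Unlock()
	mi, ok := registryMetadataInfo[registryId]
	if !ok {
		mi = &MetadataInfo{}
		registryMetadataInfo[registryId] = mi
	}
	return mi
}

// AddSubscribeURL works on the returned pointer and never indexes the
// global map after the lock is released.
func AddSubscribeURL(registryId, url string) {
	getOrCreate(registryId).AddSubscribeURL(url)
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			AddSubscribeURL(fmt.Sprintf("reg-%d", i%4), fmt.Sprintf("url-%d", i))
		}(i)
	}
	wg.Wait()
	fmt.Println(getOrCreate("reg-0").count()) // prints 25
}
```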

Signed-off-by: jieguo-coder <1193249232@qq.com>
Signed-off-by: jieguo-coder <1193249232@qq.com>
@jieguo-coder jieguo-coder force-pushed the fix/issue-3353-metadata-concurrency branch from bc20d95 to e65f694 Compare June 4, 2026 09:13
Signed-off-by: jieguo-coder <1193249232@qq.com>
func (lstn *ServiceInstancesChangedListenerImpl) RemoveListener(serviceKey string) {
	lstn.mutex.Lock()
	delete(lstn.listeners, serviceKey)
	lstn.mutex.Unlock()
Why not use defer here?
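
For comparison, a defer-based version of the same shape is sketched below. listenerRegistry is a hypothetical stand-in for the listener type, with listeners reduced to a map of callbacks; whether the real method does further work after the unlock is not visible in the excerpt above.

```go
package main

import (
	"fmt"
	"sync"
)

// listenerRegistry is a hypothetical stand-in for the listener type.
type listenerRegistry struct {
	mutex     sync.Mutex
	listeners map[string]func()
}

// RemoveListener releases the lock via defer, so the unlock still runs
// if the body grows early returns or panics.
func (r *listenerRegistry) RemoveListener(serviceKey string) {
	r.mutex.Lock()
	defer r.mutex.Unlock()
	delete(r.listeners, serviceKey)
}

// size reads the map under the same lock.
func (r *listenerRegistry) size() int {
	r.mutex.Lock()
	defer r.mutex.Unlock()
	return len(r.listeners)
}

func main() {
	r := &listenerRegistry{listeners: map[string]func(){"svc-a": func() {}}}
	r.RemoveListener("svc-a")
	fmt.Println(r.size()) // prints 0
}
```

An explicit Unlock can still be a reasonable choice when the function continues with work that should not run while the lock is held; with defer the lock stays held until the function returns.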

Contributor

Copilot AI left a comment

Pull request overview

This PR adds synchronization to the application-level metadata path to prevent data races and concurrent map write panics introduced after the metadata refactor (#2534). It primarily protects shared global maps and metadata state that are accessed concurrently during registration/subscription flows and service discovery events.

Changes:

  • Add sync.RWMutex protection to MetadataInfo internals and introduce GetServices() for safe external iteration.
  • Protect global metadata registries (registryMetadataInfo, metadata report instances) with sync.RWMutex.
  • Add missing listener-map locking in service discovery’s instance-changed listener and update call sites to use GetServices().

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
| File | Description |
| --- | --- |
| registry/servicediscovery/service_instances_changed_listener_impl.go | Locks listener mutation paths and switches service iteration to MetadataInfo.GetServices() to avoid unsafe map iteration. |
| registry/servicediscovery/service_instances_changed_listener_impl_test.go | Adds test coverage for listener removal behavior. |
| metadata/report/nacos/report_test.go | Updates test expectations for MetadataInfo pointer usage. |
| metadata/report_instance.go | Adds RWMutex protection around global metadata report instances map. |
| metadata/metadata.go | Adds RWMutex protection around global registryMetadataInfo map and makes get-or-create atomic. |
| metadata/metadata_test.go | Adds a concurrent access smoke test for Add/Read operations on global metadata. |
| metadata/metadata_service.go | Adds read-locking while iterating metadata map and uses GetServices() for V2 conversion. |
| metadata/metadata_service_test.go | Adds concurrent read-access test coverage for DefaultMetadataService. |
| metadata/info/metadata_info.go | Adds per-MetadataInfo RWMutex, locks map accessors/mutators, and introduces GetServices() snapshot method. |
| metadata/info/metadata_info_test.go | Adds tests validating GetServices() returns a snapshot copy. |

Comment thread metadata/info/metadata_info.go Outdated
Comment on lines +185 to +195
// GetServices returns a copy of the Services map for safe iteration by external callers.
func (info *MetadataInfo) GetServices() map[string]*ServiceInfo {
	info.mu.RLock()
	defer info.mu.RUnlock()

	cp := make(map[string]*ServiceInfo, len(info.Services))
	for k, v := range info.Services {
		cp[k] = v
	}
	return cp
}
Comment on lines +52 to +54
instancesMu.Lock()
instances[registryId] = &DelegateMetadataReport{instance: fac.CreateMetadataReport(url)}
instancesMu.Unlock()
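
In the three lines above, the right-hand side of the assignment is evaluated while instancesMu is held, so report creation runs under the lock. One way to narrow the critical section is to build the report first and take the lock only for the map write. This is a sketch under assumptions: createReport is a hypothetical stand-in for fac.CreateMetadataReport(url), addMetadataReport is a hypothetical wrapper, and the PR's actual follow-up change may differ.

```go
package main

import (
	"fmt"
	"sync"
)

// MetadataReport is a stand-in for the real interface.
type MetadataReport interface{ Name() string }

// simpleReport is a hypothetical implementation used only in this sketch.
type simpleReport string

func (r simpleReport) Name() string { return string(r) }

// createReport stands in for the factory call, which in a real system
// may be slow (for example, opening a connection to a metadata center).
func createReport(registryId string) MetadataReport {
	return simpleReport("report-for-" + registryId)
}

var (
	instancesMu sync.RWMutex
	instances   = map[string]MetadataReport{}
)

// addMetadataReport builds the report with no lock held, then takes the
// write lock only for the map assignment.
func addMetadataReport(registryId string) {
	r := createReport(registryId)

	instancesMu.Lock()
	instances[registryId] = r
	instancesMu.Unlock()
}

func main() {
	addMetadataReport("nacos")

	instancesMu.RLock()
	fmt.Println(instances["nacos"].Name()) // prints report-for-nacos
	instancesMu.RUnlock()
}
```

The trade-off is that two concurrent calls for the same registryId each create a report and the later write wins; if creation has side effects, a second existence check under the lock would be needed.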
…ce and reduce lock granularity in report creation

Signed-off-by: jieguo-coder <1193249232@qq.com>
@sonarqubecloud

sonarqubecloud Bot commented Jun 7, 2026

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add concurrency safety for application-level metadata state / 为应用级 metadata 全局状态补充并发安全

4 participants