[FEATURE] Migrate and Enhance Adaptive Service Throttling in dubbo-go#3347
nagisa-kunhah wants to merge 9 commits into
* fix(config): remove legacy protocol timeout fallback
* fix(config): avoid default timeout allocation
* fix(config): preserve consumer timeout in reference config
* test(config): satisfy testifylint in reference timeout test
* fix(config): make protocol timeout default explicit
* fix(config): centralize consumer timeout default
* fix(config): keep consumer timeout default in global
* feat(test): add TestGetAddressWithProtocolPrefixKeepsContext and find the error when the user brings a context path
* fix(apollo): fix the test func (getAddressWithProtocolPrefix); fix(context_path): fix getAddressWithProtocolPrefix not handling the context path
* refactor(config_center): cleanup-redundant-test
* fix(test): remove files that should not exist
* feat(): restore tests and add multiple cases
* Potential fix for pull request finding

  Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix: fix the url.Path = / issue and add edge-case tests; fix pre-existing gofmt violations so CI passes

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…apache#3345)
* fix(logger): sync dubbo-go logger facade in LoggerConfig.Init()
* fix(logger): sync dubbo-go logger facade in config_loader init()
* fix(logger): sync dubbo-go logger facade in initGlobalLogger()
* test(logger): verify dubbo-go facade is synced after logger initialization
* style(logger): fix import formatting
)
* refactor(logger): standardize logger format in graceful_shutdown
  - Add [GracefulShutdown] prefix to all logger calls
  - Remove decoration symbols (---) from log messages
  - Change key format from "error: %v" / "--- %v" to "err=%v"
  - Lowercase first letter of all log message bodies
* refactor(logger): standardize logger format in internal, metadata, metrics, otel
  - Add module prefixes: [Internal], [Metadata], [MetadataRPC], [MetadataReport][Etcd/Nacos/Zookeeper], [Metrics], [Metrics][Probe/Prometheus/RPC], [OTel][Trace]
  - Unify key format: "error: %v" / ": %v" / "err: %s" → "err=%v", "url: %s" → "url=%s"
  - Lowercase first letter of all log message bodies
  - Fix bug: logger.Error with non-string or extra args → logger.Errorf (listener.go, server.go, exporter.go)
  - Fix bug: logger.Errorf with no format args → logger.Error (metadata_service.go)
  - Fix bug: logger.Infof/Debugf with no format args → logger.Info/Debug (config.go, report.go)
  - Fix: err.Error() + %s → err + %v (nacos/report.go x2)
* refactor(logger): standardize logger format in protocol directory
  - Unify prefixes as [Protocol], [Dubbo], [Dubbo][Codec/Hessian2/Impl/Exporter/Invoker], [Dubbo3], [GRPC], [GRPC][Client/Server/Exporter/Invoker], [Jsonrpc], [Jsonrpc][Server/Exporter/Invoker], [ProtocolWrapper], [Rest], [Rest][Config/Exporter/Server], [Triple], [Triple][Client/Server/Exporter/Invoker/CORS/Codec/Handler/Negotiation/Protocol/Health/OpenAPI]
  - Unify key format: "error: %v" / ": %v" / "error:{%v}" → "err=%v", "err: %v" → "err=%v"
  - Lowercase first letter of all log message bodies
  - Remove %+v format for non-Debug levels
  - Fix bug: logger.Error with error type → logger.Errorf (dubbo_codec.go, dubbo_protocol.go)
  - Fix bug: logger.Error/Info with extra args → logger.Errorf/Infof (rpc_status.go, jsonrpc/server.go)
  - Fix bug: logger.Infof/Debugf without format args → logger.Info/Debug (multiple files)
  - Fix bug: logger.Debug without format args → logger.Debugf (openapi/service.go)
  - Fix: err.Error() + %s → err + %v (dubbo3_protocol.go)
* refactor(logger): standardize logger prefixes in metadata, metadata-report, Dubbo, and Triple protocol modules
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #3347 +/- ##
===========================================
+ Coverage 46.76% 50.80% +4.03%
===========================================
Files 295 500 +205
Lines 17172 39102 +21930
===========================================
+ Hits 8031 19866 +11835
- Misses 8287 17634 +9347
- Partials 854 1602 +748
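@Alanxtl hello, could you also take a look? As mentioned in the PR description, when I tested the rtt_shrink case I found that as latency rose (from 20ms to 100ms, then to 500ms), the limiter's cap on inflight requests did not drop noticeably and instead stayed at its earlier level. I'm not sure whether this matches the original expectation. The test code is under presee_test/adaptive_service/rtt_shrink.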
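This looks more like it exposes an insensitivity/parameter problem in the current HillClimbing implementation, and it shouldn't be treated as fully expected behavior. The cause in the code is roughly here:
This may be a known limitation. In principle, adaptive concurrency should shrink when RTT clearly degrades and throughput stops improving; but the current algorithm is affected by the historical best metrics, hard-coded thresholds, the update interval, and the shrink magnitude, in …
Also, the Uber implementation is too complex; just use it as a reference, there is no need to implement it.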



Description
Fixes #3336
Progress:
Pressure tests
Test scenario
- samples/adaptive_service/protect_provider/{server,client}: verifies provider protection under high client concurrency by tracking rejects and server-side max active requests.
- samples/adaptive_service/rtt_shrink/{server,client}: verifies limiter behavior across fast/medium/slow RTT stages and records limitation/remaining/inflight changes.
- samples/adaptive_service/p2c_healthy/{server,client}: verifies multi-provider adaptive P2C routing by comparing per-provider hit ratio and remaining capacity.

Test results
✅ protect_provider:
- `failed=0`, reject rate was about 91%. The provider business handler only processed about 21,020 requests, and rejected requests did not enter the handler.

rtt_shrink:
- `fast:20ms:30s, medium:100ms:20s, slow:500ms:40s`, 90s duration.
- `failed=0` throughout the run. The limiter was found and reported continuously by the provider stats endpoint. During the fast phase, `limiter_limitation` grew from about 55 to a peak of about 123. However, after RTT increased to the 500ms slow phase, `limiter_limitation` stayed around 122 and did not drop meaningfully below the fast-phase peak. Rejections increased continuously under the high offered load, reaching about 888,000 total rejects by the end of the run.

✅ p2c_healthy:
- `fast=20ms`, `medium=100ms`, `slow=300ms`; 200 client concurrency; 90s duration; adaptive cluster + P2C load balancing enabled.
- `fast=44%`, `medium=54%`, and `slow=1.8%`, with `failed=0`. The slow provider's interval traffic dropped to 0 later in the test. The medium provider was repeatedly selected as the healthiest node because its reported `remaining` capacity was higher; it reached `limiter_limitation=500`, while the fast provider stayed around 200.

Checklist
develop