Is your feature request related to a problem?
brpc has several client-side load balancers (rr, wrr, random, wr, la, the consistent-hash family, _dynpart), but only one of them is latency-aware: la (locality-aware). la maintains a doubly-buffered weight tree over all servers with weights driven by inverse average latency, and samples a single point on the cumulative-weight line.
brpc currently has no Power-of-Two-Choices (P2C) sampling load balancer: neither Envoy-style least-request (P2C over active-request-biased weights) nor Finagle-style Peak-EWMA. P2C is one of the most widely deployed tail-latency balancers in modern RPC stacks and service meshes (Finagle/Twitter p2cPeakEwma, linkerd, Envoy LEAST_REQUEST), and it is absent from brpc.
This matters in two regimes that rr/random ignore and where la's averaging window lags:
- a single slow/degraded backend (GC pause, noisy neighbor, cold cache) — should be shed within one sample, not after an averaging window;
- a heterogeneous fleet (mixed CPU generations/capacities) — selection should continuously bias toward faster, less-loaded nodes.
Describe the solution you'd like
Add a new client-side load balancer, registered as p2c (alias p2c_ewma): Power-of-Two-Choices with Peak-EWMA latency scoring.
Algorithm (per request):
- Sample two distinct random backends from the server list (fast_rand_less_than, as rr/random already do).
- Route to the one with the lower load score:
  score = ewma_us * (inflight + 1) / max(weight, 1)
  - ewma_us is a peak-sensitive EWMA of round-trip latency per backend: an upward latency spike sets ewma_us to the spike value immediately; downward recovery decays with a configurable time constant TAU (Finagle default decay ≈ 10s). This is what sheds a stalled backend within one sample.
  - inflight is the backend's outstanding-request count (incremented at select, decremented at completion); this is the Envoy LEAST_REQUEST (active+1) term.
  - Dividing by the static naming-service weight makes it weighted P2C (degrades to unweighted when weights are equal).
- On RPC completion, update the chosen backend's EWMA and decrement its in-flight count; failures/timeouts inflate the latency (max(measured, timeout)) so a failing node is avoided, mirroring la's error handling.
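The scoring rules above can be sketched in a few lines. This is a minimal, single-threaded illustration and not brpc code: the names NodeLoad, EffectiveRtt, ObserveRtt, Score and tau_us are hypothetical, and the exponential form of the downward decay is an assumption (the proposal only says that recovery decays with time constant TAU).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical per-backend load state; field names follow the proposal text.
struct NodeLoad {
    double  ewma_us  = 0.0;  // peak-sensitive EWMA of round-trip latency
    int64_t stamp_us = 0;    // time of the last EWMA update
    int64_t inflight = 0;    // outstanding requests
    int64_t weight   = 1;    // static naming-service weight
};

// Failures and timeouts are charged at least the timeout: max(measured, timeout).
inline double EffectiveRtt(double measured_us, double timeout_us, bool failed) {
    return failed ? std::max(measured_us, timeout_us) : measured_us;
}

// Fold one observed round trip into the EWMA.
// tau_us is the decay time constant (assumed parameter name).
inline void ObserveRtt(NodeLoad& n, double rtt_us, int64_t now_us, double tau_us) {
    assert(tau_us > 0.0);
    const double elapsed_us =
        static_cast<double>(std::max<int64_t>(now_us - n.stamp_us, 0));
    n.stamp_us = now_us;
    if (rtt_us > n.ewma_us) {
        // Peak rule: an upward spike replaces the average immediately.
        n.ewma_us = rtt_us;
    } else {
        // Recovery: the old value keeps weight exp(-elapsed / tau).
        const double keep = std::exp(-elapsed_us / tau_us);
        n.ewma_us = n.ewma_us * keep + rtt_us * (1.0 - keep);
    }
}

// score = ewma_us * (inflight + 1) / max(weight, 1); lower is better.
inline double Score(const NodeLoad& n) {
    const double w = static_cast<double>(std::max<int64_t>(n.weight, 1));
    return n.ewma_us * static_cast<double>(n.inflight + 1) / w;
}
```

With these definitions a single slow reply raises the backend's score at once, while a fast reply after a spike only pulls it down in proportion to the time elapsed relative to tau_us.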
Properties: O(1) selection (two score evaluations regardless of fleet size, vs la's O(log N) tree walk); spike-reactive via the peak term; herding-resistant via two-sample comparison.
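To make the O(1) claim concrete, here is a hedged sketch of the two-sample selection step. PickTwoDistinct and SelectP2C are hypothetical names, scores are passed in as a plain vector for simplicity, and rand_less_than stands in for any callable that returns a uniform value in [0, k); in brpc that role would presumably be played by fast_rand_less_than.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Draw two distinct indices in [0, n) with exactly two RNG calls and no retry loop.
// rand_less_than(k) must return a uniform value in [0, k).
template <typename RandLessThan>
std::pair<size_t, size_t> PickTwoDistinct(size_t n, RandLessThan&& rand_less_than) {
    assert(n >= 2);
    const size_t a = rand_less_than(n);
    size_t b = rand_less_than(n - 1);
    if (b >= a) {
        ++b;  // skip over a, so b is uniform over the other n - 1 slots
    }
    return std::make_pair(a, b);
}

// Return the index with the lower score among two random candidates.
// Cost is two score reads regardless of scores.size().
template <typename RandLessThan>
size_t SelectP2C(const std::vector<double>& scores, RandLessThan&& rand_less_than) {
    assert(!scores.empty());
    if (scores.size() == 1) {
        return 0;  // nothing to compare against
    }
    const std::pair<size_t, size_t> p = PickTwoDistinct(scores.size(), rand_less_than);
    return scores[p.second] < scores[p.first] ? p.second : p.first;
}
```

One consequence of sampling two distinct candidates is that the backend with the strictly highest score never wins a comparison, which is what keeps a single degraded node out of rotation without scanning the whole fleet.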
Interface feasibility — no brpc core change required. Every hook P2C-EWMA needs already exists and is exercised by la:
- src/brpc/load_balancer.h: SelectOut::need_feedback (load_balancer.h:50), virtual void Feedback(const CallInfo&) (load_balancer.h:100), and CallInfo { begin_time_us; server_id; error_code; controller; } (load_balancer.h:53-61), giving per-call latency, which backend, success/failure, and timeout.
- src/brpc/controller.cpp:873-876: the RPC completion path actually invokes the hook:

      if (need_feedback && c->_lb) {
          const LoadBalancer::CallInfo info =
              { begin_time_us, peer_id, error_code, c };
          c->_lb->Feedback(info);
      }

  and controller.cpp:1120 propagates sel_out.need_feedback into the call.
- Membership lives in DoublyBufferedData<Servers, TLS> (lock-free reads, copy-on-write modify) exactly as in round_robin_load_balancer.cpp. Mutable per-node load state (inflight, ewma_us, stamp_us) is heap-allocated once per backend and referenced by stable pointer, the same technique locality_aware_load_balancer already uses for its per-node atomics.
- brpc's Socket has no general app-level in-flight counter, so, like la, the LB owns its in-flight counters. Fully feasible.
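The stable-pointer idea in the last two bullets can be illustrated without any brpc headers. The sketch below is an assumption-laden stand-in: NodeState, ServerEntry, OnSelect and OnFeedback are hypothetical names, std::shared_ptr is used only for brevity (la's actual ownership scheme may differ), and only the peak rule of the EWMA is shown, with decay left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical per-backend state, allocated once and never moved.
struct NodeState {
    std::atomic<int64_t> inflight{0};
    std::atomic<int64_t> ewma_us{0};
};

// Hypothetical membership entry. Copying it copies only the pointer, so the
// foreground and background server lists keep sharing one NodeState.
struct ServerEntry {
    uint64_t server_id;
    std::shared_ptr<NodeState> state;
};

// Called when the balancer hands out this backend.
inline void OnSelect(NodeState& s) {
    s.inflight.fetch_add(1, std::memory_order_relaxed);
}

// Called from the feedback hook. Only the peak rule is shown; decay is omitted.
inline void OnFeedback(NodeState& s, int64_t rtt_us) {
    const int64_t prev = s.inflight.fetch_sub(1, std::memory_order_relaxed);
    assert(prev > 0);  // every feedback must pair with an earlier select
    (void)prev;
    int64_t cur = s.ewma_us.load(std::memory_order_relaxed);
    // Raise ewma_us to rtt_us unless another thread already stored a higher value.
    while (rtt_us > cur &&
           !s.ewma_us.compare_exchange_weak(cur, rtt_us, std::memory_order_relaxed)) {
    }
}
```

Because the counters live behind the pointer rather than inside the list element, a membership change that copies the server list does not reset or fork any backend's in-flight count.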
Files to add / modify
- Add src/brpc/policy/p2c_ewma_load_balancer.{h,cpp}.
- Modify src/brpc/global.cpp: include the header, add a P2CEwmaLoadBalancer member to GlobalExtensions, and register it next to the existing balancers: LoadBalancerExtension()->RegisterOrDie("p2c", &g_ext->p2c_ewma_lb);.
- Add test/brpc_p2c_ewma_load_balancer_unittest.cpp (modeled on the la/rr LB unit tests), registered in all three build systems (test/CMakeLists.txt, test/BUILD.bazel, test/Makefile).
- Add a p2c row to docs/cn/lb.md / docs/en/lb.md.
Performance / testing plan (multi-protocol, multi-config)
A cross-protocol benchmark matrix to prove the gain is real on actual brpc-supported backends, not just a synthetic echo server.
- Protocols: baidu_std (protobuf RPC, controllable-sleep echo server; clean baseline), Memcached (MemcacheRequest; tiny ops, very high QPS, exposes selection overhead/herding), Redis (RedisRequest; single-threaded backend, high load sensitivity), MySQL (variable query cost; biggest tail surface), HTTP. Integration backends spawn-or-skip using the existing brpc_redis_unittest.cpp system("which ...") precedent, with one node acting as the degraded backend.
- Configs: fleet size N ∈ {4, 16, 64, 256}; homogeneous vs heterogeneous latency; degradation injected (none / 1 node +50ms / 10% of fleet +20ms / transient spike-then-recover); offered load (low / knee / saturating); equal vs 1:2:4 naming-service weights; pooled vs single connection.
- Functional tests (must pass first): add/remove/batch, single-server, ExcludedServers exclusion, feedback-driven score shift, in-flight inc/dec across select→feedback (incl. error/timeout path), weighted 1:2:4 split, and concurrency under TSan/ASan.
- Metrics: p50/p90/p99/p999/max latency; throughput at fixed SLO and saturating; per-node traffic share over time (divert latency after a spike); request-count variance (fairness/herding); selection CPU cost per op as N grows (P2C O(1) vs la O(log N)).
- Headline result to validate: tail-latency reduction and throughput-at-SLO uplift of p2c over rr/random/la on the heterogeneous and degraded configs, with selection cost no worse than rr.
Describe alternatives you've considered
- Extend la: different algorithm (global weight tree, averaging window); does not give O(1) sampling or peak spike-reactivity. Keeping both gives users a real choice.
- Envoy-style least-request only (weight / (active+1)): a strict subset; P2C-EWMA folds the (inflight+1) active-request term and the peak-latency term into one score, so it covers least-request as the equal-latency special case.
- WRR + ORCA load reports (gRPC gRFC A58): requires a server-reported load backchannel brpc does not have; out of scope for a client-only change.
Additional context
References (primary sources):
- Envoy load balancers (weight = lb_weight / (active+1)^bias, "resistance to herding"): https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers.html
- Finagle Aperture load balancers (p2cPeakEwma; Peak-EWMA = moving average over RTT highly sensitive to peaks, weighted by outstanding requests): https://twitter.github.io/finagle/guide/ApertureLoadBalancers.html
- Finagle Balancers API (p2cPeakEwma factory): https://twitter.github.io/finagle/docs/com/twitter/finagle/loadbalancer/Balancers$.html
I'm happy to implement this end-to-end (balancer + unit tests in all three build systems + docs + the benchmark matrix). Would the maintainers be open to a p2c policy, and is a GitHub issue the right place to scope it, or would you prefer a [DISCUSS] thread on dev@brpc.apache.org first?
cc @zyearn (load-balancer policy owner / author of la): I would value your read on whether a p2c policy fits brpc.