[Feature] Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) — O(1) tail-latency-aware policy

**Is your feature request related to a problem?**

brpc has several client-side load balancers (`rr`, `wrr`, `random`, `wr`, `la`, the consistent-hash family, `_dynpart`), but only one of them is latency-aware: `la` (locality-aware). `la` maintains a doubly-buffered **weight tree over all servers** with weights driven by inverse average latency, and samples a single point on the cumulative-weight line.

brpc currently has **no Power-of-Two-Choices (P2C) sampling load balancer** — neither Envoy-style least-request (P2C over active-request-biased weights) nor Finagle-style Peak-EWMA. P2C is the single most widely deployed tail-latency balancer in modern RPC stacks and service meshes (Finagle/Twitter `p2cPeakEwma`, linkerd, Envoy `LEAST_REQUEST`), and it is absent from brpc.

This matters in two regimes that `rr`/`random` ignore and where `la`'s averaging window lags:
- a **single slow/degraded backend** (GC pause, noisy neighbor, cold cache) — should be shed within one sample, not after an averaging window;
- a **heterogeneous fleet** (mixed CPU generations/capacities) — selection should continuously bias toward faster, less-loaded nodes.

**Describe the solution you'd like**

Add a new client-side load balancer, registered as **`p2c`** (alias `p2c_ewma`): **Power-of-Two-Choices with Peak-EWMA latency scoring**.

Algorithm (per request):
1. Sample **two distinct random backends** from the server list (`fast_rand_less_than`, as `rr`/`random` already do).
2. Route to the one with the **lower load score**:
   - `score = ewma_us * (inflight + 1) / max(weight, 1)`
   - `ewma_us` is a **peak-sensitive EWMA of round-trip latency** per backend: an upward latency spike sets the score to the spike value immediately; downward recovery decays with a configurable time constant `TAU` (Finagle default decay ≈ 10s). This is what sheds a stalled backend within one sample.
   - `inflight` is the backend's outstanding-request count (incremented at select, decremented at completion) — the Envoy `LEAST_REQUEST` `(active+1)` term.
   - dividing by the static naming-service `weight` makes it **weighted P2C** (degrades to unweighted when weights are equal).
3. On RPC completion, update the chosen backend's EWMA and decrement its in-flight; failures/timeouts inflate the latency (`max(measured, timeout)`) so a failing node is avoided — mirroring `la`'s error handling.

Properties: **O(1) selection** (two score evaluations regardless of fleet size, vs `la`'s O(log N) tree walk); spike-reactive via the peak term; herding-resistant via two-sample comparison.

**Interface feasibility — no brpc core change required.** Every hook P2C-EWMA needs already exists and is exercised by `la`:
- `src/brpc/load_balancer.h` — `SelectOut::need_feedback` (load_balancer.h:50), `virtual void Feedback(const CallInfo&)` (load_balancer.h:100), and `CallInfo { begin_time_us; server_id; error_code; controller; }` (load_balancer.h:53-61) giving per-call latency, which backend, success/failure, and timeout.
- `src/brpc/controller.cpp:873-876` — the RPC completion path actually invokes the hook:
  ```cpp
  if (need_feedback && c->_lb) {
      const LoadBalancer::CallInfo info =
          { begin_time_us, peer_id, error_code, c };
      c->_lb->Feedback(info);
  }
  ```
  and `controller.cpp:1120` propagates `sel_out.need_feedback` into the call.
- Membership lives in `DoublyBufferedData<Servers, TLS>` (lock-free reads, copy-on-write modify) exactly as `round_robin_load_balancer.cpp`. Mutable per-node load state (`inflight`, `ewma_us`, `stamp_us`) is heap-allocated once per backend and referenced by stable pointer — the same technique `locality_aware_load_balancer` already uses for its per-node atomics.
- brpc's `Socket` has no general app-level in-flight counter, so — like `la` — the LB owns its in-flight counters. Fully feasible.

**Files to add / modify**
- Add `src/brpc/policy/p2c_ewma_load_balancer.{h,cpp}`.
- Modify `src/brpc/global.cpp`: include the header, add a `P2CEwmaLoadBalancer` member to `GlobalExtensions`, and register it next to the existing balancers — `LoadBalancerExtension()->RegisterOrDie("p2c", &g_ext->p2c_ewma_lb);`.
- Add `test/brpc_p2c_ewma_load_balancer_unittest.cpp` (modeled on the `la`/`rr` LB unit tests), registered in all three build systems (`test/CMakeLists.txt`, `test/BUILD.bazel`, `test/Makefile`).
- Add a `p2c` row to `docs/cn/lb.md` / `docs/en/lb.md`.

**Performance / testing plan (multi-protocol, multi-config)**

A cross-protocol benchmark matrix to prove the gain is real on actual brpc-supported backends, not just a synthetic echo server.
- **Protocols:** baidu_std (protobuf RPC, controllable-sleep echo server — clean baseline), **Memcached** (`MemcacheRequest` — tiny ops, very high QPS, exposes selection overhead/herding), **Redis** (`RedisRequest` — single-threaded backend, high load sensitivity), **MySQL** (variable query cost — biggest tail surface), HTTP. Integration backends spawn-or-skip using the existing `brpc_redis_unittest.cpp` `system("which ...")` precedent, with one node acting as the **degraded** backend.
- **Configs:** fleet size N ∈ {4, 16, 64, 256}; homogeneous vs heterogeneous latency; degradation injected (none / 1 node +50ms / 10% of fleet +20ms / transient spike-then-recover); offered load (low / knee / saturating); equal vs 1:2:4 naming-service weights; pooled vs single connection.
- **Functional tests (must pass first):** add/remove/batch, single-server, `ExcludedServers` exclusion, feedback-driven score shift, in-flight inc/dec across select→feedback (incl. error/timeout path), weighted 1:2:4 split, and concurrency under TSan/ASan.
- **Metrics:** p50/p90/p99/p999/max latency; throughput at fixed SLO and saturating; per-node traffic share over time (divert latency after a spike); request-count variance (fairness/herding); selection CPU cost per op as N grows (P2C O(1) vs `la` O(log N)).
- **Headline result to validate:** tail-latency reduction and throughput-at-SLO uplift of `p2c` over `rr`/`random`/`la` on the heterogeneous and degraded configs, with selection cost no worse than `rr`.

**Describe alternatives you've considered**
- **Extend `la`** — different algorithm (global weight tree, averaging window); does not give O(1) sampling or peak spike-reactivity. Keeping both gives users a real choice.
- **Envoy-style least-request only** (`weight / (active+1)`) — a strict subset; P2C-EWMA folds the `(inflight+1)` active-request term *and* the peak-latency term into one score, so it covers least-request as the equal-latency special case.
- **WRR + ORCA load reports (gRPC gRFC A58)** — requires a server-reported load backchannel brpc does not have; out of scope for a client-only change.

**Additional context**

References (primary sources):
- Envoy — Supported load balancers (LEAST_REQUEST = P2C, `weight = lb_weight / (active+1)^bias`, "resistance to herding"): https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers.html
- Envoy — LEAST_REQUEST can pick the busier host (P2C rationale): https://github.com/envoyproxy/envoy/issues/14859
- Finagle — Aperture Load Balancers (`p2cPeakEwma`; Peak-EWMA = moving average over RTT highly sensitive to peaks, weighted by outstanding requests): https://twitter.github.io/finagle/guide/ApertureLoadBalancers.html
- Finagle — `Balancers` API (`p2cPeakEwma` factory): https://twitter.github.io/finagle/docs/com/twitter/finagle/loadbalancer/Balancers$.html
- linkerd — "Beyond Round Robin: Load Balancing for Latency" (Peak-EWMA; p99.9 wins over RR/least-loaded): https://linkerd.io/2016/03/16/beyond-round-robin-load-balancing-for-latency/
- gRPC — gRFC A58 Client-Side Weighted Round Robin (ORCA load reports): https://github.com/grpc/proposal/blob/master/A58-client-side-weighted-round-robin-lb-policy.md

I'm happy to implement this end-to-end (balancer + unit tests in all three build systems + docs + the benchmark matrix). Would the maintainers be open to a `p2c` policy, and is a GitHub issue the right place to scope it, or would you prefer a `[DISCUSS]` thread on `dev@brpc.apache.org` first?


cc @zyearn (load-balancer policy owner / author of `la`) — would value your read on whether a `p2c` policy fits brpc.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) — O(1) tail-latency-aware policy #3340

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Feature] Add Power-of-Two-Choices Peak-EWMA load balancer (p2c) — O(1) tail-latency-aware policy #3340

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions