Ubshm transport dev#9
Open
zchuango wants to merge 60 commits into
Open
Conversation
After PR apache#2949 changed Agent::combiner from a raw pointer to a weak_ptr, MultiDimensionTest.shared started to fail under ASAN with a heap-use-after-free in clear_all_agents() called from ~AgentCombiner. Root cause ---------- std::shared_ptr makes its weak_ptr expire the instant use_count drops to 0, while ~AgentCombiner only starts running after that. The two events therefore form a race window between two threads: * thread A: last shared_ptr<AgentCombiner> released use_count -> 0 -> weak_ptr expired ~AgentCombiner -> clear_all_agents() takes _lock and starts walking _agents * thread B: thread exits, _destroy_tls_blocks() deletes ThreadBlock, each ~Agent calls combiner.lock() which now returns null (weak_ptr already expired), so commit_and_erase() is skipped and the Agent is freed while still linked into A's _agents list * thread A: dereferences a freed LinkNode in clear_all_agents() -> heap-use-after-free Fix --- Don't traverse _agents from ~AgentCombiner. Letting _agents go out of scope is safe because: * butil::LinkedList and butil::LinkNode have trivial destructors and never traverse on destruction, so tearing down _agents does not dereference any agent node. * Surviving Agents simply observe combiner.expired() == true in ~Agent and skip commit_and_erase, leaving prev_/next_ dangling but never read. * If the freed agent_id is later reused by a new combiner and the same TLS slot is taken, get_or_create_tls_agent calls Agent::reset and Append's it into the new combiner's _agents. LinkNode::InsertBefore only writes prev_/next_ (it never reads their stale values), so the dangling pointers are safely overwritten. * Agent::element is destroyed together with its ThreadBlock, so any non-POD resource it holds is still released; if the agent slot is reused, Agent::reset rewrites the element value before it is observed again. The body of clear_all_agents() is kept (commented out) for reference, together with a NOTE explaining why it must not be re-enabled from ~AgentCombiner.
…che#3276) * fix: add acquire fence in bthread_join for ARM memory visibility Ensure memory visibility on ARM architecture after thread join. * use the fence unconditionally Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
On Linux, `pthread_attr_getstack(3)` returns the lowest address of the
stack, but brpc treats `StackStorage::bottom` as the highest address
(see `bthread/stack.h`). `TaskGroup::init` was registering the main
task's stack with the lowest address as `bottom`, so every per-jump
`BTHREAD_SCOPED_ASAN_FIBER_SWITCHER` told ASan a stack range
`stacksize` bytes below the real one. The first
`__asan_handle_no_return` (e.g. the C++ unwinder fired by
`throw bthread::ExitException` inside `bthread_exit`) then prints
WARNING: ASan is ignoring requested __asan_handle_no_return:
stack type: default top: ...; bottom ...; size: 0x62bdb000 (~1.5 GiB)
False positive error reports may follow
Fix `PthreadAttrGetStack` to translate the lowest address to the
highest address before returning, matching `StackStorage::bottom`.
ASan-only path; non-ASan builds are unaffected.
Otherwise there might segfault due to the race below:
```txt
Socket::OnInputEvent() |
`-- ProcessEvent (bthread) |
|
[ bthread queueed ] | QP error -> SetFailed -> HC -> WaitAndReset()
| Reset() -> _sbuf.clear()
| CheckHealth() -> Revive()
|
| Socket is now Addressable!
RdmaEndpoint:PollCq() |
Socket::Address() OK! |
RdmaEndpoint:HandleCompletion()
_sbuf[_sq_sent++].clear() <= BOOM! CQ is not drained but _sbuf is cleared.
```
Another possible fix is to add a _generation_ field in RdmaEndpoint, such that:
- each RdmaEndpoint::Reset() will advance the _generation_ by 1;
- the RdmaEndpoint::PollCq(m, orig_gen) will need to compare the _generation_.
But it will contaminate existing interface, and we need to drain CQ anyway.
Signed-off-by: David Lee <live4thee@gmail.com>
The previous go-like RWLock bumps the reader count first and rolls it back only on the full success path. The state is not reversible on a partial failure, so a read timeout while a writer holds the lock leaves a dangling reader credit and permanently blocks future writers (issue apache#3051). For the same reason the old implementation could not offer correct try_/timed_ APIs at all. Co-authored-by: hairet
…#3296) Socket::SSLHandshake polled the fd via unbounded bthread_fd_wait, so a peer that completed the TCP handshake but never sent a TLS Hello (e.g. server not actually configured for SSL) parked the handshake bthread forever. That bthread holds a Socket reference through the in-flight WriteRequest, so the underlying fd was never closed and the connection stayed ESTABLISHED until OS keepalive eventually broke it. Channel destruction does not help: ~Channel only removes the SocketMap ref; SetFailed/OnFailed wake _epollout_butex but not the per-fd butex used by bthread_fd_wait, and the fd close path runs only after the Socket's ref count drops to zero, which the stuck bthread prevents. With a periodic retry, ESTABLISHED sockets accumulate without bound. Switch the two bthread_fd_wait calls in SSLHandshake to bthread_fd_timedwait, with the deadline derived from a new gflag ssl_handshake_timeout_ms (default 5000ms; <=0 keeps the old behavior). On timeout the existing failure path runs: CheckConnected returns -1 -> AfterAppConnected calls SetFailed + ReleaseAllFailedWriteRequests, and the fd_guard in CheckConnectedAndKeepWrite closes the fd at scope exit. The server-side handshake invoked from DoRead benefits from the same bound. Co-authored-by: Ye Rixu <yerixu@expontech.com>
|
你好,请问你是HW的吗?是否可以认识下 |
…e#3308) * fix(http): apply auth check to builtin requests on public port Only skip auth for builtin HTTP requests received from internal_port. Builtin requests coming from the server listening port now follow the same auth flow as other public HTTP requests. Also add regression coverage for builtin HTTP auth behavior on both the public listener and internal_port. * Fix http_rpc_protocol unittest failed * Fix builtin_auth_policy_on_public_and_internal_port test case not stable * Fix builtin_auth_policy_on_public_and_internal_port asan run fail
…crash (apache#3297) libunwind ships its `src/unwind/*.c` (the GCC `_Unwind_*` ABI compatibility layer) as exported symbols of `libexternal_S~libunwind.so`. At runtime the dynamic loader resolves `_Unwind_*` lookups (from `pthread_exit`, libstdc++'s `__gxx_personality_v0`, etc.) to libunwind's DWARF-based implementation instead of `libgcc_s.so.1`, hitting an uninitialized internal context and crashing on the no-return cleanup chain triggered by pthread_exit / C++ exception unwinding -- e.g. it makes BthreadTest.bthread_exit segfault deterministically when `--define=with_bthread_tracer=true` is on. This is purely an ELF runtime symbol-resolution-order issue and reproduces identically on GCC and Clang, since both default to `libstdc++ + libgcc_s` on Linux.
Opt rdma flags
…che#3314) - Fix get_fuzz_socket() to only set initialized=true when both Socket::Create and Socket::Address succeed, allowing retry on failure - Add NULL socket checks in all 9 fuzz tests that use get_fuzz_socket() to prevent null pointer dereference when socket creation fails Fixes apache#3114
update butil::default_block_size when rdma_recv_block_type is set
…che#3315) AdaptiveMaxConcurrency has two class-static std::string members (UNLIMITED and CONSTANT) defined in adaptive_max_concurrency.cpp. Global AdaptiveMaxConcurrency objects in other translation units may be constructed before these static strings are initialized, causing undefined behavior. This issue was discovered during RISC-V porting and testing of BRPC. Different toolchains and linkers (GCC, Clang, cross-compilation toolchains for RISC-V, etc.) may produce different static initialization orders, making this bug manifest on some platforms but not others. Fix by replacing class-static std::string members with Meyers' Singleton pattern (function-local statics), which C++11 guarantees are initialized on first use in a thread-safe manner. This fix benefits all architectures including x86_64, ARM64, and RISC-V. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn>
Co-authored-by: 郭业昌 <lvpengfei@MacBook-Air.local>
merge from release-1.17
…e#3312) * optimize crc32c for riscv64 with Zbc carry-less multiplication Implement hardware-accelerated CRC32C for RISC-V using the Zbc (carry-less multiplication) extension. The implementation uses 128-bit folding with 4-way parallelism and Barrett reduction, following the approach from Hadoop PR #8371. Key changes: - Add rv_clmul/rv_clmulh inline assembly wrappers - Implement 128-bit fold with 4-way parallel processing (64 bytes/iter) - Add Barrett reduction for final 128-bit to 32-bit conversion - Runtime CPU feature detection via /proc/cpuinfo - Compile-time guard: #ifdef __riscv_zbc - CMake option: WITH_RISCV_ZBC (default OFF) Performance: 3-4x speedup over table-based 8-byte unrolled baseline, ~1.1 GB/s throughput on 1MB data. Correctness: Verified against RFC 3720 B.4 and bitwise reference. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn> * fix: wrap isSSE42() in #ifdef __SSE4_2__ to fix -Wunused-function isSSE42() is only called from within #ifdef __SSE4_2__ blocks, but the function definition was unconditional, causing -Wunused-function errors on non-x86 builds with -Werror. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn> * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * Fix CRC32C Zbc: add missing ^ 0xFFFFFFFF pre/post processing rv_crc32c_clmul was missing the standard CRC32C pre/post XOR conversion. ExtendImpl does crc ^ 0xFFFFFFFF at entry and result ^ 0xFFFFFFFF at exit, but rv_crc32c_clmul did neither, causing wrong CRC values on RISC-V with Zbc extension. Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn> --------- Signed-off-by: Felix-Gong <gongxiaofei24@iscas.ac.cn> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…he#3323) * ci(linux): bring up redis-server and mysql-server for unittests The clang-unittest and clang-unittest-asan jobs run the full unit test suite via test/run_tests.sh, which includes backend integration tests (e.g. brpc_redis_unittest) that fork a real server when its binary is present and otherwise silently short-circuit to a passing result. Since CI never installed those servers, the redis backend tests reported PASSED while doing nothing (7 of 14 RedisTest cases skip-as-pass). Install redis-server and mysql-server before running the tests in both unittest jobs so these backend tests execute against a live server. The binaries are added only in the unittest jobs, not in the shared install-essential-dependencies action used by compile-only jobs. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(linux): skip flaky redis client tests under ASan brpc_redis_unittest forks a real redis-server and waits a fixed 50ms before connecting; under ASan redis starts too slowly, causing flaky connection-refused. Skip RedisTest.* in the ASan job only (still covered by clang-unittest). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(linux): install redis-server and mysql-server in install-essential-dependencies Move the redis-server/mysql-server install out of the individual clang-unittest and clang-unittest-asan jobs and into the shared install-essential-dependencies composite action, so every job that installs dependencies has the servers available (and the unittest jobs no longer carry a bespoke install step). Under ASan the redis integration tests (sanity, keys_with_spaces, incr_and_decr, by_components, auth) fork a real redis-server and connect after a fixed 50ms wait; redis starts too slowly there and they flake with connection refused. Filter just those out under ASan -- the redis codec/server tests still run, and the full suite runs in clang-unittest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * ci(linux): run forked-server integration tests under bazel The bazel unittest jobs ran //test/... without redis-server/mysql-server installed, so brpc_redis_unittest forked nothing and its integration cases (sanity/keys_with_spaces/incr_and_decr/by_components/auth) early-returned as passes -- green but vacuous. - Install redis-server/mysql-server in the (non-ASan) bazel test jobs via the shared install-essential-dependencies action (the same one the make jobs use), so the servers are on PATH for the forked tests. - Tag brpc_redis_unittest "external" + "local" so bazel always re-runs it (never serves a cached skip) and runs it outside the sandbox where the PATH-located redis-server is visible and loopback works. Threaded through generate_unittests via a new per_test_tags arg. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: rajvarun77 <287367605+rajvarun77@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Fix HTTPS pooled client crash on unexpected SSL EOF When OpenSSL 3.x detects unexpected EOF (peer closed without close_notify), SSL_read returns 0 with SSL_ERROR_SSL. The code didn't return -1, causing error_code=0 to propagate to Controller::SetFailed() which triggers CHECK(false). Fix by returning -1 with errno=ESSL when SSL errors are detected in DoRead(), instead of falling through and returning nr. Discovered during RISC-V porting and integration testing. Fixes apache#3307 Signed-off-by: Felix Gong <gongxiaofei24@iscas.ac.cn> * Add unit test for unexpected SSL EOF handling Test that Socket::DoRead() returns -1 with errno=ESSL when the remote side closes the TCP connection without sending close_notify, verifying the fix in commit 7a1dfe7. --------- Signed-off-by: Felix Gong <gongxiaofei24@iscas.ac.cn>
…TY.md → THREAT_MODEL.md) (apache#3324) * Add threat model + security-model discoverability (AGENTS.md → SECURITY.md → THREAT_MODEL.md) Generated-by: Claude Code * THREAT_MODEL.md: renumber out-of-scope threats list to start at 1 Per review (chenBright) — the "Threats explicitly out of scope" list was continuing the previous list's numbering (8-15); restart at 1-8. Generated-by: Claude Code
Background
==========
The legacy v2 handshake ("RDMA" magic + 36B fixed binary HelloMessage)
had correctness bugs that made the wire format effectively
unevolvable.
v2 bugs
=======
A. Client never drained the "unknown tail" bytes when a peer sent
msg_len > HELLO_MSG_LEN_MIN(40). Leftover bytes stayed in the
socket recv buffer and silently corrupted the next ReadFromFd
(the ACK).
B. Server had the symmetric version of A.
C. Server computed the body read length from its LOCAL
g_rdma_hello_msg_len, implicitly assuming the peer's hello is the
same length as its own. A longer peer left bytes behind; a shorter
peer made the read block. Client used the correct compile-time
constant; the two sides were not symmetric.
Combined, A/B/C meant v2 could not safely append a single byte to the
hello -- even an "optional hint" appended by a newer sender would
mis-align an older receiver's next read.
v3 design
=========
Wire format, magic-namespace-isolated from v2:
[ "RDM3" 4B ][ pb_size 4B big-endian ][ RdmaHello protobuf bytes ]
with pb_size in (0, 4096]. RdmaHello carries the same 6 base fields
as v2 plus room to append future capabilities.
Why protobuf (and not "v2 plus length prefix")
----------------------------------------------
- Variable-length fields are coming. Future capability fields will
include strings (rdma_device_name, netdev_name, ...) and other
variable-length data. Supporting them on a fixed-binary protocol
forces us to invent and maintain a TLV layer (per-field type +
length + value framing, plus version-aware deserialization). That
is reimplementing protobuf badly. Using protobuf from day one
costs nothing and is the canonical answer.
- Fixes v2 bug A/B/C generically: pb_size makes the wire
self-describing, so the receiver never needs to guess the length
or know the peer's schema version to read the body cleanly.
- Append-only field evolution out of the box: new optional fields
cost old receivers nothing -- they're skipped as unknown protobuf
fields. v2 with a hand-rolled length prefix would still need
per-field opt-in code on every side.
- Built-in validation: ParseFromArray fails fast on malformed input;
required-field presence is enforced at the parse layer, not by
ad-hoc has_xxx() checks scattered through wire code.
- bRPC already depends on protobuf -- no new build dependency.
Why a NEW MAGIC rather than a version field inside protobuf
-----------------------------------------------------------
- Forces "breaking change" to be a deployment decision visible at
the wire level. You cannot accidentally ship a backwards-
incompatible patch via a field-semantics tweak.
- Server-side dispatch routes by magic to fully independent state
machines that can't entangle (no `if (version == X)` branches
anywhere -- this is the abstraction's red line).
- Any future breaking change bumps the magic ("RDM4", "RDM5", ...).
v3 fields, once shipped, never change semantics.
Rollout
=======
Server-side ALWAYS accepts both v2 and v3 (no gflag, no kill-switch);
magic routes to fully independent code paths. A single rolling upgrade
enables v3 fleet-wide.
Client-side picks the wire protocol via gflag with a safe default:
FLAGS_rdma_client_handshake_version (default 2)
2 = "RDMA" legacy (zero-regression default)
3 = "RDM3" protobuf (opt-in once target servers support v3)
Sub-second rollback is one flag flip away. v3 client to v2-only legacy
server is NOT guaranteed to transparently fall back on the same
connection -- the supported migration is "upgrade servers first, then
opt-in clients".
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What problem does this PR solve?
Issue Number: resolve
Problem Summary:
What is changed and the side effects?
Changed:
Side effects:
Performance effects:
Breaking backward compatibility:
Check List: