Commit 72b928f
feat(message_bus): async fire-and-forget transport for VSR consensus
The previous cache/connection.rs was a single-stream blob that
held a RefCell<TcpStream> across .await, awaited kernel write
completion in the send path, and serialized fan-out through one
shared lock. Under VSR pipelining a slow peer stalled sends to
every other peer in the same dispatch round, killing parallel
quorum collection. Reentrant sends to the same peer would also
panic on BorrowMutError.
Rebuilt the crate around a per-connection writer task that
drains a bounded mpsc and submits batches via a single
write_vectored_all syscall, with an independent read half
handled by a dedicated reader task. send_to_* is now a synchronous
try_send inside an async fn shell: zero awaits, and it returns
SendError::Backpressure when the queue is full so VSR can recover
via WAL retransmission or view-change timeouts.
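A minimal, std-only sketch of the send/writer split described above. It assumes std::sync::mpsc::sync_channel as a stand-in for the bounded mpsc, a plain Vec<u8> as a message, and a hypothetical MAX_BATCH; send_to_replica, drain_batch, and write_batch are illustrative names, not the crate's API. The send side never waits on the socket, and the writer drains whatever is queued into one vectored write.

```rust
use std::io::{self, IoSlice, Write};
use std::sync::mpsc::{Receiver, SyncSender, TrySendError};

/// Illustrative error type: the caller never waits, it learns the queue is full.
#[derive(Debug, PartialEq)]
pub enum SendError {
    Backpressure,
    Disconnected,
}

/// Hypothetical per-wakeup batch cap for the writer.
pub const MAX_BATCH: usize = 64;

/// Fire-and-forget enqueue: one non-blocking try_send, no await on the socket.
pub fn send_to_replica(tx: &SyncSender<Vec<u8>>, msg: Vec<u8>) -> Result<(), SendError> {
    tx.try_send(msg).map_err(|e| match e {
        TrySendError::Full(_) => SendError::Backpressure,
        TrySendError::Disconnected(_) => SendError::Disconnected,
    })
}

/// Writer side: block for the first message, then drain whatever is already
/// queued, up to MAX_BATCH. Returns None once every Sender has been dropped.
pub fn drain_batch(rx: &Receiver<Vec<u8>>) -> Option<Vec<Vec<u8>>> {
    let first = rx.recv().ok()?;
    let mut batch = vec![first];
    while batch.len() < MAX_BATCH {
        match rx.try_recv() {
            Ok(msg) => batch.push(msg),
            Err(_) => break,
        }
    }
    Some(batch)
}

/// Submit the whole batch as one vectored write. std's write_vectored may be
/// partial on a real socket; the commit's write_vectored_all loops until done.
pub fn write_batch<W: Write>(w: &mut W, batch: &[Vec<u8>]) -> io::Result<usize> {
    let slices: Vec<IoSlice<'_>> = batch.iter().map(|m| IoSlice::new(m)).collect();
    w.write_vectored(&slices)
}
```

Because the queue is bounded, a slow peer only fills its own channel; other peers' sends are unaffected and the caller sees Backpressure immediately instead of stalling.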
The lifecycle module owns a root Shutdown / ShutdownToken plus
a ConnectionRegistry<K> that tracks the per-peer Sender and
both task handles for graceful drain. The directional rule
(lower id dials, higher id accepts) eliminates the dialed-
both-ways race without a tiebreaker. Message::into_frozen()
removes the per-send memcpy.
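The directional rule can be sketched in a few lines. The function names below are hypothetical; the point is that for every unordered pair exactly one side initiates, so the cluster performs n * (n - 1) / 2 dials and no pair ever holds two competing connections.

```rust
/// Directional rule: the replica with the lower id dials, the higher id accepts.
pub fn should_dial(self_id: u8, peer_id: u8) -> bool {
    self_id < peer_id
}

/// Peers this replica initiates connections to in a cluster of `replica_count`.
pub fn peers_to_dial(self_id: u8, replica_count: u8) -> Vec<u8> {
    (0..replica_count).filter(|&peer| should_dial(self_id, peer)).collect()
}

/// Total outbound dials across the cluster. Each unordered pair {a, b} is
/// dialed exactly once, by min(a, b), so this equals n * (n - 1) / 2.
pub fn total_dials(replica_count: u8) -> usize {
    (0..replica_count)
        .map(|id| peers_to_dial(id, replica_count).len())
        .sum()
}
```

Since a duplicate entry for the same peer can only arise from a bug under this rule, it makes sense for the registry insert to assert rather than resolve a race.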
Socket split and lifecycle plumbing (C1, C2, Cx2)
Swapped the dup(2)-based socket_split module for compio's
native TcpStream::into_split(), deleting ~50 lines of unsafe,
three dup_stream copies, and an entire module file. Reader and
writer tasks now take OwnedReadHalf / OwnedWriteHalf directly.
The writer task, on write_vectored_all error, removes its own
registry entry so a stale Sender cannot accept further sends;
on the graceful-shutdown path it defers removal to the root drain
so DrainOutcome.clean counts both halves. The reader's self-removal
path goes through a new ConnectionRegistry::close_peer helper
that orders "close sender, await writer, drop reader handle"
to prevent a mid-writev cancellation from leaving a truncated frame.
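The close_peer ordering can be illustrated with std threads standing in for compio tasks; Entry, spawn_writer, and close_peer here are hypothetical simplifications. Threads cannot be cancelled, so this sketch shows the ordering rather than reproducing the cancellation hazard: dropping the Sender closes the queue, the writer finishes every frame already enqueued and exits, and only then is the reader handle released.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical registry entry: the peer's Sender plus both task handles.
pub struct Entry {
    pub tx: SyncSender<Vec<u8>>,
    pub writer: JoinHandle<usize>,
    pub reader: JoinHandle<()>,
}

/// Writer that moves every queued frame into `sink` until the channel closes,
/// returning how many frames it wrote.
pub fn spawn_writer(rx: Receiver<Vec<u8>>, sink: Arc<Mutex<Vec<Vec<u8>>>>) -> JoinHandle<usize> {
    thread::spawn(move || {
        let mut written = 0;
        // recv() errors only once all Senders are gone AND the queue is empty,
        // so frames enqueued before the close are still written whole.
        while let Ok(frame) = rx.recv() {
            sink.lock().unwrap().push(frame);
            written += 1;
        }
        written
    })
}

/// Ordering: (1) close the sender, (2) wait for the writer to exit on its own,
/// (3) release the reader handle. The writer is never torn down mid-frame.
pub fn close_peer(entry: Entry) -> usize {
    let Entry { tx, writer, reader } = entry;
    drop(tx);
    let written = writer.join().expect("writer panicked");
    // With std threads this only detaches; in an async runtime dropping a
    // task handle may cancel it, which is why it comes last.
    drop(reader);
    written
}
```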
All listeners + the outbound connector apply TCP SO_KEEPALIVE
via socket2 (TCP_KEEPIDLE=10s, TCP_KEEPINTVL=5s, TCP_KEEPCNT=3)
so a silently dead peer is detected within ~25s, well inside
the VSR view-change window.
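The ~25 s figure follows from the keepalive parameters: the first probe goes out after TCP_KEEPIDLE of silence, and the connection is declared dead after TCP_KEEPCNT unanswered probes spaced TCP_KEEPINTVL apart. A small std-only helper (hypothetical name) makes the arithmetic explicit; the commit itself configures these values through socket2.

```rust
use std::time::Duration;

/// Worst-case time for keepalive to declare a silently dead, idle peer gone:
/// idle time before the first probe plus one interval per unanswered probe.
pub fn keepalive_detection_bound(idle: Duration, interval: Duration, probes: u32) -> Duration {
    idle + interval * probes
}
```

With TCP_KEEPIDLE=10s, TCP_KEEPINTVL=5s, TCP_KEEPCNT=3 this gives 10 + 5 * 3 = 25 s. On Linux the keepalive timer only probes an idle connection; if unacknowledged data is in flight, the retransmission timeout governs instead, so the bound covers the idle case.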
Shutdown loop-drain (C3, N5)
track_background is back to an unconditional push. shutdown
now loop-drains the background-task vec until empty, so a
task pushed mid-shutdown (e.g. a reader that observed the
token and registered its own cleanup) is still awaited.
DrainOutcome gains background_clean / background_force fields
so the background-task counts are observable.
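The loop-drain can be sketched with std threads and a shared Vec of join handles; Background and drain_background are hypothetical names, and the (clean, panicked) pair is only an analogue of DrainOutcome's background counters. Each pass takes the whole vec with the lock released before joining, so a task that registers another handle mid-shutdown is picked up on the next pass instead of being skipped.

```rust
use std::mem;
use std::sync::{Arc, Mutex};
use std::thread::JoinHandle;

/// Hypothetical shared list of background handles (stand-in for track_background's Vec).
pub type Background = Arc<Mutex<Vec<JoinHandle<()>>>>;

/// Join every tracked handle, repeating until a pass finds the vec empty.
/// Returns (clean, panicked) counts.
pub fn drain_background(bg: &Background) -> (usize, usize) {
    let (mut clean, mut panicked) = (0, 0);
    loop {
        // The guard is a temporary dropped at the end of this statement, so the
        // lock is NOT held while joining and tasks can still push handles.
        let batch = mem::take(&mut *bg.lock().unwrap());
        if batch.is_empty() {
            return (clean, panicked);
        }
        for handle in batch {
            match handle.join() {
                Ok(()) => clean += 1,
                Err(_) => panicked += 1,
            }
        }
    }
}
```

Whether the late handle lands in the first pass or a later one depends on timing, but either way the drain only returns once the vec is observed empty.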
Cluster config wedge fixes (C5, C6)
replica_id is now Option<u8> on both CurrentNodeConfig and
OtherNodeConfig. Startup-time ClusterConfig::validate rejects
missing ids, duplicate ids, and ids >= total replica count
so a misconfigured cluster cannot wedge into a permanent view
change on boot. CurrentNodeConfig gained the TransportPorts
field that was already present on OtherNodeConfig, so the bus
can now read its own tcp_replica bind port from config.
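A sketch of the three startup checks, using a hypothetical trimmed-down NodeConfig that keeps only replica_id: Option<u8> and a hypothetical ConfigError. With n nodes, requiring every id to be present, unique, and below n forces the ids to be exactly 0..n, which is what the directional rule and quorum math rely on.

```rust
use std::collections::HashSet;

/// Hypothetical, trimmed-down node config; only replica_id matters here.
pub struct NodeConfig {
    pub replica_id: Option<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ConfigError {
    MissingId { node_index: usize },
    DuplicateId(u8),
    IdOutOfRange { id: u8, replica_count: usize },
}

/// Reject missing ids, ids >= replica count, and duplicates (checked in that order
/// per node). Node index 0 is the current node, followed by the others.
pub fn validate(current: &NodeConfig, others: &[NodeConfig]) -> Result<(), ConfigError> {
    let replica_count = others.len() + 1;
    let mut seen = HashSet::new();
    for (node_index, node) in std::iter::once(current).chain(others).enumerate() {
        let id = node.replica_id.ok_or(ConfigError::MissingId { node_index })?;
        if usize::from(id) >= replica_count {
            return Err(ConfigError::IdOutOfRange { id, replica_count });
        }
        if !seen.insert(id) {
            return Err(ConfigError::DuplicateId(id));
        }
    }
    Ok(())
}
```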
Per-connection body-byte limiter (C7)
A new FrameRateLimiter (token bucket, 32 MiB/s sustained, 256
MiB burst by default) gates the body-read allocation in
framing::read_message. A peer sending valid 64 MiB bodies can
drain the 256 MiB burst in four frames and is then throttled to
roughly one frame every 2 s (64 MiB / 32 MiB/s); the hard ceiling
on MAX_MESSAGE_SIZE stays as-is.
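A token-bucket sketch in bytes with the stated defaults. The struct mirrors the FrameRateLimiter name, but its fields and the acquire signature are assumptions, as is starting with a full bucket; time is passed in explicitly so the behavior is deterministic, where a real limiter would read a monotonic clock.

```rust
use std::time::Duration;

pub const MIB: u64 = 1024 * 1024;

/// Byte-denominated token bucket (hypothetical shape).
pub struct FrameRateLimiter {
    rate_bytes_per_sec: u64,
    burst_bytes: u64,
    tokens: u64,
    last: Duration,
}

impl FrameRateLimiter {
    /// Assumed to start full, so an idle peer may burst immediately.
    pub fn new(rate_bytes_per_sec: u64, burst_bytes: u64) -> Self {
        Self { rate_bytes_per_sec, burst_bytes, tokens: burst_bytes, last: Duration::ZERO }
    }

    /// 32 MiB/s sustained, 256 MiB burst.
    pub fn with_defaults() -> Self {
        Self::new(32 * MIB, 256 * MIB)
    }

    /// Add tokens for the time elapsed since the last call, capped at the burst.
    /// Integer math drops under one byte per call, which is fine for a sketch.
    fn refill(&mut self, now: Duration) {
        let elapsed = now.saturating_sub(self.last);
        let earned = elapsed.as_nanos() * u128::from(self.rate_bytes_per_sec) / 1_000_000_000;
        let earned = u64::try_from(earned).unwrap_or(u64::MAX);
        self.tokens = self.tokens.saturating_add(earned).min(self.burst_bytes);
        self.last = now;
    }

    /// Duration::ZERO means "read the body now" (tokens are deducted); otherwise
    /// the caller waits the returned time and retries. Assumes body_len <= burst.
    pub fn acquire(&mut self, body_len: u64, now: Duration) -> Duration {
        self.refill(now);
        if self.tokens >= body_len {
            self.tokens -= body_len;
            return Duration::ZERO;
        }
        let deficit = body_len - self.tokens;
        Duration::from_secs_f64(deficit as f64 / self.rate_bytes_per_sec as f64)
    }
}
```

Gating before the body allocation means a misbehaving peer is slowed down before it can make the reader allocate another large buffer.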
Zero-copy framing (W1, W3)
framing::read_message was allocating three Vecs and copying
the header + body twice to reassemble the final Owned. It is
now a single Owned<MESSAGE_ALIGN>::with_capacity(HEADER_SIZE)
handed straight to compio's read_exact, grown in-place via
reserve_exact after the size field is parsed, then a second
read_exact into owned.slice(HEADER_SIZE..total_size). The
backing AVec's buffer is reused across both reads: one alloc
(with at most one in-place realloc) and zero memcpys of the
data. iggy_binary_protocol's Owned<ALIGN> now implements
IoBuf / IoBufMut / SetLen so compio can drive the read
directly. A compile-time static assert guards the hardcoded
offset_of!(GenericHeader, size) == 48 the reader relies on.
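A std-only sketch of the single-buffer read path. It assumes a 256-byte header, a 64 MiB MAX_MESSAGE_SIZE, and a little-endian u32 size field that counts header plus body; GenericHeaderPrefix is a hypothetical layout chosen only so that size sits at offset 48. A plain Vec<u8> replaces Owned<MESSAGE_ALIGN>, and the resize zero-fills where the commit lets compio read into uninitialized capacity.

```rust
use std::io::{self, Read};

/// Assumed constants for the sketch; the real values live in iggy_binary_protocol.
pub const HEADER_SIZE: usize = 256;
pub const MAX_MESSAGE_SIZE: usize = 64 * 1024 * 1024;
const SIZE_OFFSET: usize = 48;

/// Hypothetical header prefix: three u128 fields put `size` at byte offset 48.
#[repr(C)]
pub struct GenericHeaderPrefix {
    pub checksum: u128,
    pub checksum_body: u128,
    pub cluster: u128,
    pub size: u32,
}

// Compile-time guard: if the layout drifts, the build fails instead of the
// reader silently parsing the wrong bytes.
const _: () = assert!(std::mem::offset_of!(GenericHeaderPrefix, size) == SIZE_OFFSET);

/// Read the header, parse `size`, grow the same buffer, then read the body
/// into its tail. One buffer, no reassembly copy of header or body.
pub fn read_message<R: Read>(r: &mut R) -> io::Result<Vec<u8>> {
    let mut buf = vec![0u8; HEADER_SIZE];
    r.read_exact(&mut buf)?;
    let size_bytes: [u8; 4] = buf[SIZE_OFFSET..SIZE_OFFSET + 4].try_into().unwrap();
    let total = u32::from_le_bytes(size_bytes) as usize;
    if total < HEADER_SIZE || total > MAX_MESSAGE_SIZE {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "bad size field"));
    }
    buf.reserve_exact(total - HEADER_SIZE);
    buf.resize(total, 0); // std stand-in; the commit avoids this zero-fill
    r.read_exact(&mut buf[HEADER_SIZE..])?;
    Ok(buf)
}
```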
Router blocking-send fix (C8)
core/shard/src/router.rs was calling the blocking
crossfire::MTx::send from the compio reactor, which could park
the io_uring thread and stall every other connection on the
shard when an inbox filled up. Both dispatch and dispatch_request
now use try_send, logging and dropping the message on Full / Disconnected; consensus
recovers via WAL retransmit or view change.
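A sketch of the non-blocking dispatch pattern, with std::sync::mpsc standing in for crossfire and eprintln! standing in for tracing; dispatch and its signature are hypothetical. Unlike the bus's send path, which hands Backpressure back to the caller, the router simply drops and logs, since the reactor thread must never park.

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Enqueue into a shard inbox without ever blocking the calling thread.
/// Full or disconnected inboxes drop the message and log; recovery is left to
/// VSR retransmission or view change. Returns true if the message was enqueued.
pub fn dispatch<T>(inbox: &SyncSender<T>, msg: T, shard: usize) -> bool {
    match inbox.try_send(msg) {
        Ok(()) => true,
        Err(TrySendError::Full(_)) => {
            eprintln!("shard {shard}: inbox full, dropping message");
            false
        }
        Err(TrySendError::Disconnected(_)) => {
            eprintln!("shard {shard}: inbox disconnected, dropping message");
            false
        }
    }
}
```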
Misc (Cx1, W4, W6, W7, W8, W9, W10, N3, N4)
- connector.rs's dead is_err branch on insert() is replaced
with an expect() documenting the directional-rule invariant.
- SendError::Io is removed (no constructor anywhere).
- replica_listener doc adds an explicit trusted-network-only
warning for the handshake (no mTLS / shared secret).
- writer_task MAX_BATCH doc drops the "tunable per-deployment"
claim.
- Duplicate client id insert is now unreachable!() with a
helpful diagnostic.
- chain-replication failure paths use structured
tracing::warn!(error = ?e, ...) so oncall can grep the
SendError variant.
- MessageBus trait doc advertises the no-yield property of
send_to_* in the production impl.
- writer_task's license-header "czpressed" typo fixed.
- transports/mod.rs 40-line sketch compressed to a tracking
reference for IGGY-112.
Deferred with TODO(hubcio)
- Chain-replicate-before-journal-append ordering (C4) is
flagged at both metadata and partitions sites; an ordering
decision issue is needed, not a patch in this PR.
- Fan-out deep_copy (W2) in plane_helpers and the three shard
send sites is flagged; requires a trait-level change to
MessageBus::send_to_* taking Frozen<MESSAGE_ALIGN> so the
primary freezes once and fan-out is refcount bumps.
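The deferred W2 change can be illustrated with Arc<[u8]> as a stand-in for Frozen<MESSAGE_ALIGN>; both function names are hypothetical. Deep-copy fan-out allocates and copies once per peer, while freeze-once fan-out copies into a shared allocation a single time and then hands each peer an Arc clone, which is an atomic refcount increment.

```rust
use std::sync::Arc;

/// Stand-in for Frozen<MESSAGE_ALIGN>: immutable, reference-counted bytes.
pub type Frozen = Arc<[u8]>;

/// Current shape (per the TODO): every peer gets its own deep copy.
pub fn fan_out_deep_copy(msg: &[u8], peers: usize) -> Vec<Vec<u8>> {
    (0..peers).map(|_| msg.to_vec()).collect()
}

/// Proposed shape: freeze once, then each peer send is a refcount bump.
pub fn fan_out_frozen(msg: Vec<u8>, peers: usize) -> Vec<Frozen> {
    // The only copy: moving the bytes into the shared Arc allocation.
    let frozen: Frozen = Arc::from(msg);
    (0..peers).map(|_| Arc::clone(&frozen)).collect()
}
```

The trait-level part of the follow-up is what makes this possible: send_to_* would need to accept the frozen handle instead of an owned Message.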
Scope
Consensus, metadata, partitions, shard, and simulator are
updated to the new MessageBus shape; core/server is left
untouched as legacy code. Promoting MESSAGE_ALIGN to pub in
iggy_binary_protocol::consensus is consistent with the
existing Message<GenericHeader> leak through the MessageBus
trait and is accepted for this PR (N8).
Follow-up issues to file:
- W2 MessageBus::send_to_*(.., Frozen<MESSAGE_ALIGN>) trait-
level fan-out fix (benchmark-driven)
- W5 SendError::Backpressure(Message) carrying payload back
to the caller (benchmark-driven)
- W11 SimOutbox bounded mode (simulator coverage)
- N6 ConnectionRegistry<u8> -> Vec<Option<Entry>> micro-opt
- N7 drain parallel via FuturesUnordered
- C4 VSR chain-replicate-vs-append ordering decision
- IGGY-112 Transport trait family + mTLS handshake
Parent: f5350d9
42 files changed, 3623 additions, 691 deletions
File tree
- core
  - binary_protocol/src/consensus
  - configs/src/server_config
  - consensus/src
  - message_bus
    - src
      - cache
      - lifecycle
      - transports
    - tests
      - common
  - metadata/src/impls
  - partitions/src
  - server
  - shard/src
  - simulator/src