|
| 1 | +# Zero-Allocation Sender-to-Awaitable Bridge |
| 2 | + |
| 3 | +Every IoAwaitable ever written - timers, mutexes, channels, semaphores, |
| 4 | +file I/O, sockets, database queries, HTTP clients - is now consumable |
| 5 | +by a sender pipeline. No coroutine frame. No heap allocation. |
| 6 | + |
| 7 | +## What This Is |
| 8 | + |
| 9 | +`as_sender` wraps any IoAwaitable in a P2300-compliant sender. A |
| 10 | +receiver attaches to the sender through `connect`. When `start` is |
| 11 | +called, the operation state drives the awaitable protocol directly - |
| 12 | +`await_ready`, `await_suspend`, `await_resume` - without ever creating |
| 13 | +a coroutine. |
| 14 | + |
| 15 | +The awaitable does not know it is talking to a sender. It sees a |
| 16 | +`coroutine_handle<>` and an `io_env const*`, exactly as it would from a |
| 17 | +coroutine. The awaitable's code does not change. Not one line. |
| 18 | + |
| 19 | +```cpp |
| 20 | +// Wrap any IoAwaitable as a sender |
| 21 | +auto sndr = as_sender(stream.read_some(buf)); |
| 22 | + |
| 23 | +// Attach a receiver and start the operation |
| 24 | +auto op = connect(std::move(sndr), my_receiver); |
| 25 | +start(op); |
| 26 | +``` |
| 27 | + |
| 28 | +## How It Works |
| 29 | + |
| 30 | +The bridge rests on a single observation: all three major compilers |
| 31 | +(MSVC, GCC, Clang) lay out a coroutine frame with two function |
| 32 | +pointers at the front: |
| 33 | + |
| 34 | +```cpp |
| 35 | +struct coroutine_frame { |
| 36 | + void (*resume)(coroutine_frame*); |
| 37 | + void (*destroy)(coroutine_frame*); |
| 38 | + // ... promise, locals, state ... |
| 39 | +}; |
| 40 | +``` |
| 41 | + |
| 42 | +When you call `handle.resume()`, the generated code calls the function |
| 43 | +pointer at offset zero, passing the frame address. That is all it does. |
| 44 | + |
| 45 | +The bridge defines a lightweight struct that matches this layout: |
| 46 | + |
| 47 | +```cpp |
| 48 | +struct frame_cb { |
| 49 | + void (*resume)(frame_cb*); |
| 50 | + void (*destroy)(frame_cb*); |
| 51 | + void* data; |
| 52 | +}; |
| 53 | +``` |
| 54 | + |
| 55 | +Three pointers. Twenty-four bytes on a 64-bit platform. The `resume` |
| 56 | +pointer holds the sender's completion callback. The `destroy` pointer |
| 57 | +is a no-op - the sender owns its own lifetime. The `data` pointer |
| 58 | +points back to the operation state. |
| 59 | + |
| 60 | +`std::coroutine_handle<>::from_address(&cb)` produces a handle that, |
| 61 | +when `.resume()` is called, invokes the function pointer at offset |
| 62 | +zero - which is our callback. The awaitable receives this handle. It |
| 63 | +cannot tell the difference. It does not need to. |
| 64 | + |
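The dispatch described above can be modeled in portable C++ without relying on any compiler's coroutine ABI. In this sketch, `handle_model` is a hypothetical stand-in for `std::coroutine_handle<>`, and `op_state_model`, `on_resume`, `noop_destroy`, and `demo_resume_through_frame_cb` are illustrative names that do not come from the library. It shows only the shape of the trick: the handle stores an address, and resuming it calls the function pointer found at offset zero.

```cpp
#include <cassert>
#include <cstddef>

// Layout from the text: two function pointers first, then a back-pointer.
struct frame_cb {
    void (*resume)(frame_cb*);
    void (*destroy)(frame_cb*);
    void* data;
};

static_assert(offsetof(frame_cb, resume) == 0,
              "resume must be the first member");

// Hypothetical stand-in for std::coroutine_handle<>: it holds only an
// address and resumes by calling the function pointer at offset zero.
struct handle_model {
    void* address;

    static handle_model from_address(void* p) { return handle_model{p}; }

    void resume() const {
        auto* cb = static_cast<frame_cb*>(address);
        cb->resume(cb);
    }
};

// Hypothetical operation state that embeds the frame_cb (no allocation).
struct op_state_model {
    frame_cb cb;
    int completions;
};

// Completion callback: recover the operation state from `data`.
inline void on_resume(frame_cb* cb) {
    assert(cb->data != nullptr);
    ++static_cast<op_state_model*>(cb->data)->completions;
}

// The sender owns its own lifetime, so destroy does nothing.
inline void noop_destroy(frame_cb*) {}

inline int demo_resume_through_frame_cb() {
    op_state_model op{};
    op.cb = frame_cb{&on_resume, &noop_destroy, &op};
    handle_model::from_address(&op.cb).resume();
    return op.completions;  // incremented once by on_resume
}
```

In the real bridge the handle would come from `std::coroutine_handle<>::from_address(&cb)` instead, which works only because the compilers named above honor this layout.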
| 65 | +## The Flow |
| 66 | + |
| 67 | +Here is what happens, step by step: |
| 68 | + |
| 69 | +- **`as_sender(awaitable)`** stores the awaitable inside a sender. |
| 70 | + Nothing runs yet. Senders are lazy. |
| 71 | + |
| 72 | +- **`connect(sender, receiver)`** produces an operation state. The |
| 73 | + operation state holds the awaitable, the receiver, an `io_env`, and |
| 74 | + a `frame_cb`. Everything lives on the operation state. No allocation. |
| 75 | + |
| 76 | +- **`start(op_state)`** begins the operation: |
| 77 | + |
| 78 | + 1. The executor and stop token are pulled from the receiver's |
| 79 | + environment and stored in the `io_env`. |
| 80 | + |
| 81 | + 2. `await_ready()` is checked. If the awaitable is immediately |
| 82 | + ready, the result is harvested and the receiver is signaled |
| 83 | + inline. |
| 84 | + |
| 85 | + 3. Otherwise, the `frame_cb` is filled in: `resume` points to the |
| 86 | + completion callback, `destroy` is a no-op, `data` points to the |
| 87 | + operation state. A `coroutine_handle<>` is manufactured from the |
| 88 | + `frame_cb`'s address. `await_suspend(handle, &env)` is called on |
| 89 | + the awaitable. |
| 90 | + |
| 91 | +- **The awaitable runs.** It submits work to the reactor - a timer |
| 92 | + fires, bytes arrive on a socket, a mutex unlocks. When the operation |
| 93 | + completes, the reactor calls `executor.post(handle)` or |
| 94 | + `executor.dispatch(handle)`. |
| 95 | + |
| 96 | +- **The executor calls `handle.resume()`.** Because the handle points |
| 97 | + at the `frame_cb`, this calls the `resume` function pointer. The |
| 98 | + callback recovers the operation state from `data`, calls |
| 99 | + `await_resume()` to harvest the result, and signals the receiver |
| 100 | + through `set_value`, `set_error`, or `set_stopped`. |
| 101 | + |
| 102 | +The awaitable went through its entire lifecycle - ready check, |
| 103 | +suspension, reactor submission, executor resumption, result harvest - |
| 104 | +without a coroutine ever existing. |
| 105 | + |
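The steps above can be sketched end to end with toy types. This is a minimal model under stated assumptions, not the library's implementation: `handle_model`, `io_env_model`, `toy_read`, `toy_receiver`, and `bridge_op` are hypothetical names, the environment is supplied at construction instead of being queried from the receiver (step 1), and only the `set_value` channel is shown.

```cpp
#include <cassert>

struct frame_cb {
    void (*resume)(frame_cb*);
    void (*destroy)(frame_cb*);
    void* data;
};

// Hypothetical stand-ins for std::coroutine_handle<> and io_env.
struct handle_model {
    void* address = nullptr;
    void resume() const {
        auto* cb = static_cast<frame_cb*>(address);
        cb->resume(cb);
    }
};
struct io_env_model {
    int executor_id = 0;
    bool stop_requested = false;
};

// Toy awaitable: parks the handle until complete() plays the reactor.
struct toy_read {
    bool ready = false;
    int bytes = 0;
    handle_model pending{};

    bool await_ready() const { return ready; }
    void await_suspend(handle_model h, io_env_model const*) { pending = h; }
    int await_resume() const { return bytes; }

    void complete(int n) { bytes = n; pending.resume(); }
};

struct toy_receiver {
    int* out;
    void set_value(int v) { *out = v; }
};

// Operation state: awaitable, receiver, env, and frame_cb all live inline.
template <class Awaitable, class Receiver>
struct bridge_op {
    Awaitable aw;
    Receiver rcvr;
    io_env_model env;
    frame_cb cb;

    void start() {
        if (aw.await_ready()) {        // step 2: synchronous fast path
            rcvr.set_value(aw.await_resume());
            return;
        }
        cb.resume = [](frame_cb* p) {  // step 3: fill in the frame_cb
            auto* self = static_cast<bridge_op*>(p->data);
            self->rcvr.set_value(self->aw.await_resume());
        };
        cb.destroy = [](frame_cb*) {};
        cb.data = this;
        aw.await_suspend(handle_model{&cb}, &env);
    }
};

inline int demo_suspended_flow() {
    int result = -1;
    bridge_op<toy_read, toy_receiver> op{
        toy_read{}, toy_receiver{&result}, io_env_model{}, frame_cb{}};
    op.start();
    assert(result == -1);  // suspended: nothing delivered yet
    op.aw.complete(42);    // reactor stand-in resumes the handle
    return result;
}

inline int demo_ready_flow() {
    int result = -1;
    toy_read aw{};
    aw.ready = true;
    aw.bytes = 7;
    bridge_op<toy_read, toy_receiver> op{
        aw, toy_receiver{&result}, io_env_model{}, frame_cb{}};
    op.start();            // completes inline via await_ready
    return result;
}
```

The operation state must stay at a fixed address once `start` has run, because the awaitable holds a handle pointing at the embedded `frame_cb`. The sketch respects this by never moving `op` after `start()`.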
| 106 | +## What This Means |
| 107 | + |
| 108 | +The awaitable ecosystem and the sender ecosystem are no longer |
| 109 | +separate worlds. They are one world. |
| 110 | + |
| 111 | +Every IoAwaitable anyone has written becomes a sender with a single |
| 112 | +function call. Awaitable authors gain a new consumer base without |
| 113 | +modifying a single line of their code. Sender authors gain access to |
| 114 | +every I/O primitive the awaitable ecosystem has produced - and will |
| 115 | +produce - at zero allocation cost. |
| 116 | + |
| 117 | +- **One I/O implementation.** The library implements each operation |
| 118 | + once as an IoAwaitable. Coroutines `co_await` it. Sender pipelines |
| 119 | + consume it through `as_sender`. Both go through the same reactor, |
| 120 | + the same executor, the same platform code. |
| 121 | + |
| 122 | +- **Zero allocation.** The `frame_cb` lives on the operation state. |
| 123 | + No coroutine frame. No heap allocation. No bridge coroutine. The |
| 124 | + previous implementation allocated a coroutine frame per I/O |
| 125 | + operation just to obtain a `coroutine_handle<>`. That tax is gone. |
| 126 | + |
| 127 | +- **Full protocol fidelity.** The bridge respects `await_ready` for |
| 128 | + synchronous fast-paths. It normalizes `await_suspend` return types |
| 129 | + (`void`, `bool`, `coroutine_handle<>`). It propagates the executor |
| 130 | + and stop token through `io_env`. It routes results to `set_value`, |
| 131 | + errors to `set_error`, and cancellation to `set_stopped`. |
| 132 | + |
| 133 | +- **Transparent to the awaitable.** The awaitable sees a |
| 134 | + `coroutine_handle<>` and an `io_env const*`. It does not know |
| 135 | + whether the handle points at a coroutine frame or a `frame_cb`. It |
| 136 | + does not need to know. The handle is the abstraction boundary, and |
| 137 | + the abstraction holds. |
| 138 | + |
| 139 | +- **Works today.** This is not a proposal. It is shipping code. It |
| 140 | + compiles and passes tests on MSVC, GCC, and Clang. The ABI |
| 141 | + compatibility that makes it work is the same ABI reality documented |
| 142 | + in P3203R0 and relied upon by Boost.Cobalt in production. |
| 143 | + |
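The `await_suspend` normalization mentioned in the list above could plausibly look like the following. This is a sketch of one way to do it, assuming C++17 `if constexpr`; it is not taken from the library. `handle_model` and `io_env_model` are hypothetical stand-ins for `std::coroutine_handle<>` and `io_env`, and `suspend_normalized` and the toy awaitables are illustrative names.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for std::coroutine_handle<> and io_env.
struct handle_model {
    void (*fn)(void*);
    void* arg;
    void resume() const { fn(arg); }
};
struct io_env_model {};

// Returns true when the operation is left suspended and will finish via a
// resume; false when the caller should complete it inline.
template <class Awaitable>
bool suspend_normalized(Awaitable& aw, handle_model h,
                        io_env_model const* env) {
    using result_t = decltype(aw.await_suspend(h, env));
    if constexpr (std::is_void_v<result_t>) {
        aw.await_suspend(h, env);           // void: always suspends
        return true;
    } else if constexpr (std::is_same_v<result_t, bool>) {
        return aw.await_suspend(h, env);    // false means "did not suspend"
    } else {
        static_assert(std::is_same_v<result_t, handle_model>,
                      "unsupported await_suspend return type");
        aw.await_suspend(h, env).resume();  // resume the handle handed back
        return true;
    }
}

// Toy awaitables, one per return type.
struct void_aw {
    void await_suspend(handle_model, io_env_model const*) {}
};
struct bool_aw {
    bool suspends;
    bool await_suspend(handle_model, io_env_model const*) { return suspends; }
};
struct handle_aw {
    handle_model await_suspend(handle_model h, io_env_model const*) {
        return h;
    }
};

inline void count_resume(void* p) { ++*static_cast<int*>(p); }

inline int demo_normalize() {
    int resumes = 0;
    handle_model h{&count_resume, &resumes};
    io_env_model env{};
    void_aw v;
    bool_aw stay{true};
    bool_aw skip{false};
    handle_aw hand_back;

    bool a = suspend_normalized(v, h, &env);
    bool b = suspend_normalized(stay, h, &env);
    bool c = suspend_normalized(skip, h, &env);
    bool d = suspend_normalized(hand_back, h, &env);
    assert(a && b && !c && d);
    return resumes;  // only the handle-returning awaitable resumed h
}
```

Collapsing the three return types into one boolean keeps the rest of `start` uniform: a `false` result takes the same inline completion path as a successful `await_ready` check.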
| 144 | +## Example |
| 145 | + |
| 146 | +```cpp |
| 147 | +namespace capy = boost::capy; |
| 148 | +namespace ex = beman::execution; |
| 149 | + |
| 150 | +// A Capy IoAwaitable - a 500ms timer |
| 151 | +auto sndr = capy::as_sender(capy::delay(500ms)); |
| 152 | + |
| 153 | +// Connect a receiver whose environment carries a Capy executor |
| 154 | +auto op = ex::connect( |
| 155 | + std::move(sndr), |
| 156 | + my_receiver{ |
| 157 | + {pool.get_executor(), stop_source.get_token()}, |
| 158 | + &done}); |
| 159 | + |
| 160 | +// Start the operation - no coroutine frame allocated |
| 161 | +ex::start(op); |
| 162 | +``` |
| 163 | + |
| 164 | +The receiver's environment provides the executor and stop token. The |
| 165 | +bridge threads them into the `io_env` that the awaitable expects. The |
| 166 | +timer fires, the executor resumes the handle, the receiver gets |
| 167 | +`set_value()`. Twenty-four bytes of `frame_cb` on the operation state. |
| 168 | +That is the entire cost. |
| 169 | + |
| 170 | +Welcome to the awaitable universe. The door is open. |