[multicast] M2P forwarding, OPTE port subscription, and sled-agent propagation#10070
zeeshanlakhani wants to merge 5 commits into multicast-e2e
FelixMcFelix left a comment:
I'm going to be posting comments in chunks given the apparent size of the PR, so please bear with me there if I miss something that's covered by a later file. Apart from that I'm seeing various changes from main interspersed here, so I don't know if something is up with the branch targeting.
…main builds Bump maghemite and OPTE to versions with the latest multicast support. OPTE now has the option to be installed via p5p package override from buildomat rather than directly downloading xde/opteadm binaries. The override mechanism (tools/opte_version_override) is sourced and packaged for use with install_opte.sh, deploy.sh, releng, and CI to install the unpublished OPTE build until it lands in the helios pkg repo. Note: CI check added to reject OPTE_COMMIT override on PRs targeting main.
…opagation
This completes the multicast data path by adding per-sled M2P (multicast-to-
physical) mapping, forwarding entry management, and OPTE port subscription
for multicast group members.
## Sled-agent + API update(s)
- Add multicast endpoints at API v33 (MCAST_M2P_FORWARDING) for M2P,
forwarding, and per-VMM subscribe/unsubscribe
- Version v7 join/leave endpoints to v7..v33 with shim conversion
- Move multicast types from omicron-common to sled-agent-types-versions
v33 module (mcast_m2p_forwarding) with re-exports through sled-agent-types
- OPTE port_manager gains set/clear operations for M2P and forwarding
- Port subscription cleanup on PortTicket release
- Consolidate per-port mutable state (eip_gateways, mcast) into PortState
- Seed eip_gateways from global map on port creation to prevent stale
gateway state on newly created ports
- Lock ordering documented for ports, routes, eip_gateways
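A minimal sketch of the `PortState` consolidation and the release-time cleanup described above. Everything here is hypothetical: `PortManagerSketch`, the field names, the `u32` port key, and the `ports`-before-`routes` lock order are assumptions for illustration, not the actual `port_manager` code.

```rust
use std::collections::{BTreeSet, HashMap};
use std::net::IpAddr;
use std::sync::Mutex;

/// Hypothetical per-port mutable state, grouped so that gateway and
/// multicast state for a port are created and dropped together.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct PortState {
    pub eip_gateways: BTreeSet<IpAddr>,
    pub mcast_groups: BTreeSet<IpAddr>,
}

/// Hypothetical manager. Assumed lock order: `ports` first, then `routes`.
/// Every path that needs both locks must take them in that order.
#[derive(Default)]
pub struct PortManagerSketch {
    ports: Mutex<HashMap<u32, PortState>>,
    routes: Mutex<HashMap<u32, Vec<IpAddr>>>,
}

impl PortManagerSketch {
    /// Subscribe a port to a group. Returns true if the subscription is new.
    pub fn subscribe(&self, port: u32, group: IpAddr) -> bool {
        let mut ports = self.ports.lock().unwrap();
        ports.entry(port).or_default().mcast_groups.insert(group)
    }

    /// Release a port, dropping all of its state in one step. Returns the
    /// groups it was subscribed to so the caller can unsubscribe them.
    pub fn release(&self, port: u32) -> BTreeSet<IpAddr> {
        let mut ports = self.ports.lock().unwrap(); // first lock
        let mut routes = self.routes.lock().unwrap(); // second lock
        routes.remove(&port);
        ports
            .remove(&port)
            .map(|state| state.mcast_groups)
            .unwrap_or_default()
    }
}
```

Keeping the multicast set inside the per-port record means a release cannot leave a subscription behind for a port that no longer exists.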
## Nexus
- New `sled.rs` (MulticastSledClient) encapsulating all sled-agent
multicast interactions: subscribe/unsubscribe, M2P/forwarding
propagation and teardown
- Groups RPW propagates M2P and forwarding entries to all member sleds
after DPD configuration, with convergent retry on failure
- Members RPW uses MemberReconcileCtx to thread shared reconciliation
state. Handles subscribe on join, unsubscribe on leave, and
re-subscribe on migration
- `subscribe_vmm` gracefully handles missing propolis (mirrors unsubscribe)
- `lookup_propolis_id` returns Ok(None) for missing instance
- `lookup_and_update_member_sled_id` surfaces DB errors instead of
swallowing them
- Order-independent forwarding comparison to avoid spurious dataplane churn;
always create forwarding entries for active groups even with empty next-hops
- Dataplane client updated for bifurcated replication groups
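The order-independent forwarding comparison mentioned above could look roughly like the following. This is a sketch under assumptions: `NextHop`, `Replication`, `forwarding_matches`, and `needs_update` are illustrative names, not the types or functions in this PR.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Hypothetical replication mode attached to a next hop.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Replication {
    External,
    Underlay,
    Both,
}

/// Hypothetical forwarding next hop.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct NextHop {
    pub addr: Ipv6Addr,
    pub replication: Replication,
}

/// Compare two next-hop lists as sets, so a reordered (or duplicated)
/// list is not treated as a change.
pub fn forwarding_matches(current: &[NextHop], desired: &[NextHop]) -> bool {
    let cur: BTreeSet<&NextHop> = current.iter().collect();
    let want: BTreeSet<&NextHop> = desired.iter().collect();
    cur == want
}

/// Decide whether the dataplane must be written. A missing entry is always
/// created, even when the desired next-hop list is empty.
pub fn needs_update(current: Option<&[NextHop]>, desired: &[NextHop]) -> bool {
    match current {
        None => true,
        Some(cur) => !forwarding_matches(cur, desired),
    }
}
```

Comparing as sets is what avoids the spurious churn: a reconciler pass that reads the same next hops back in a different order sees no difference and issues no write.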
## illumos-utils
- Remove CIDR allow rules for multicast (handled by OPTE gateway layer)
- Reject Reserved replication mode in `list_mcast_fwd` with
InvalidMcastForwardingState error
- Consolidate error variants into InvalidMcastUnderlay
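As an illustration of rejecting a `Reserved` replication mode when listing forwarding entries, here is a small sketch. The enum shapes, the `group` field on the error, and `validate_replication` are assumptions; only the `InvalidMcastForwardingState` name comes from the commit message.

```rust
use std::fmt;

/// Hypothetical replication mode as reported by the lower layer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RawReplication {
    External,
    Underlay,
    Both,
    Reserved,
}

/// Hypothetical validated mode; `Reserved` has no counterpart here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Replication {
    External,
    Underlay,
    Both,
}

/// Hypothetical error type carrying the offending group for diagnostics.
#[derive(Debug, PartialEq, Eq)]
pub enum McastError {
    InvalidMcastForwardingState { group: String },
}

impl fmt::Display for McastError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            McastError::InvalidMcastForwardingState { group } => {
                write!(f, "invalid multicast forwarding state for group {group}")
            }
        }
    }
}

/// Convert a raw mode into a validated one, failing on `Reserved`
/// instead of silently mapping it to a default.
pub fn validate_replication(
    group: &str,
    raw: RawReplication,
) -> Result<Replication, McastError> {
    match raw {
        RawReplication::External => Ok(Replication::External),
        RawReplication::Underlay => Ok(Replication::Underlay),
        RawReplication::Both => Ok(Replication::Both),
        RawReplication::Reserved => Err(McastError::InvalidMcastForwardingState {
            group: group.to_string(),
        }),
    }
}
```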
## Tests
- Integration tests for M2P/forwarding/subscribe lifecycle
- Instance migration multicast re-convergence
FelixMcFelix left a comment:
Thanks Zeeshan; I haven't looked at integration_tests/multicast/networking_integration.rs, but otherwise some thoughts going through the work.
check-opte-override:
  if: github.base_ref == 'main'
  runs-on: ubuntu-22.04
  steps:
    - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      with:
        ref: ${{ github.event.pull_request.head.sha }} # see omicron#4461
    - name: Reject OPTE override on main
      run: |
        source tools/opte_version_override
        if [[ "x$OPTE_COMMIT" != "x" ]]; then
          echo "::error::OPTE_COMMIT is set in tools/opte_version_override."
          echo "::error::The OPTE override must be cleared before merging to main."
          exit 1
        fi
I assume this isn't firing since we're pointed at multicast-e2e right now, but I think this is good!
Exactly, will fire on main.
// Clear M2P/forwarding from all sleds before DPD cleanup.
// This must succeed before deleting DB records, otherwise
// stale OPTE state would persist on failed sleds with no
// source of truth to drive a later cleanup pass.
sled_client
    .clear_m2p_and_forwarding(opctx, group)
    .await
    .context("failed to clear M2P/forwarding from sleds")?;
I've not looked into this reconciler RPW before, but what do you mean by 'failed sleds' here? I feel like I'm missing a little context.
I cleared up the comment. This meant sleds where the clear operation failed to be reached or fully complete (and we lose the source of truth).
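The ordering discussed in this thread (clear sled state first, delete the record only once every sled has succeeded) can be sketched synchronously as below. `clear_all_sleds`, `teardown_group`, the `u32` sled IDs, and the `BTreeMap` standing in for the database are all hypothetical; the real code is async and goes through `MulticastSledClient`.

```rust
use std::collections::BTreeMap;

/// Try to clear state on every sled, collecting failures rather than
/// stopping at the first one.
pub fn clear_all_sleds<F>(sleds: &[u32], mut clear: F) -> Result<(), Vec<(u32, String)>>
where
    F: FnMut(u32) -> Result<(), String>,
{
    let failures: Vec<(u32, String)> = sleds
        .iter()
        .filter_map(|&sled| clear(sled).err().map(|e| (sled, e)))
        .collect();
    if failures.is_empty() {
        Ok(())
    } else {
        Err(failures)
    }
}

/// Delete the group record only after every sled was cleared. On failure
/// the record stays, so a later pass still knows which sleds to clean.
pub fn teardown_group<F>(
    db: &mut BTreeMap<String, Vec<u32>>,
    group: &str,
    clear: F,
) -> Result<(), Vec<(u32, String)>>
where
    F: FnMut(u32) -> Result<(), String>,
{
    let sleds = db.get(group).cloned().unwrap_or_default();
    clear_all_sleds(&sleds, clear)?;
    db.remove(group);
    Ok(())
}
```

If one sled is unreachable, `teardown_group` returns the failure and leaves the record in place; a retry that reaches every sled then removes it.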
source $OMICRON_TOP/tools/opte_version_override

if [[ "x$OPTE_COMMIT" != "x" ]]; then
  set +x
  curl -fOL https://buildomat.eng.oxide.computer/public/file/oxidecomputer/opte/module/$OPTE_COMMIT/xde
  pfexec rem_drv xde || true
  pfexec mv xde /kernel/drv/amd64/xde
  pfexec add_drv xde || true
fi
The new way is probably a bit safer than this, thanks for chasing this down 😅
Changes:
- Remove global eip_gateways map from PortManagerInner, as the VPC route manager RPW activates after instance start
- Refactor member reconciler methods to take &MemberReconcileCtx
- Change forwarding next hop from member sleds to a single switch zone IP
- Add resolver to MulticastSledClient for switch zone address lookup
Complete the multicast data path by adding per-sled M2P (multicast-to- physical) mapping, forwarding entry management, and OPTE port subscription for multicast group members.
Sled-agent:
- `multicast_subscribe`/`multicast_unsubscribe` endpoints (API v29) that configure M2P, forwarding, and OPTE port subscription for a VMM

Nexus:
- `sled.rs` (MulticastSledClient) encapsulating all sled-agent multicast interactions: subscribe/unsubscribe, M2P/forwarding propagation and teardown

Tests: