[multicast] M2P forwarding, OPTE port subscription, and sled-agent propagation #10070

Open
zeeshanlakhani wants to merge 5 commits into multicast-e2e from zl/multicast-m2p-forwarding

@zeeshanlakhani
Collaborator

Complete the multicast data path by adding per-sled M2P (multicast-to-physical) mapping, forwarding entry management, and OPTE port subscription for multicast group members.

Sled-agent:

  • Add multicast_subscribe / multicast_unsubscribe endpoints (API v29) that configure M2P, forwarding, and OPTE port subscription for a VMM
  • OPTE port_manager gains set/clear operations for M2P and forwarding
  • Port subscription cleanup on PortTicket release

Nexus:

  • New sled.rs (MulticastSledClient) encapsulating all sled-agent multicast interactions: subscribe/unsubscribe, M2P/forwarding propagation and teardown
  • Groups RPW propagates M2P and forwarding entries to all member sleds after DPD configuration, with convergent retry on failure
  • Members RPW uses MemberReconcileCtx to thread shared reconciliation state. This handles subscribe on join, unsubscribe on leave, and re-subscribe on migration
  • Dataplane client updated for bifurcated replication groups
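
The join, leave, and migration handling in the Members RPW can be pictured as a small decision function. This is a minimal sketch, not the code in this PR: `SledId`, `MemberAction`, and `plan_member_action` are hypothetical names, and treating a stopped instance as an unsubscribe is an assumption made for illustration.

```rust
/// Hypothetical sled identifier; the real code uses typed UUIDs.
type SledId = u32;

/// What the reconciler should do for one group member (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum MemberAction {
    Subscribe { sled: SledId },
    Unsubscribe { sled: SledId },
    Resubscribe { from: SledId, to: SledId },
    Nothing,
}

/// Decide the next action from three observations:
/// - `wants_membership`: whether the member should currently be joined,
/// - `subscribed_on`: the sled where a subscription was last recorded,
/// - `running_on`: the sled where the instance's VMM runs now, if any.
fn plan_member_action(
    wants_membership: bool,
    subscribed_on: Option<SledId>,
    running_on: Option<SledId>,
) -> MemberAction {
    match (wants_membership, subscribed_on, running_on) {
        // Join: no subscription yet and the instance is running somewhere.
        (true, None, Some(sled)) => MemberAction::Subscribe { sled },
        // Migration: a subscription exists but the VMM moved to another sled.
        (true, Some(from), Some(to)) if from != to => {
            MemberAction::Resubscribe { from, to }
        }
        // Leave, or (by assumption) instance stopped: drop the subscription.
        (false, Some(sled), _) | (true, Some(sled), None) => {
            MemberAction::Unsubscribe { sled }
        }
        // Already converged, or nothing to act on.
        _ => MemberAction::Nothing,
    }
}
```

Because the function depends only on observed state, calling it again after a partial failure yields the same action, which is what allows a reconciler to retry until it converges.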

Tests:

  • Integration tests for M2P/forwarding/subscribe lifecycle
  • Instance migration multicast re-convergence

@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch 4 times, most recently from 3efe7fa to 98e9742 Compare March 17, 2026 05:42
@zeeshanlakhani
Collaborator Author

zeeshanlakhani commented Mar 17, 2026

Note: Both the helios/deploy and check-opte-ver/check-opte-ver checks in CI will fail for now. After R19 ships and OPTE PR #924 merges, we can switch from branch = "zl/filter-mcast-srcs" to rev = "<merged-sha>" and bump tools/opte_version and the deploy.sh target to 0.40.

@zeeshanlakhani zeeshanlakhani marked this pull request as ready for review March 17, 2026 09:02
@zeeshanlakhani zeeshanlakhani self-assigned this Mar 23, 2026
@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 98e9742 to b510a9f Compare March 24, 2026 02:40
@zeeshanlakhani
Collaborator Author

zeeshanlakhani commented Mar 26, 2026

@jgallagher I started down the better path for network types in relation to #10139, where we could expand on the initial version in a separate PR. For the moment, I'm going to move the types back to omicron_common until #10158 finalizes.

@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch 2 times, most recently from 102d7fb to 043af28 Compare March 26, 2026 11:51
Contributor

@FelixMcFelix FelixMcFelix left a comment

I'm going to be posting comments in chunks given the apparent size of the PR, so please bear with me there if I miss something that's covered by a later file. Apart from that I'm seeing various changes from main interspersed here, so I don't know if something is up with the branch targeting.

…main builds

Bump maghemite and OPTE to versions with the latest multicast support.

OPTE now has the option to be installed via p5p package override from buildomat
rather than directly downloading xde/opteadm binaries. The override
mechanism (tools/opte_version_override) is sourced and packaged for use with
install_opte.sh, deploy.sh, releng, and CI to install the unpublished OPTE
build until it lands in the helios pkg repo.

Note: CI check added to reject OPTE_COMMIT override on PRs targeting main.
@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 043af28 to 9cdf741 Compare March 28, 2026 09:36
…opagation

This completes the multicast data path by adding per-sled M2P
(multicast-to-physical) mapping, forwarding entry management, and OPTE port
subscription for multicast group members.

## Sled-agent + API update(s)

  - Add multicast endpoints at API v33 (MCAST_M2P_FORWARDING) for M2P,
    forwarding, and per-VMM subscribe/unsubscribe
  - Version v7 join/leave endpoints to v7..v33 with shim conversion
  - Move multicast types from omicron-common to sled-agent-types-versions
    v33 module (mcast_m2p_forwarding) with re-exports through sled-agent-types
  - OPTE port_manager gains set/clear operations for M2P and forwarding
  - Port subscription cleanup on PortTicket release
  - Consolidate per-port mutable state (eip_gateways, mcast) into PortState
  - Seed eip_gateways from global map on port creation to prevent stale
    gateway state on newly created ports
  - Lock ordering documented for ports, routes, eip_gateways
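
Cleanup on ticket release is commonly expressed with `Drop`. The following is a rough sketch under assumed names, not the port_manager implementation: `PortTicketSketch` and the shared `Subscriptions` set are hypothetical stand-ins for the real PortTicket and per-port state.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;
use std::sync::{Arc, Mutex};

/// Hypothetical shared table of (port id, multicast group) subscriptions.
type Subscriptions = Arc<Mutex<BTreeSet<(u32, IpAddr)>>>;

/// Hypothetical stand-in for a port ticket that owns a port's lifetime.
struct PortTicketSketch {
    port_id: u32,
    subs: Subscriptions,
}

impl PortTicketSketch {
    /// Record that this port is subscribed to `group`.
    fn subscribe(&self, group: IpAddr) {
        self.subs.lock().unwrap().insert((self.port_id, group));
    }
}

impl Drop for PortTicketSketch {
    fn drop(&mut self) {
        // Remove every subscription owned by this port. A poisoned lock is
        // skipped rather than panicking inside `drop`.
        if let Ok(mut subs) = self.subs.lock() {
            subs.retain(|(port, _)| *port != self.port_id);
        }
    }
}
```

Tying the cleanup to `Drop` means a subscription cannot outlive its port even on early-return or error paths, at the cost of doing the cleanup work implicitly.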

## Nexus

  - New `sled.rs` (MulticastSledClient) encapsulating all sled-agent
    multicast interactions: subscribe/unsubscribe, M2P/forwarding
    propagation and teardown
  - Groups RPW propagates M2P and forwarding entries to all member sleds
    after DPD configuration, with convergent retry on failure
  - Members RPW uses MemberReconcileCtx to thread shared reconciliation
    state. Handles subscribe on join, unsubscribe on leave, and
    re-subscribe on migration
  - `subscribe_vmm` gracefully handles missing propolis (mirrors unsubscribe)
  - `lookup_propolis_id` returns Ok(None) for missing instance
  - `lookup_and_update_member_sled_id` surfaces DB errors instead of
    swallowing them
  - Order-independent forwarding comparison to avoid spurious dataplane churn;
    always create forwarding entries for active groups even with empty next-hops
  - Dataplane client updated for bifurcated replication groups
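
One way to get an order-independent forwarding comparison is to compare next hops as sets. This is an illustrative sketch only: `NextHop`, its fields, `same_next_hops`, and `needs_update` are hypothetical and do not reflect the actual types in this change.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Hypothetical next-hop record; field names are illustrative.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct NextHop {
    underlay_addr: Ipv6Addr,
    vni: u32,
}

/// True when both lists hold the same next hops, ignoring order and
/// duplicate entries.
fn same_next_hops(current: &[NextHop], desired: &[NextHop]) -> bool {
    let current: BTreeSet<&NextHop> = current.iter().collect();
    let desired: BTreeSet<&NextHop> = desired.iter().collect();
    current == desired
}

/// Decide whether the forwarding entry must be written.
fn needs_update(current: Option<&[NextHop]>, desired: &[NextHop]) -> bool {
    match current {
        // No entry yet: create one even when `desired` is empty.
        None => true,
        // Entry exists: rewrite only if the set of next hops changed.
        Some(existing) => !same_next_hops(existing, desired),
    }
}
```

With this shape, a list that comes back from the dataplane in a different order compares equal and triggers no write, while a group with no existing entry still gets one created.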

## illumos-utils

  - Remove CIDR allow rules for multicast (handled by OPTE gateway layer)
  - Reject Reserved replication mode in `list_mcast_fwd` with
    InvalidMcastForwardingState error
  - Consolidate error variants into InvalidMcastUnderlay

## Tests

  - Integration tests for M2P/forwarding/subscribe lifecycle
  - Instance migration multicast re-convergence
Contributor

@FelixMcFelix FelixMcFelix left a comment

Thanks Zeeshan; I haven't looked at integration_tests/multicast/networking_integration.rs, but otherwise some thoughts going through the work.

Comment on lines +22 to +36
  check-opte-override:
    if: github.base_ref == 'main'
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
        with:
          ref: ${{ github.event.pull_request.head.sha }} # see omicron#4461
      - name: Reject OPTE override on main
        run: |
          source tools/opte_version_override
          if [[ "x$OPTE_COMMIT" != "x" ]]; then
            echo "::error::OPTE_COMMIT is set in tools/opte_version_override."
            echo "::error::The OPTE override must be cleared before merging to main."
            exit 1
          fi
Contributor

I assume this isn't firing since we're pointed at multicast-e2e right now, but I think this is good!

Collaborator Author

Exactly, it will fire once the PR targets main.

Comment on lines +974 to +982
// Clear M2P/forwarding from all sleds before DPD cleanup.
// This must succeed before deleting DB records, otherwise
// stale OPTE state would persist on failed sleds with no
// source of truth to drive a later cleanup pass.
sled_client
    .clear_m2p_and_forwarding(opctx, group)
    .await
    .context("failed to clear M2P/forwarding from sleds")?;

Contributor

I've not looked into this reconciler RPW before, but what do you mean by 'failed sleds' here? I feel like I'm missing a little context.

Collaborator Author

@zeeshanlakhani zeeshanlakhani Mar 31, 2026

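I clarified the comment. "Failed sleds" meant sleds that could not be reached, or on which the clear operation did not fully complete. If we deleted the DB records anyway, we would lose the source of truth needed to clean those sleds up later.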
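
The ordering constraint can be sketched in isolation as follows. This is a simplified, synchronous illustration with hypothetical names (`clear_all_sleds`, `teardown_group`, and a `BTreeSet` standing in for the DB records); it is not the reconciler code itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical identifiers; the real code uses typed UUIDs.
type SledId = u32;
type GroupId = u32;

/// Try to clear state on every sled and report the sleds that failed.
fn clear_all_sleds(
    sleds: &[SledId],
    mut clear_one: impl FnMut(SledId) -> Result<(), String>,
) -> Result<(), Vec<SledId>> {
    let failed: Vec<SledId> = sleds
        .iter()
        .copied()
        .filter(|sled| clear_one(*sled).is_err())
        .collect();
    if failed.is_empty() { Ok(()) } else { Err(failed) }
}

/// Delete the group's record only after every sled has been cleared.
fn teardown_group(
    records: &mut BTreeSet<GroupId>,
    group: GroupId,
    sleds: &[SledId],
    clear_one: impl FnMut(SledId) -> Result<(), String>,
) -> Result<(), Vec<SledId>> {
    // On any failure, return early and keep the record, so a later pass
    // still knows the group exists and can retry the clear.
    clear_all_sleds(sleds, clear_one)?;
    records.remove(&group);
    Ok(())
}
```

If one sled is unreachable, `teardown_group` returns the failing sled IDs and leaves the record in place; a later pass that succeeds on every sled then removes it.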

Comment on lines -100 to -109

source $OMICRON_TOP/tools/opte_version_override

if [[ "x$OPTE_COMMIT" != "x" ]]; then
    set +x
    curl -fOL https://buildomat.eng.oxide.computer/public/file/oxidecomputer/opte/module/$OPTE_COMMIT/xde
    pfexec rem_drv xde || true
    pfexec mv xde /kernel/drv/amd64/xde
    pfexec add_drv xde || true
fi
Contributor

The new way is probably a bit safer than this, thanks for chasing this down 😅

Changes:

- Remove global eip_gateways map from PortManagerInner, as the VPC route manager RPW activates after instance start
- Refactor member reconciler methods to take &MemberReconcileCtx
- Change forwarding next hop from member sleds to a single switch zone IP
- Add resolver to MulticastSledClient for switch zone address lookup
@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from ee88e45 to 46eb139 Compare April 1, 2026 07:39
@zeeshanlakhani zeeshanlakhani force-pushed the zl/multicast-m2p-forwarding branch from 46eb139 to 6b07a46 Compare April 1, 2026 11:06

3 participants