Commit fc3bde8
[multicast] Add multicast replication support for softnpu/a4x2
Introduce two-group multicast replication (mcast_grp_a / mcast_grp_b) modeled after dendrite's replication contract and P4 sidecar expectations. When egress metadata declares the multicast field triplet, the generated pipeline replicates packets to group members with per-copy attribution tags, ingress-port exclusion, and mcast-over-broadcast precedence.

## Slice codegen fix

Fix a latent bug where non-byte-aligned multi-byte slices (e.g., field[11:4] on bit<32>) produced incorrect bitvec ranges after header byte reversal. Validation moves to the HLIR, so these slices are now rejected at compile time with a diagnostic rather than silently generating wrong code.

## Multicast contract

The codegen path activates when egress_metadata_t declares all three fields: mcast_grp_a, mcast_grp_b, and mcast_replication. Partial declarations of these fields are caught at codegen time. A separate softnpu_mcast.p4 test platform definition keeps multicast opt-in.

## Runtime support

McastReplicationTag tracks per-copy group attribution (External / Underlay / Both) via bitwise OR, distinct from dendrite's MULTICAST_TAG_* wire encoding. Five new required methods on the Pipeline trait expose group management.

## Tests

Integration tests cover varying multicast workflows. Two end-to-end HLIR tests verify slice alignment diagnostics through the full compiler pipeline.
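The slice codegen bug can be sketched with a minimal standalone model. This is not the crate's actual code: `naive_range` and `reversed_bit` are illustrative names for the two mappings the diff below contrasts, applied to the `ipv4.dst[31:28]` multicast-nibble case from the commit.

```rust
// Standalone model (illustrative, not the crate's API) of mapping a P4
// slice endpoint to a bitvec index when header storage is byte-reversed.

/// The naive mapping: treat the bitvec as wire-ordered, so P4 [hi:lo]
/// becomes the half-open range [lo .. hi+1]. Wrong for multi-byte fields.
fn naive_range(hi: usize, lo: usize) -> (usize, usize) {
    (lo, hi + 1)
}

/// Map a single P4 bit position through the byte reversal: find the
/// wire byte it lives in, flip the byte index, keep the Msb0 position
/// within the byte.
fn reversed_bit(bit: usize, width: usize) -> usize {
    let wire_idx = width - 1 - bit; // P4 bit -> wire-order index (Msb0)
    let wire_byte = wire_idx / 8; // which byte on the wire
    let bit_in_byte = wire_idx % 8; // position within that byte
    let storage_byte = width / 8 - 1 - wire_byte; // byte reversal
    storage_byte * 8 + bit_in_byte
}

fn main() {
    // ipv4.dst[31:28] on a 32-bit field: top nibble of wire byte 0.
    // Naive codegen reads the low nibble of storage byte 3 instead.
    assert_eq!(naive_range(31, 28), (28, 32));

    // Byte-reversal-aware mapping lands on the high nibble of
    // storage byte 3: bitvec [24..28].
    let (a, b) = (reversed_bit(31, 32), reversed_bit(28, 32));
    assert_eq!((a.min(b), a.max(b) + 1), (24, 28));
}
```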
1 parent 132cdc3 commit fc3bde8

9 files changed

Lines changed: 1178 additions & 150 deletions

File tree

codegen/rust/src/expression.rs

Lines changed: 245 additions & 13 deletions
@@ -111,22 +111,37 @@ impl<'a> ExpressionGenerator<'a> {
             }
             ExpressionKind::Index(lval, xpr) => {
                 let mut ts = self.generate_lvalue(lval);
-                ts.extend(self.generate_expression(xpr.as_ref()));
+                // When the index expression is a slice (e.g. `field[31:28]`),
+                // we need the parent field's bit width to adjust for the
+                // byte reversal in header.rs. The HLIR resolves lvalue types
+                // during type checking, so we look up the width here and
+                // pass it to generate_slice. For non-bit types (or if the
+                // lvalue is missing from the HLIR, which should not happen
+                // for well-typed programs), we fall back to width 0 which
+                // skips the adjustment.
+                if let ExpressionKind::Slice(begin, end) = &xpr.kind {
+                    let field_width = self
+                        .hlir
+                        .lvalue_decls
+                        .get(lval)
+                        .map(|ni| match &ni.ty {
+                            p4::ast::Type::Bit(w)
+                            | p4::ast::Type::Varbit(w)
+                            | p4::ast::Type::Int(w) => *w,
+                            _ => 0,
+                        })
+                        .unwrap_or(0);
+                    ts.extend(self.generate_slice(begin, end, field_width));
+                } else {
+                    ts.extend(self.generate_expression(xpr.as_ref()));
+                }
                 ts
             }
             ExpressionKind::Slice(begin, end) => {
-                let l = match &begin.kind {
-                    ExpressionKind::IntegerLit(v) => *v as usize,
-                    _ => panic!("slice ranges can only be integer literals"),
-                };
-                let l = l + 1;
-                let r = match &end.kind {
-                    ExpressionKind::IntegerLit(v) => *v as usize,
-                    _ => panic!("slice ranges can only be integer literals"),
-                };
-                quote! {
-                    [#r..#l]
-                }
+                // Bare slice outside ExpressionKind::Index context (should not
+                // occur in well-typed programs per hlir.rs validation). No
+                // field width available, so no byte-reversal adjustment.
+                self.generate_slice(begin, end, 0)
             }
             ExpressionKind::Call(call) => {
                 let lv: Vec<TokenStream> = call
@@ -158,6 +173,76 @@ impl<'a> ExpressionGenerator<'a> {
         }
     }

+    /// Generate a bitvec range for a P4 bit slice `[hi:lo]`.
+    ///
+    /// # Confused-endian byte reversal
+    ///
+    /// header.rs stores multi-byte fields (width > 8 bits) with bytes
+    /// reversed relative to wire order. For a 32-bit field containing
+    /// wire bytes `[0xE0, 0x01, 0x01, 0x01]` (224.1.1.1), the bitvec
+    /// holds `[0x01, 0x01, 0x01, 0xE0]`. Bit positions within each
+    /// byte are preserved (Msb0) and only the byte order is flipped.
+    ///
+    /// P4 uses MSB-first bit numbering: in a W-bit field, bit W-1 is
+    /// the MSB (first on wire) and bit 0 is the LSB. A naive mapping
+    /// from P4 bit `b` to bitvec index `W-1-b` is correct for wire
+    /// order but wrong after byte reversal.
+    ///
+    /// The correct mapping for a reversed W-bit field:
+    ///
+    /// ```text
+    /// wire_idx = W - 1 - b // P4 bit -> wire-order bitvec index
+    /// wire_byte = wire_idx / 8 // which byte on the wire
+    /// bit_in_byte = wire_idx % 8 // position within that byte (Msb0)
+    /// storage_byte = W/8 - 1 - wire_byte // byte reversal
+    /// bitvec_idx = storage_byte * 8 + bit_in_byte
+    /// ```
+    ///
+    /// # Why full-byte MSB-aligned slices worked without this previously
+    ///
+    /// For a slice like `[127:120]` on a 128-bit field (extracting wire
+    /// byte 0), the mapping sends byte 0 to storage byte 15, producing
+    /// bitvec range `[120..128]`. The naive codegen also emits
+    /// `[120..128]`. The numeric coincidence holds whenever the slice
+    /// spans complete bytes at the MSB end: the reversed byte position
+    /// `W - 8 - start` equals the original `end` index. This breaks
+    /// for sub-byte slices (e.g. nibble extraction) or slices not
+    /// aligned to the MSB boundary.
+    ///
+    /// This confusion was discovered while implementing multicast support.
+    /// Multicast routing classifies packets by extracting the top nibble
+    /// of `ipv4.dst` via `ipv4.dst[31:28]` to check for the 0xE prefix
+    /// (IPv4 multicast range 224.0.0.0/4). The naive codegen emitted
+    /// `bitvec[28..32]`, which reads the low nibble of storage byte 3
+    /// (= 0x0 for 224.x.x.x) instead of the high nibble (= 0xE). The
+    /// correct range after byte-reversal adjustment is `bitvec[24..28]`.
+    fn generate_slice(
+        &self,
+        begin: &Expression,
+        end: &Expression,
+        field_width: FieldWidth,
+    ) -> TokenStream {
+        let hi: P4Bit = match &begin.kind {
+            ExpressionKind::IntegerLit(v) => *v as usize,
+            _ => panic!("slice ranges can only be integer literals"),
+        };
+        let lo: P4Bit = match &end.kind {
+            ExpressionKind::IntegerLit(v) => *v as usize,
+            _ => panic!("slice ranges can only be integer literals"),
+        };
+
+        if field_width > 8 {
+            let (r, l) = reversed_slice_range(hi, lo, field_width);
+            quote! { [#r..#l] }
+        } else {
+            // Fields <= 8 bits are not byte-reversed by header.rs,
+            // so the naive P4-to-bitvec mapping is correct.
+            let l = hi + 1;
+            let r = lo;
+            quote! { [#r..#l] }
+        }
+    }
+
     pub(crate) fn generate_bit_literal(
         &self,
         width: u16,
@@ -223,3 +308,150 @@ impl<'a> ExpressionGenerator<'a> {
         }
     }
 }
+
+/// P4 bit position (MSB-first index within a field).
+type P4Bit = usize;
+
+/// Width of a P4 header field in bits.
+type FieldWidth = usize;
+
+/// Half-open bitvec range `(start, end)` into the storage representation.
+type BitvecRange = (usize, usize);
+
+/// Compute the bitvec range `(start, end)` for a P4 slice `[hi:lo]` on a
+/// byte-reversed field of the given width.
+///
+/// header.rs reverses byte order for multi-byte fields. The mapping from
+/// P4 bit positions to storage positions depends on whether the slice
+/// stays within a single wire byte or spans multiple bytes.
+///
+/// For a single-byte slice, we map each endpoint through the byte reversal:
+/// the target byte moves to a new position but bit ordering within the byte
+/// is preserved (Msb0). This handles sub-byte nibble extractions like
+/// `ipv4.dst[31:28]`.
+///
+/// For a multi-byte slice, the byte reversal makes the endpoints
+/// non-contiguous (byte 0 maps to the far end, byte 1 maps next to it,
+/// etc.). However, if the slice is byte-aligned, the reversed bytes form
+/// a contiguous block at a different offset. We compute the storage byte
+/// range directly. Non-byte-aligned multi-byte slices cannot be represented
+/// as a single contiguous range after reversal and will panic.
+fn reversed_slice_range(
+    hi: P4Bit,
+    lo: P4Bit,
+    field_width: FieldWidth,
+) -> BitvecRange {
+    // Wire byte indices for the slice endpoints. P4 bit W-1 is in wire
+    // byte 0 (MSB-first), so higher bit numbers map to lower byte indices.
+    let wire_byte_hi = (field_width - 1 - hi) / 8;
+    let wire_byte_lo = (field_width - 1 - lo) / 8;

+    if wire_byte_hi == wire_byte_lo {
+        // Single-byte slice: map each endpoint individually.
+        let map_bit = |bit_pos: usize| -> usize {
+            let wire_idx = field_width - 1 - bit_pos;
+            let wire_byte = wire_idx / 8;
+            let bit_in_byte = wire_idx % 8;
+            let storage_byte = field_width / 8 - 1 - wire_byte;
+            storage_byte * 8 + bit_in_byte
+        };
+
+        let mapped_hi = map_bit(hi);
+        let mapped_lo = map_bit(lo);
+        (mapped_hi.min(mapped_lo), mapped_hi.max(mapped_lo) + 1)
+    } else {
+        // Multi-byte slice: the HLIR rejects non-byte-aligned cases
+        // during validation.
+        assert!(
+            (hi + 1).is_multiple_of(8) && lo.is_multiple_of(8),
+            "non-byte-aligned multi-byte slice [{hi}:{lo}] on \
+             {field_width}-bit field reached codegen",
+        );
+
+        // Reversed storage bytes form a contiguous block.
+        let storage_byte_start = field_width / 8 - 1 - wire_byte_lo;
+        let storage_byte_end = field_width / 8 - 1 - wire_byte_hi;
+        (storage_byte_start * 8, (storage_byte_end + 1) * 8)
+    }
+}
+
+#[cfg(test)]
+mod test {
+    use super::*;
+
+    // Verify the reversed slice range mapping against the byte reversal
+    // in header.rs. For each case we check that the bitvec range lands
+    // on the correct bits in the reversed storage layout.
+
+    // Sub-byte slices within a single wire byte.
+
+    #[test]
+    fn slice_32bit_top_nibble() {
+        // P4 [31:28] on 32-bit: top nibble of wire byte 0.
+        // Storage: wire byte 0 -> storage byte 3.
+        // High nibble of storage byte 3 = bitvec [24..28].
+        assert_eq!(reversed_slice_range(31, 28, 32), (24, 28));
+    }
+
+    #[test]
+    fn slice_32bit_bottom_nibble() {
+        // P4 [3:0] on 32-bit: bottom nibble of wire byte 3.
+        // Storage: wire byte 3 -> storage byte 0.
+        // Low nibble (Msb0) of storage byte 0 = bitvec [4..8].
+        assert_eq!(reversed_slice_range(3, 0, 32), (4, 8));
+    }
+
+    #[test]
+    fn slice_16bit_top_nibble() {
+        // P4 [15:12] on 16-bit: top nibble of wire byte 0.
+        // Storage: wire byte 0 -> storage byte 1.
+        // High nibble of storage byte 1 = bitvec [8..12].
+        assert_eq!(reversed_slice_range(15, 12, 16), (8, 12));
+    }
+
+    // Full-byte slices (single byte).
+
+    #[test]
+    fn slice_128bit_top_byte() {
+        // P4 [127:120] on 128-bit: wire byte 0 -> storage byte 15.
+        // bitvec [120..128].
+        assert_eq!(reversed_slice_range(127, 120, 128), (120, 128));
+    }
+
+    #[test]
+    fn slice_16bit_low_byte() {
+        // P4 [7:0] on 16-bit: wire byte 1 -> storage byte 0.
+        // bitvec [0..8].
+        assert_eq!(reversed_slice_range(7, 0, 16), (0, 8));
+    }
+
+    #[test]
+    fn slice_32bit_middle_byte() {
+        // P4 [23:16] on 32-bit: wire byte 1 -> storage byte 2.
+        // bitvec [16..24].
+        assert_eq!(reversed_slice_range(23, 16, 32), (16, 24));
+    }
+
+    // Multi-byte byte-aligned slices.
+
+    #[test]
+    fn slice_128bit_top_two_bytes() {
+        // P4 [127:112] on 128-bit: wire bytes 0-1 -> storage bytes 14-15.
+        // bitvec [112..128].
+        assert_eq!(reversed_slice_range(127, 112, 128), (112, 128));
+    }
+
+    #[test]
+    fn slice_32bit_top_three_bytes() {
+        // P4 [31:8] on 32-bit: wire bytes 0-2 -> storage bytes 1-3.
+        // bitvec [8..32].
+        assert_eq!(reversed_slice_range(31, 8, 32), (8, 32));
+    }
+
+    #[test]
+    fn slice_32bit_bottom_two_bytes() {
+        // P4 [15:0] on 32-bit: wire bytes 2-3 -> storage bytes 0-1.
+        // bitvec [0..16].
+        assert_eq!(reversed_slice_range(15, 0, 32), (0, 16));
+    }
+}
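The storage-layout reasoning behind these test ranges can be checked end to end with a standalone sketch. It is not part of the commit: the `wire`/`storage` arrays and the bit flattening are illustrative assumptions mirroring the Msb0, byte-reversed layout the doc comment describes for 224.1.1.1.

```rust
// Standalone check (illustrative, not the crate's code) that the range
// [24..28] picks the 0xE multicast nibble out of a byte-reversed
// 32-bit field, while the naive range [28..32] does not.
fn main() {
    let wire: [u8; 4] = [0xE0, 0x01, 0x01, 0x01]; // 224.1.1.1 in wire order
    let storage: Vec<u8> = wire.iter().rev().copied().collect(); // byte-reversed

    // Flatten storage into Msb0 bit order: index 0 is the MSB of byte 0.
    let bits: Vec<u8> = storage
        .iter()
        .flat_map(|b| (0..8).rev().map(move |i| (b >> i) & 1))
        .collect();

    // ipv4.dst[31:28] -> bitvec [24..28]: the high nibble of storage byte 3.
    let nibble = bits[24..28].iter().fold(0u8, |acc, b| (acc << 1) | b);
    assert_eq!(nibble, 0xE);

    // The naive range [28..32] reads the low nibble of storage byte 3,
    // which is 0x0 for any 224.x.x.x address.
    let wrong = bits[28..32].iter().fold(0u8, |acc, b| (acc << 1) | b);
    assert_eq!(wrong, 0x0);
}
```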

0 commit comments
