- Azure Local - Two machines switchless dual link network architecture
- ToR: Top of Rack network switch
- p-NIC: Physical Network Card attached to a node.
- v-Switch: Virtual Switch configured on the Azure Local Cluster.
- VLAN: Virtual Local Area Network.
- SET: Switch Embedded Teaming, supporting switch-independent teaming.
- MLAG: Multi-Chassis Link Aggregation, a technique that lets two or more network switches work together as if they were one logical switch.
- Border Router: Uplink device with the ToR switches, providing routing to endpoints external to the Azure Local environment.
- AS: Autonomous System number used to define a BGP neighbor.
- Make: Cisco
- Model: Nexus 98180YC-FX
- Firmware: 10.3(4a)
This document assists administrators in designing a network architecture that aligns with Azure Local cluster requirements. It provides reference architectures and sample configurations for network devices supporting cluster deployments. Equipment such as switches, firewalls, or routers located on customer premises is considered out of scope, as it is assumed to be part of the existing infrastructure. The focus is on Node-to-ToR, ToR-to-ToR, and ToR uplink configurations.
This solution deploys a two-node environment using an Azure Local switchless storage architecture. Storage traffic is directly connected between the two nodes, bypassing network switches to reduce latency and complexity for storage workloads. The ToR switches are dedicated to compute and management traffic, providing redundancy and high availability for cluster operations. Compute and management p-NICs are configured with Switch Embedded Teaming (SET), enabling switch-independent teaming and seamless integration with the physical network.
This approach ensures storage traffic is isolated and optimized through direct node-to-node connections, while compute and management traffic benefit from the resiliency and scalability of dual ToR switches and MLAG/vPC configurations.
The two-node converged networking configuration has the following attributes:
- Each node is equipped with two physical network interface cards, each with two physical interfaces (p-NICs).
- p-NICs A and B handle both compute and management traffic.
- p-NICs A and B are configured as part of a Switch Embedded Teaming (SET) team, transmitting compute and management traffic using VLAN tags. These NICs are assigned to a virtual switch (v-Switch) to support multiple network intents.
- p-NICs C and D are dedicated to storage traffic and support RDMA protocol workloads.
- p-NICs C and D are directly connected between node 1 and node 2, with traffic transmitted using VLAN tags over this direct connection.
| Device | Interface | Device | Interface | |
|---|---|---|---|---|
| Node1 | p-NIC A | <==> | TOR1 | Ethernet1/1 |
| Node1 | p-NIC B | <==> | TOR2 | Ethernet1/1 |
| Node1 | p-NIC C | <==> | Node2 | p-NIC C |
| Node1 | p-NIC D | <==> | Node2 | p-NIC D |
| Device | Interface | Device | Interface | |
|---|---|---|---|---|
| Node2 | p-NIC A | <==> | TOR1 | Ethernet1/2 |
| Node2 | p-NIC B | <==> | TOR2 | Ethernet1/2 |
| Node2 | p-NIC C | <==> | Node1 | p-NIC C |
| Node2 | p-NIC D | <==> | Node1 | p-NIC D |
| Device | Interface | Device | Interface | |
|---|---|---|---|---|
| TOR1 | Ethernet1/1 | <==> | Node1 | p-NIC A |
| TOR1 | Ethernet1/2 | <==> | Node2 | p-NIC A |
| TOR1 | Ethernet1/41 | <==> | TOR2 | Ethernet1/41 |
| TOR1 | Ethernet1/42 | <==> | TOR2 | Ethernet1/42 |
| TOR1 | Ethernet1/47 | <==> | Border1 | Ethernet1/x |
| TOR1 | Ethernet1/48 | <==> | Border2 | Ethernet1/x |
| TOR1 | Ethernet1/49 | <==> | TOR2 | Ethernet1/49 |
| TOR1 | Ethernet1/50 | <==> | TOR2 | Ethernet1/50 |
| TOR1 | Ethernet1/51 | <==> | TOR2 | Ethernet1/51 |
| Device | Interface | Device | Interface | |
|---|---|---|---|---|
| TOR2 | Ethernet1/1 | <==> | Node1 | p-NIC B |
| TOR2 | Ethernet1/2 | <==> | Node2 | p-NIC B |
| TOR2 | Ethernet1/41 | <==> | TOR1 | Ethernet1/41 |
| TOR2 | Ethernet1/42 | <==> | TOR1 | Ethernet1/42 |
| TOR2 | Ethernet1/47 | <==> | Border1 | Ethernet1/x |
| TOR2 | Ethernet1/48 | <==> | Border2 | Ethernet1/x |
| TOR2 | Ethernet1/49 | <==> | TOR1 | Ethernet1/49 |
| TOR2 | Ethernet1/50 | <==> | TOR1 | Ethernet1/50 |
| TOR2 | Ethernet1/51 | <==> | TOR1 | Ethernet1/51 |
- Two physical switches are used as Top of Rack (ToR) devices (Cisco Nexus 93180YC-FX).
- Management and compute p-NICs connect to the ToR devices.
- Storage p-NICs do not connect to the ToR devices.
- The ToRs are configured in MLAG for redundancy.
- The core network (switch/router/firewall) is considered out of scope.
Switch QoS is not required in this configuration, as storage traffic is direct node-to-node. QoS is typically only needed if storage interfaces connect to a switch.
Compute and management intents are separated into VLANs (e.g., 7 and 8). These VLANs are configured as Layer 3 SVIs on both ToR switches. MLAG is used to support east-west traffic for compute and management.
Enable the following features on Cisco Nexus 93180YC-FX to support this example Azure Local environment:
Sample Cisco Nexus 93180YC-FX Configuration:
feature bgp
feature interface-vlan
feature hsrp
feature lacp
feature vpc
feature lldpIn this configuration, VLANs are used to logically separate management and compute traffic within the Azure Local environment:
- VLAN 7 is designated for management traffic. It is configured as the native VLAN on trunk ports and supports Azure Local backend services, including cluster management and infrastructure communication.
- VLAN 8 is assigned for compute traffic, which carries customer or tenant workloads.
Both VLANs are configured as Layer 2 and Layer 3 interfaces, with each assigned to a Switch Virtual Interface (SVI). The gateway IP address for each VLAN SVI ends in ".1" to provide a consistent and predictable default gateway for devices within each subnet.
Hot Standby Router Protocol (HSRP) is implemented on both VLAN interfaces to provide gateway redundancy and high availability. HSRP version 2 is used, and each VLAN is assigned an HSRP group number that matches its VLAN ID for maintainablity. The HSRP configuration ensures that if one ToR switch becomes unavailable, the other can immediately take over gateway responsibilities, maintaining uninterrupted network connectivity for both management and compute networks.
The VLAN Maximum Transmission Unit (MTU) is set to 9216 bytes to support jumbo frames, which is required for SDN workloads. If SDN is not used, the default MTU (typically 1500 bytes) is sufficient, and customers should select the MTU size based on their workload requirements.
Sample Cisco Nexus 93180YC-FX Configuration:
vlan 7
name Management_7
vlan 8
name Compute_8
!
interface Vlan7
description Management
no shutdown
mtu 9216
no ip redirects
ip address 10.101.176.2/24
no ipv6 redirects
hsrp version 2
hsrp 7
priority 150 forwarding-threshold lower 1 upper 150
ip 100.101.176.1
!
interface Vlan8
description Compute
no shutdown
mtu 9216
no ip redirects
ip address 10.101.177.2/24
no ipv6 redirects
hsrp version 2
hsrp 8
priority 150 forwarding-threshold lower 1 upper 150
ip 100.101.177.1The following configuration example demonstrates how to configure the ToR switch interfaces that connect directly to the Azure Local cluster nodes. These interfaces are responsible for carrying both management and compute traffic between the nodes and the network infrastructure.
Key Configuration Elements:
-
Interface Assignment:
Ethernet1/1andEthernet1/2are connected to Node 1 and Node 2. Each interface is clearly labeled with a descriptive name. -
Trunk Mode with VLAN Tagging:
Both interfaces are configured as trunk ports, allowing them to carry multiple VLANs (VLAN 7 for management and VLAN 8 for compute). The native VLAN is set to 7, which is typically used for management traffic. This setup enables the use of Switch Embedded Teaming (SET) on the host side, supporting multiple network intents over the same physical connection. Additional compute VLANs tags can be included to support tenant workloads. -
Spanning Tree and Edge Port Configuration:
Thespanning-tree port type edge trunkcommand is applied to optimize convergence times and protect against accidental loops, as these ports connect directly to servers rather than other switches. -
Performance Optimization:
The MTU is set to 9216 bytes to support jumbo frames, which is recommended for Azure Local and SDN workloads to maximize throughput and reduce CPU overhead. The interface speed is explicitly set to 10 Gbps to match the physical NIC capabilities. -
Operational Enhancements:
CDP (Cisco Discovery Protocol) is disabled to minimize unnecessary protocol traffic, and link status logging is enabled to monitor and record changes in the status of server-facing ports.
Sample Cisco Nexus 93180YC-FX Interface Configuration:
interface Ethernet1/1
description AzLocalNode1
no cdp enable
switchport
switchport mode trunk
switchport trunk native vlan 7
switchport trunk allowed vlan 7-8
spanning-tree port type edge trunk
mtu 9216
speed 10000
logging event port link-status
no shutdown
!
interface Ethernet1/2
description AzLocalNode2
no cdp enable
switchport
switchport mode trunk
switchport trunk native vlan 7
switchport trunk allowed vlan 7-8
spanning-tree port type edge trunk
mtu 9216
speed 10000
logging event port link-status
no shutdownSpanning Tree Configuration:
The configuration begins by enabling BPDU Guard on all edge ports to protect against accidental network loops. Multiple Spanning Tree (MST) is used to support multiple VLANs, with specific priorities set for each MST instance to influence root bridge selection. The MST region is defined with a unique name and revision, and VLANs are mapped to specific MST instances for optimal traffic management.
vPC Domain and Peer Link: The vPC domain is established to allow the two ToR switches to operate as a single logical switch for downstream devices. Key parameters such as role priority, peer-keepalive (for monitoring the health of the vPC peer), and auto-recovery are configured to enhance resiliency. The peer-gateway feature is enabled to allow seamless gateway operations across both switches.
Port-Channel and Interface Configuration:
- Port-Channel 50 is configured as a dedicated Layer 3 link for vPC heartbeat and iBGP communication between the ToR switches, with jumbo frames enabled for high-performance workloads.
- Port-Channel 101 serves as the vPC peer link, configured as a trunk to carry multiple VLANs, with priority flow control and spanning tree network port settings applied for reliability and performance.
Physical Interface Bundling: Interfaces Ethernet1/49, Ethernet1/50, and Ethernet1/51 are bundled into Port-Channel 101 using LACP in active mode. These interfaces are configured as trunk ports, with the native VLAN set to 99 and priority flow control enabled. Logging is enabled for link status changes, and CDP is disabled to reduce unnecessary protocol traffic.
Cisco Nexus 93180YC-FX Configuration:
spanning-tree port type edge bpduguard default
spanning-tree mst 0-1 priority 8192
spanning-tree mst 2 priority 16384
spanning-tree mst configuration
name AzureStack
revision 1
instance 1 vlan 1-1999
instance 2 vlan 2000-4094
vpc domain 1
role priority 1
peer-keepalive destination 10.100.19.18 source 10.100.19.17 vrf default
delay restore 150
peer-gateway
auto-recovery
interface port-channel50
description VPC:Heartbeat_P2P_IBGP_Link
logging event port link-status
mtu 9216
ip address 10.10.19.17/30
interface port-channel101
description VPC:MLAG_PEER
switchport
switchport mode trunk
switchport trunk native vlan 99
priority-flow-control mode on
spanning-tree port type network
logging event port link-status
vpc peer-link
interface Ethernet1/49
description MLAG_Peer
no cdp enable
switchport
switchport mode trunk
switchport trunk native vlan 99
priority-flow-control mode on
logging event port link-status
channel-group 101 mode active
no shutdown
interface Ethernet1/50
description MLAG_Peer
no cdp enable
switchport
switchport mode trunk
switchport trunk native vlan 99
priority-flow-control mode on
logging event port link-status
channel-group 101 mode active
no shutdown
interface Ethernet1/51
description MLAG_Peer
no cdp enable
switchport
switchport mode trunk
switchport trunk native vlan 99
priority-flow-control mode on
logging event port link-status
channel-group 101 mode active
no shutdownThe configuration begins by defining the router's BGP process under autonomous system 64511, using a static router-id assigned to a loopback interface. This ensures a stable identifier, crucial for consistent BGP operation even if physical interfaces change state. The command bestpath as-path multipath-relax allows the router to consider multiple paths to the same destination—even when AS paths are not identical—thus providing greater flexibility in equal-cost multipath routing. The log-neighbor-changes command ensures that any changes in neighbor status are logged, aiding in operational troubleshooting.
Under the IPv4 unicast address family, several networks are explicitly advertised, including the loopback (used for the router ID), point-to-point links (for border and port channel connectivity), and VLAN-specific subnets (supporting internal segments such as VLAN7 and VLAN8). The use of maximum-paths 8 (for both eBGP and iBGP with the maximum-paths ibgp command) enables the router to handle up to eight equal-cost paths, enhancing load balancing and redundancy within the network.
Prefix lists are utilized within this configuration to enforce clear routing policies: DefaultRoute Prefix List: This list is designed to only advertise the default route. It explicitly permits the default route (0.0.0.0/0) and then denies any other routes within the same address space, ensuring that no extraneous subnets are erroneously advertised as default routes. FROM-BORDER Prefix List: Applied to incoming BGP updates from border neighbors, this list allows only the default route to be received. It permits 0.0.0.0/0 while denying any other prefixes that fall under the 0.0.0.0/0 umbrella, thereby filtering out unwanted routes from entering the local routing table. TO-BORDER Prefix List: Used when advertising routes to border neighbors, this list prevents the default route from being advertised. The early deny rule (seq 5) filters out the 0.0.0.0/0 prefix, while the subsequent permit rule allows all other routes (any IPv4 prefix with a mask length less than or equal to 32) to be advertised.
Neighbor relationships are then established for three interfaces. Two of these neighbors (with IPs <Border1-IP> and <Border2-IP>) are external peers in AS 64404—each configured with descriptive labels and both employing the FROM-BORDER and TO-BORDER prefix lists within the IPv4 unicast address family. These peers also have a safeguard with maximum-prefix 12000 warning-only to notify if the count of received prefixes nears a risky threshold. The third neighbor, associated with <Port-Channel50-IP>, is an internal (iBGP) peer in AS 64511 and is configured similarly, though without prefix filtering in the outbound direction
Cisco Nexus 93180YC-FX Configuration:
!!! Only advertise the default route
ip prefix-list DefaultRoute seq 10 permit 0.0.0.0/0
ip prefix-list DefaultRoute seq 50 deny 0.0.0.0/0 le 32
!!! Receive BGP Advertisements for 0.0.0.0/0, deny all others.
ip prefix-list FROM-BORDER seq 10 permit 0.0.0.0/0
ip prefix-list FROM-BORDER seq 30 deny 0.0.0.0/0 le 32
!!! Advertise any network except for 0.0.0.0/0
ip prefix-list TO-BORDER seq 5 deny 0.0.0.0/0
ip prefix-list TO-BORDER seq 10 permit 0.0.0.0/0 le 32
router bgp 64511
router-id <Loopback-IP>
bestpath as-path multipath-relax
log-neighbor-changes
address-family ipv4 unicast
network <Loopback-IP>/32
network <Border1-IP>/30
network <Border2-IP>/30
network <Port-Channel50-IP>/30
! VLAN7
network 10.101.176.0/24
! VLAN8
network 10.101.177.0/24
maximum-paths 8
maximum-paths ibgp 8
neighbor <Border1-IP>
remote-as 64404
description TO_Border1
address-family ipv4 unicast
prefix-list FROM-BORDER in
prefix-list TO-BORDER out
maximum-prefix 12000 warning-only
neighbor <Border2-IP>
remote-as 64404
description TO_Border2
address-family ipv4 unicast
prefix-list FROM-BORDER in
prefix-list TO-BORDER out
maximum-prefix 12000 warning-only
neighbor <Port-Channel50-IP>
remote-as 64511
description TO_TOR2
address-family ipv4 unicast
maximum-prefix 12000 warning-onlyThis section of the BGP configuration is tailored to support an Azure Local SDN scenario using VLAN8.
Dynamic BGP Neighbor Definition: A BGP neighbor is defined using the 10.101.177.0/24 subnet, which corresponds to VLAN8 and is reserved for the SLBMUX. The SLBMUX can use any IP address within this subnet, so the configuration specifies the entire subnet as the neighbor. The remote AS is set to 65158, and the neighbor is labeled TO_SDN_SLBMUX for clarity. When a subnet is used as the BGP neighbor, the switch operates in passive mode and waits for the SLBMUX to initiate the BGP connection.
Peering and Connectivity:
neighbor 10.101.177.0/24Defines the BGP neighbor as the entire 10.101.177.0/24 subnet, which is associated with VLAN8. This allows any device within that subnet—such as a SLBMUX to establish a BGP session with the switch, provided it matches the remote AS number.remote-as 65158
Specifies the remote Autonomous System (AS) number that the switch will allow to form a BGP session. In this case, the remote AS is set to 65158, which should match the AS number configured on the Gateway VM. Only eBGP sessions are supported in this configuration.update-source loopback0This ensures that BGP traffic originates from the stable loopback interface on the TOR, which helps maintain consistent peering even if physical interfaces change.ebgp-multihop 3Allows the BGP session to traverse up to three hops, accommodating scenarios where the SLBMUX is not directly connected.prefix-list DefaultRoute outWithin the IPv4 unicast address family for this neighbor, the outbound route policy is governed by the DefaultRoute prefix list. This list is designed to advertise only the default route (0.0.0.0/0) to the SLBMUX. This aligns with the design goal of having the SLBMUX receive only the default route from the switch.maximum-prefix 12000 warning-onlyThis command serves as a safeguard, issuing warnings if the number of received prefixes approaches a set limit, thereby helping maintain stability in the peer session.
Cisco Nexus 93180YC-FX Configuration:
neighbor 10.101.177.0/24
remote-as 65158
description TO_SDN_SLBMUX
update-source loopback0
ebgp-multihop 3
address-family ipv4 unicast
prefix-list DefaultRoute out
maximum-prefix 12000 warning-onlyThere are two primary methods for supporting Layer 3 Forwarding Gateways in Azure Local configurations: BGP and static routing.
With BGP, the Layer 3 Gateway establishes a BGP session with the ToR switch and advertises its V-NET routes directly into the ToR routing table. This dynamic approach allows the routing table to be automatically updated as new networks are added or removed, reducing manual intervention and supporting scalable, automated network operations.
In contrast, static routing requires the Forwarding Gateway to be a member of the VLAN, with the ToR switch manually configured with static routes for each required network. This method is more manual and requires the network team to update the ToR configuration whenever new internal networks are introduced. While BGP is recommended for environments that require flexibility and automation, static routing may be preferred in scenarios where a controlled, predictable routing configuration is needed.
For both methods, the subnet used for the gateway can be much smaller than the examples shown—typically as small as a /30 or /28, depending on the number of required IP addresses.
As mentioned above, BGP provides a dynamic way to add internal networks to the ToR switch routing table. When an administrator adds new networks in the portal, the Layer 3 Gateway advertises these networks to the switch using BGP, allowing the routing table to update automatically.
In the sample configuration below
-
neighbor 10.101.177.0/24
Defines the BGP neighbor as the entire 10.101.177.0/24 subnet, which is associated with VLAN8. This allows any device within that subnet—such as a Layer 3 Gateway VM to establish a BGP session with the switch, provided it matches the remote AS number. -
remote-as 65158
Specifies the remote Autonomous System (AS) number that the switch will allow to form a BGP session. In this case, the remote AS is set to 65158, which should match the AS number configured on the Gateway VM. Only eBGP sessions are supported in this configuration. -
update-source Vlan8
Ensures that BGP peering uses the VLAN8 interface as the source IP address. The source IP can be any address assigned to the VLAN8 configuration. Refer to the VLAN section above for more details. -
ebgp-multihop 5
Allows the BGP session to be established even if the Gateway VM is up to five hops away from the switch. This is useful in scenarios where the VM is not directly connected. -
prefix-list DefaultRoute out
Within the IPv4 unicast address family, this command restricts the switch to only advertise the default route (0.0.0.0/0) to the Layer 3 Gateway. The Gateway VM must receive at least the default route from the ToR switch. -
maximum-prefix 12000 warning-only
Sets a safeguard by issuing a warning if the number of received prefixes approaches 12,000. This helps prevent routing table overload and maintains network stability.
This configuration enables the ToR switch to dynamically learn and advertise routes as the network evolves, reducing manual intervention and supporting scalable, automated network operations.
Cisco Nexus 93180YC-FX Configuration:
neighbor 10.101.177.0/24
remote-as 65158
description TO_L3Forwarder
update-source Vlan8
ebgp-multihop 5
address-family ipv4 unicast
prefix-list DefaultRoute out
maximum-prefix 12000 warning-onlyIn static routing mode, the network team must plan in advance which subnet will be used by the V-NET and which IP address will serve as the gateway for internal routing. For example, in the configuration below, 10.101.177.226 is designated as the gateway VM. This IP address acts as the Layer 3 peering point with the ToR switch and serves as the gateway to the internal subnet 10.68.239.0/24.
It is recommended to configure the required static routes on the ToR switch before deploying the gateway VM. If additional internal networks are needed in the future, the ToR configuration must be updated to include static routes for those networks prior to their deployment. This ensures that traffic destined for the internal subnet is correctly forwarded to the gateway VM, supporting seamless connectivity within the larger Azure Local environment.
This approach is particularly useful in environments where dynamic routing protocols like BGP are not used, or where a more controlled, manual routing configuration is preferred.
Cisco Nexus 93180YC-FX Configuration:
ip route 10.101.177.226/32 10.68.239.0/24