Azure Local - QoS Policy

Below is a sample Cisco Nexus QoS configuration tailored for Azure Local environments. This policy is designed to ensure that storage (RDMA) and cluster heartbeat traffic are consistently prioritized and protected from congestion, while allowing efficient bandwidth sharing for all other traffic classes. The configuration defines traffic classes, sets bandwidth guarantees, enables congestion management, and configures MTU settings to meet Azure Local requirements.

Implementing QoS is mandatory for Azure Local deployments that support Storage intent workloads on network switches. For QoS to be effective, the policy must be applied consistently across all devices and interfaces that carry Storage intent traffic, ensuring end-to-end protection and performance for critical workloads.

Requirements

  1. Three CoS values are utilized within the Azure Local environment; the default values are as follows:
    • CoS 3: Storage, also referred to as RDMA.
    • CoS 7: Cluster Heartbeat
    • CoS 0: Default traffic
  2. Support Storage and Cluster heartbeat traffic with Priority Flow Control (802.1Qbb)
    • Establish Storage as a no-drop traffic class.
    • Cluster heartbeat traffic has the highest priority to protect against packet loss.
    • Default traffic has the lowest priority; in the event of congestion, default traffic is dropped to protect Storage and Cluster traffic.
  3. Bandwidth Reservations utilizing ETS (802.1Qaz)
    • Storage is assigned a minimum of 50% of the interface bandwidth.
    • Cluster is assigned a minimum of 1-2% of the interface bandwidth; the percentage is based on the interface speed:
      • 10G: 2%
      • 25G or greater: 1%
  4. Congestion Notification
    • Support for Explicit Congestion Notification (ECN) with Storage traffic.

Azure Local Defaults

Network ATC Data Center Bridging (DCB) and VLAN Defaults

Setting | Default Value | Description
--- | --- | ---
DCBX | Enabled | Data Center Bridging Exchange protocol is enabled for LLDP configuration notification only.
Priority Flow Control | Enabled | PFC (IEEE 802.1Qbb) is enabled for lossless transport on storage traffic.
ETS (Bandwidth) | Storage 50%, Cluster 1-2%, Default (remainder) | Bandwidth reservations. Cluster Heartbeat: 2% if the adapters are <=10 Gbps; 1% if the adapters are >10 Gbps.
ECN | Enabled | Explicit Congestion Notification is enabled for RDMA/Storage traffic.
VLAN | 711, 712 | Default Storage intent VLAN assignments. These values can be customized.
CoS (Class of Service) | Storage: 3, Cluster: 7, Default: 0 | Default CoS values for traffic classification. These values can be customized.

Note

These defaults can be overridden using Network ATC custom settings. For more details, see Manage Network ATC.

In Scope network patterns

This QoS policy is applicable to the following Azure Local deployment models:

  • Fully hyperconverged: Compute, management, and storage traffic all share the same network interface.
  • Disaggregated: Compute and management traffic are assigned to dedicated interfaces, while storage traffic is isolated on its own separate interface.
  • Rack Aware Cluster: Based on the disaggregated design, with room-to-room storage links.

Out of Scope network patterns

Switchless configurations do not require a switch QoS policy, as the switch is not used to transport storage traffic. In these scenarios, storage traffic is handled directly between endpoints without traversing a network switch, making switch-based QoS settings unnecessary.

QoS Policy Overview

flowchart TD
  A[Packet Ingress]:::ingress --> B{Ingress Queue}:::ingressqueue
  B -- CoS 3 --> C[Class Map: RDMA]:::cos3
  B -- CoS 7 --> D[Class Map: CLUSTER]:::cos7
  B -- Other --> E[Class Map: Default]:::defaultclass

  C --> F[Policy Map<br>Type: qos<br>AZLocal_SERVICES]:::qosmap
  D --> F
  E --> F

  F -- RDMA --> F3[set qos-group 3]:::cos3
  F -- CLUSTER --> G7[set qos-group 7]:::cos7
  F -- Default --> H0[default qos-group 0]:::defaultclass

  F3 --> X[policy-map type network-qos<br>QOS_NETWORK]:::networkqos
  G7 --> X
  H0 --> X

  X -- qos-group 3 --> J[Queue 3<br>RDMA<br>Buffer carving<br>lossless transport]:::cos3
  X -- qos-group 7 --> K[Queue 7<br>Cluster Heartbeat<br>Buffer carving]:::cos7
  X -- qos-group 0 --> L[Default Queue<br>Buffer carving]:::defaultclass

  J --> M{Egress Queue<br>QOS_EGRESS_PORT}:::egressqueue
  K --> M
  L --> M

  M -- Queue 3 --> N[50% Bandwidth<br>WRED/ECN<br>Congestion: Mark]:::cos3
  M -- Queue 7 --> O[1% Bandwidth]:::cos7
  M -- Default --> P[48% Bandwidth<br>Congestion: Drop]:::defaultclass

  N --> Q[Packet Egress]
  O --> Q
  P --> Q

  %% Annotations
  classDef cos3 fill:#e6ffe6,stroke:#2ecc40,stroke-width:2px;
  classDef cos7 fill:#e6e6ff,stroke:#5b5bd6,stroke-width:2px;
  classDef defaultclass fill:#f7f7f7,stroke:#aaaaaa,stroke-width:2px;
  classDef networkqos fill:#fff3e6,stroke:#ff9900,stroke-width:2px;
  classDef qosmap fill:#e6f0ff,stroke:#0074d9,stroke-width:2px;
  classDef egressqueue fill:#fbeeff,stroke:#b300b3,stroke-width:2px;

ClassMap

class-map type qos match-all RDMA
  match cos 3
class-map type qos match-all CLUSTER
  match cos 7

ClassMap identification is performed by matching the packet's CoS (Class of Service) value. If the CoS value is 3 (for RDMA/Storage) or 7 (for Cluster Heartbeat), the traffic is classified into the corresponding class. All other traffic is automatically assigned to the implicit default class. Matching solely on CoS ensures accurate classification and prevents critical traffic from being misclassified as default. This approach simplifies policy management.

Policy Map (QoS)

policy-map type qos AZLocal_SERVICES
  class RDMA
    set qos-group 3
  class CLUSTER
    set qos-group 7

This policy map assigns classified traffic to internal QoS groups. All other traffic classes not represented are placed in an implicit default class (qos-group 0).
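
The classification and qos-group mappings can be verified with standard NX-OS show commands (exact output formatting varies by platform and release):

show class-map type qos RDMA
show policy-map type qos AZLocal_SERVICES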

Policy Map (Network QoS)

policy-map type network-qos QOS_NETWORK
  class type network-qos c-8q-nq3
    mtu 9216
    pause pfc-cos 3
  class type network-qos c-8q-nq-default
    mtu 9216
  class type network-qos c-8q-nq7
    mtu 9216

This policy map sets global Layer 2 properties for each traffic class by configuring the MTU and enabling Priority Flow Control (PFC) for storage traffic (CoS 3). The pause pfc-cos 3 command activates PFC on CoS 3, ensuring lossless transport for RDMA and storage traffic. On Cisco NX-OS, this command alone is sufficient to achieve lossless behavior for the specified class, and the no-drop keyword is optional and can be added for clarity if needed. The mtu 9216 command applies a consistent jumbo frame size to all classes, which is recommended for uniformity and optimal support of high-throughput workloads. On Cisco Nexus switches, setting the MTU to 9216 also initiates buffer carving for the ingress queue, which helps optimize buffer allocation for demanding, low-latency applications. Buffer management and MTU configuration may vary on other switch platforms, so it is important to review vendor documentation for platform-specific recommendations.
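
The per-class MTU and PFC settings applied by this policy can be reviewed with the following standard NX-OS command (output varies by platform):

show policy-map type network-qos QOS_NETWORK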

Policy Map (Queuing)

policy-map type queuing QOS_EGRESS_PORT
  class type queuing c-out-8q-q3
    bandwidth remaining percent 50
    random-detect minimum-threshold 300 kbytes maximum-threshold 300 kbytes drop-probability 100 weight 0 ecn
  class type queuing c-out-8q-q-default
    bandwidth remaining percent 48
  class type queuing c-out-8q-q7
    bandwidth percent 1
  class type queuing c-out-8q-q1
    bandwidth remaining percent 0
  class type queuing c-out-8q-q2
    bandwidth remaining percent 0
  class type queuing c-out-8q-q4
    bandwidth remaining percent 0
  class type queuing c-out-8q-q5
    bandwidth remaining percent 0
  class type queuing c-out-8q-q6
    bandwidth remaining percent 0
  • Only queues 3, 7, and default are actively used in this policy. All other queues are configured with 0% bandwidth and remain unused.
  • Bandwidth reservations are explicitly configured for queues 3 and 7. Queue 3 (RDMA) is guaranteed a minimum of 50% of the interface bandwidth and can use up to 98% when it is available. During congestion, default traffic is dropped as needed to protect the guaranteed classes. Queue 7 (Cluster Heartbeat) is reserved a strict 1% of bandwidth on 25G or faster interfaces (2% on 10G interfaces), ensuring reliable delivery of critical heartbeat traffic.
  • The random-detect ... ecn command enables Explicit Congestion Notification (ECN) marking for congestion management in queue 3 (RDMA traffic). When congestion is detected, the switch marks packets instead of dropping them, which improves performance for lossless traffic.
  • The random-detect minimum-threshold 300 kbytes maximum-threshold 300 kbytes drop-probability 100 weight 0 configuration sets the minimum and maximum queue thresholds for WRED (Weighted Random Early Detection). When the queue depth reaches 300 kbytes, packets are marked (or dropped) with a probability of 100%. The weight parameter influences how quickly the average queue size responds to changes in traffic; a weight of 0 makes the response immediate. RDMA traffic can spike in microsecond bursts, and an immediate response provides the best protection for lossless traffic.
  • Because class 3 (RDMA) is configured as lossless, the switch will not drop packets from this class during congestion. Instead, when the interface is congested, packets from the default class will be dropped to maintain lossless delivery for class 3 traffic.
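
For 10G interfaces, the same policy can be adjusted to reserve 2% for the Cluster Heartbeat queue, per the requirements above. The policy name below is illustrative; only the queue 7 reservation changes:

policy-map type queuing QOS_EGRESS_PORT_10G
  class type queuing c-out-8q-q3
    bandwidth remaining percent 50
    random-detect minimum-threshold 300 kbytes maximum-threshold 300 kbytes drop-probability 100 weight 0 ecn
  class type queuing c-out-8q-q-default
    bandwidth remaining percent 48
  class type queuing c-out-8q-q7
    bandwidth percent 2
  ! remaining queues (q1, q2, q4-q6) are set to bandwidth remaining percent 0, as in the original policy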

Summary Table

Traffic Type | CoS | Bandwidth Guarantee | Features Enabled | MTU | Notes
--- | --- | --- | --- | --- | ---
RDMA (Storage) | 3 | Minimum 50% | PFC, ECN/WRED | 9216 | Lossless, congestion-aware
Cluster Heartbeat | 7 | 1% (2% for 10G) | Dedicated queue | 9216 | Strict minimum for reliability
Default/Other | 0 | Remaining (48%) | - | 9216 | Shared among all other traffic classes

This policy ensures that storage and cluster heartbeat traffic are always prioritized, minimizing latency and packet loss, while still allowing efficient use of available bandwidth for other traffic types.

Complete Configuration

The complete policy configuration is shown below for reference.

policy-map type network-qos QOS_NETWORK
  class type network-qos c-8q-nq3
    mtu 9216
    pause pfc-cos 3
  class type network-qos c-8q-nq-default
    mtu 9216
  class type network-qos c-8q-nq7
    mtu 9216
!
class-map type qos match-all RDMA
  match cos 3
class-map type qos match-all CLUSTER
  match cos 7
!
policy-map type qos AZLocal_SERVICES
  class RDMA
    set qos-group 3
  class CLUSTER
    set qos-group 7
!
policy-map type queuing QOS_EGRESS_PORT
  class type queuing c-out-8q-q3
    bandwidth remaining percent 50
    random-detect minimum-threshold 300 kbytes maximum-threshold 300 kbytes drop-probability 100 weight 0 ecn
  class type queuing c-out-8q-q-default
    bandwidth remaining percent 48
  class type queuing c-out-8q-q7
    bandwidth percent 1
  class type queuing c-out-8q-q1
    bandwidth remaining percent 0
  class type queuing c-out-8q-q2
    bandwidth remaining percent 0
  class type queuing c-out-8q-q4
    bandwidth remaining percent 0
  class type queuing c-out-8q-q5
    bandwidth remaining percent 0
  class type queuing c-out-8q-q6
    bandwidth remaining percent 0

System QoS Application

system qos
  service-policy type queuing output QOS_EGRESS_PORT
  service-policy type network-qos QOS_NETWORK

This applies the defined queuing and network QoS policies globally to all interfaces.
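
Global application of the policies can be confirmed with the following standard NX-OS command (output varies by platform):

show policy-map system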

Interface Application of QoS

Example of a storage interface supporting a disaggregated Azure Local environment.

interface Ethernet1/17
  description Storage Intent
  switchport
  switchport mode trunk
  switchport trunk native vlan 99
  switchport trunk allowed vlan 711
  priority-flow-control mode on send-tlv
  spanning-tree port type edge trunk
  mtu 9216
  no logging event port link-status
  service-policy type qos input AZLocal_SERVICES
  no shutdown

In this example, the key points are the use of priority-flow-control and service-policy.

  • priority-flow-control mode on send-tlv: PFC (IEEE 802.1Qbb) allows you to pause traffic on specific CoS (Class of Service) lanes instead of pausing all traffic on the link. This is crucial for lossless Ethernet, especially for storage traffic (like RDMA), which is sensitive to packet loss.
  • service-policy type qos input AZLocal_SERVICES: Applies a QoS policy, which maps storage and cluster traffic to a specific CoS value that PFC will act upon.
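
PFC operation and per-queue counters on the storage interface can be observed with the following standard NX-OS show commands; the pause and drop counters help confirm that queue 3 remains lossless:

show interface ethernet 1/17 priority-flow-control
show queuing interface ethernet 1/17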

Terminology

  • ToR: Top of Rack network switch. Supports Management, Compute, and Storage intent traffic.
  • WRED: Weighted Random Early Detection, a congestion avoidance mechanism used in QoS policies.
  • ECN: Explicit Congestion Notification, a congestion notification mechanism used to mark packets when congestion is encountered in the communication path. The two ECN bits in the IP header (adjacent to the DSCP field) are modified to identify congestion.
  • RDMA: Remote Direct Memory Access. A technology that enables direct memory access from the memory of one computer into that of another without involving either one's operating system or CPU. This allows for high-throughput, low-latency networking, which is especially beneficial for storage and high-performance computing workloads.

Reference