Why QoS?
- Because we have more data/traffic on the wire than there is capacity.
- Designs typically use an oversubscription model: 10:1, 40:1, etc.
- More data on the line than we have the capability to switch.

Understand your business requirements
- HPC/Grid Computing: RDMA/RoCE, iWARP (RDMA over WAN)
- HFT (High-Frequency/Algorithmic Trading): nanoseconds now matter in QoS (e.g. 500 ns late is a problem)
- Storage/Virtualized Data Center: FCoE, iSCSI, NFS, CIFS, vMotion, voice, and video (jumbo frames)
- MSDC (Massively Scaled Data Center, e.g. Google and Amazon): becoming more common
  - Hadoop/HDFS/TCP incast (highly oversubscribed)
  - Data Center TCP using ECN (Explicit Congestion Notification)
- Understand your applications.

6500 Series QoS
- QoS was developed by the module group, so you could have different QoS commands for a 6148 vs. a 6748, or even for different generations of line cards.
- QoS is off by default; nothing is trusted.

Nexus QoS
- QoS is on by default; everything is trusted at first.
- Classify at the ingress; queue mapping has to be done at Layer 2 (CoS).
- Granular hardware ASIC queuing capabilities, but the CLI is now standardized.
- Functions stay the same; the CLI changes, but only once.

Data needs for QoS
- Voice: 10 ms Tc (time interval value), small, not bursty, time sensitive and drop sensitive, 150 ms RTT/jitter budget. CoS 5, DSCP EF (CoS 3 for signaling).
- Video: 33 ms Tc, big, variable data rate, time sensitive, drop sensitive. CoS 4; DSCP CS5, CS4, AF4x, AF3x (CoS 3 for signaling).
- FCoE: consistent, giant frames (2112-byte payload), somewhat time sensitive, and drop intolerant. T11 FC-BB-5 recommends CoS 3; no DSCP value, since FCoE is not L3.

New QoS protocols

Data Center Bridging Exchange (DCBX)
- Uses LLDP with new TLV (type-length-value) fields.
- The standard progressed from CIN (Cisco, Intel, Nuova) to CEE (Converged Enhanced Ethernet), and finally to standard IEEE DCBX.
- Proper negotiation between switch-to-switch and switch-to-CNA results in PFC, ETS, CoS values, per-priority-pause frame support, as well as vFC interface bring-up.
- "sh lldp dcbx int e1/1" (see the verification sketch below)
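A quick way to see what DCBX actually negotiated on a converged link is the LLDP/DCBX and PFC show commands. A minimal verification sketch, assuming a Nexus 5K/7K and interface Ethernet 1/1 (the interface number is just an example):

    feature lldp                                         (DCBX rides on LLDP TLVs; already enabled by default on the 5K)
    show lldp dcbx interface ethernet 1/1                (negotiated PFC/ETS parameters and the peer's CIN/CEE/IEEE version)
    show interface ethernet 1/1 priority-flow-control    (per-CoS pause frames sent and received)

If the peer is a CNA, the same output confirms whether the no-drop/priority settings were accepted, which is what allows the vFC interface to come up.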
802.1Qbb Priority Flow Control (PFC) / Pause Frames
- A pause frame basically says "my Rx buffer is full; hold frames in your Tx buffer."
- Historically the recommendation has been to use TCP's drop mechanism rather than PAUSE; TCP is resilient and can tolerate drops.
- SCSI drops are devastating; they cause lost data and OS stops.
- FC used buffer-to-buffer credits for a lossless architecture. Moving FC onto Ethernet (FCoE) means dealing with Ethernet's underlying capabilities and shortcomings.
- With PFC, PAUSE frames are now sent per CoS value (per-priority pause).
- Provides a unique "service lane" for FCoE traffic with larger buffers.
- Results in "lossless" or "no-drop" behavior for SCSI/FCoE.

802.1Qaz Enhanced Transmission Selection (ETS)
- CB-Queuing, or QoS within Cisco.
- Strict priority as defined in 802.1p.
- Credit-based shaping as defined in 802.1Qav (think token bucket).
- Sets an amount of bandwidth per interface in times of congestion.
- Traffic is based on classes, which are based on CoS priorities.
- Ability to place multiple CoS values in a class or queue, but no more than 8 classes (3-bit CoS, values 0-7).
- An industry standard means that CNAs now participate: queuing from the server up to the switch.

802.1Qau Explicit Congestion Notification (ECN or CN)
- Data Center TCP, largely driven by Google.
- Uses the two least-significant bits of the ToS byte (the ECN field, next to DiffServ) to allow a switch to mark TCP traffic and indicate that interfaces are becoming congested.
- Uses RED thresholds to decide when to mark ECN rather than drop.
  - 00 = non-ECN-capable transport
  - 10 = ECN-capable transport, no congestion encountered
  - 11 = ECN-capable transport, congestion encountered

Order of QoS operations in Nexus
- Port ASIC or SoC (Switch on Chip).
- EARL is Cisco's "Enhanced Address Recognition Logic"; this is the fabric side - supervisor decisions pushed down into the fabric cards.
- The Nexus 7000 performs different functions at different locations.

Ingress Port ASIC
- Performs ingress queuing and scheduling: CoS-to-queue mapping, bandwidth allocation (DWRR), buffer allocation (memory), congestion avoidance (WRED), set CoS value.

Ingress to EARL
- CoS/DSCP mutation.
- Classification: ACL/SMAC/DMAC/SA/DA/L4 ports/CoS/DSCP.
- Marking: DSCP/QoS group/discard class.
- Policing: 1-rate/2-color, 2-rate/3-color, aggregate/flow/shared; actions to drop/transmit/remark/markdown.
- Set QoS group, set discard class.

M-series vs. F-series modules (see the verification sketch at the end of this section)
- M-series modules share 4 ports per ASIC; F-series share 2 ports per ASIC.
- F-series modules primarily do ingress queuing (a completely new module design): smaller buffers (memory) on the egress, larger buffers (memory) on the ingress. Same on the Nexus 5Ks. This is the queuing architecture Cisco is going with - think Hadoop, thousands of requests fanning in to hundreds of servers.
- M-series modules primarily do egress queuing (an extension of the Cat6K module design): larger buffers (memory) on the egress, smaller buffers (memory) on the ingress. They use memory to store frames.
- 7Ks take the forwarding from the supervisor and write it to the individual line cards, which allows for greater throughput and granularity. Distributed forwarding is sent to the SoC (Switch on Chip ASIC) on each line card.

Egress to EARL
- Classification: ACL/SMAC/DMAC/SA/DA/L4 ports/CoS/DSCP, QoS group, discard class.
- Marking: DSCP/QoS group/discard class.
- Policing: 1-rate/2-color, 2-rate/3-color, aggregate/flow/shared; actions to drop/transmit/remark/markdown.
- CoS/DSCP mutation.

Egress Port ASIC
- Performs egress queuing and scheduling: CoS-to-queue mapping, bandwidth allocation, buffer allocation.
- Congestion avoidance (WRED and tail-drop, based on thresholds).
- Priority queuing; SRR (shaped round robin) will disable the priority queue.
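Which of these queuing behaviors a given port gets depends on the line card it lives on, so it helps to confirm the module type and the port's queue structure before writing policy. A small verification sketch (module and interface numbers are just examples):

    show module                                 (identify M-series vs. F-series line cards in the chassis)
    show interface ethernet 1/1 capabilities    (reports the port's rx/tx queue structure, e.g. 8q2t / 1p7q4t)
    show queuing interface ethernet 1/1         (per-queue buffers, thresholds, and drop counters currently in effect)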
Switch and Module Buffering

What is buffering?
- Storing frames in memory until the wire/switch is ready to Tx/Rx them.

Ingress buffering vs. egress buffering
- Historically we have always been able to do both, but emphasis has been placed on egress buffering.
- Now we want incredibly powerful and fast switches, but we don't want to drive up cost any more than is necessary, and egress buffering is very costly.
- Ingress buffering drives down cost per port (less SRAM is needed at ingress to create aggregate buffers), less power, less heat.
- Statistically speaking, ingress queuing provides the same advantages as a shared-memory buffer architecture.
- A problem with ingress buffering: if I am buffering traffic destined for a congested egress port, but a frame comes in behind my buffer that is destined for a non-congested egress port, that frame has to wait until the buffer is emptied before it can be forwarded across the fabric to its egress port. It is Head-of-Line Blocked (HoLB).
- To mitigate this, we utilize the Virtual Output Queue (VoQ): 8 virtual output queues for every egress port for unicast, and 8 more per egress port for multicast - per ingress port. If you had 256 ports in a system, this means 2048 VoQs per ingress port (these are pointer lists, not physical buffers).

Ingress buffering devices
- Nexus F-series line cards: L2 only and built for fabric performance.
- Nexus 5000/5500.

Egress buffering devices
- Nexus 7000 M-series line cards: L3 and a rich feature set, basically next-gen Cat6K line cards.
- The queuing structure is largely based on the Cat6K line card model, e.g. 1p3q2t, 1p7q7t, etc. (p = priority queue, q = standard queue, t = tail-drop thresholds).

CoS, DSCP, Trust and Trust Boundaries
- Catalyst IOS: not trusted by default. Nexus NX-OS: trusted by default.
- The trust boundary is also changing: we now assume that the CNAs, phones, TelePresence video endpoints, or other switches are marking appropriately. We can still remark if need be.
- In NX-OS we primarily use CoS. Why? We now have non-IP-based traffic: FCoE, RoCE.
- The disadvantage is that we now have less granularity with service lanes. Mappings and mutations are certainly possible.

Defaults for the 7K
- Bridged unicast = CoS trusted, DSCP preserved.
- Routed unicast = CoS copied from the 3 most significant bits of the ToS byte, e.g. DSCP 101110 (EF) maps to CoS 101 (5).
- Routed multicast = CoS copied from the 3 MSBs of ToS.
- Bridged multicast with L3 state for the group = CoS copied from the 3 MSBs of ToS.
- Bridged multicast with no L3 state for the group = CoS trusted, DSCP preserved.

CoS/DSCP to queue & threshold mapping
- Today in Nexus you can: perform CoS-to-queue mapping and DSCP-to-threshold mapping.
- Today in Nexus you cannot: perform DSCP-to-queue mapping (like you can in Catalyst IOS).

QoS configuration and comparisons
- Catalyst IOS uses "MLS" nomenclature along with some MQC; in Nexus NX-OS everything uses MQC nomenclature.
- Still the 3-step model: class-maps (which go into) policy-maps (which are applied with) service-policies (applied to interfaces).
- But now 3 types of class-maps/policy-maps (actually 4 when you count CoPP):
  - QoS = configures classification/marking.
  - Queuing = configures port-based hardware queuing; the same role as the Catalyst IOS commands mls qos srr-queue, mls qos queue-set output buffers, mls qos queue-set output threshold.
  - Network-QoS = configures system-wide fabric queuing (queuing inside the fabric).
- QoS Group (new!): allows for simple mapping, e.g. map one or more classes of traffic to a QoS group, then apply the QoS group to a queue (see the classification sketch below).
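A minimal sketch of that classification step in the Nexus 5K style. The class name, CoS value, and qos-group number are assumptions for illustration, not a recommended design:

    class-map type qos match-all VOICE-QOS
      match cos 5                          (trust the phone's existing CoS marking)
    policy-map type qos CLASSIFY-IN
      class VOICE-QOS
        set qos-group 4                    (internal label carried with the frame across the switch)
    system qos
      service-policy type qos input CLASSIFY-IN

A type queuing class-map (and, on the 5K, a type network-qos class-map) can then match qos-group 4 to give that traffic its own queue and buffer behavior.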
- When configuring the queuing (or CoPP) type, you have to configure it in the default VDC, which could momentarily disrupt traffic.

Type qos
- Defines traffic classification.

Type queuing
- This is where we configure what takes effect system-wide on cards with these ASIC capabilities: strict priority queue, DWRR ("show eth*/* capabilities").
- You will notice a pause when applying a CoS value to a queue; this is the switch writing that instruction into the hardware SoC ASIC.
- A sanity check is performed when the policy is applied to an interface.
- Queuing attributes in the policy-map:
  - Priority (level) - defines the queue as a priority queue.
  - Bandwidth - defines WRR weights for each queue.
  - Shape - defines SRR weights for each queue; enabling shaping disables the PQ for that port.
  - Queue-limit - defines queue size/depth and defines tail-drop thresholds.
  - Random-detect - defines WRED thresholds per queue.
  - Tail-drop and WRED are mutually exclusive on a per-queue basis.

Type network-qos
- System class characteristics: drop/no-drop, MTU (Layer 2: 2112 FCoE, 1500 Ethernet, 9000 iSCSI), buffer size, marking.

Three attach points for the policy types (a combined sketch of these policy types follows below)
- Think of this as how the frame forwards across the switch: ingress port, crossbar fabric, egress port.
- Ingress interface: qos type, queuing type.
- Unified crossbar fabric: qos type, queuing type, network-qos type.
- Egress interface: queuing type.

9 Step Configuration Process
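As an illustration of the attributes and attach points above (this is not the 9-step process itself), here is a hedged, Nexus 5K-flavored sketch of a queuing policy plus a no-drop network-qos policy; the class names, qos-group numbers, bandwidth percentage, and MTU values are assumptions:

    class-map type queuing VOICE-Q
      match qos-group 4
    policy-map type queuing OUT-Q
      class type queuing VOICE-Q
        priority                           (strict priority queue)
      class type queuing class-default
        bandwidth percent 70               (DWRR weight for everything else)

    class-map type network-qos NODROP-NQ
      match qos-group 2                    (a generic lossless class for illustration; FCoE itself is pinned to qos-group 1, as described later)
    policy-map type network-qos FABRIC-NQ
      class type network-qos NODROP-NQ
        pause no-drop                      (PFC-based lossless behavior for this class)
        mtu 2158
      class type network-qos class-default
        mtu 1500

    system qos
      service-policy type queuing output OUT-Q
      service-policy type network-qos FABRIC-NQ

On the Nexus 7000 the queuing class-maps are pre-defined per queue structure (names like 1p3q4t-out-pq1) and the queuing service-policy is applied per interface instead, so treat this as a 5K-style sketch rather than a universal template.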
Hardware Specifics

Nexus 5K/55K
- The N5K has 4 system-default QoS-group classes: two for control (CoS 6 & 7), class-default, and FCoE. That leaves 4 to play with.
- N55K: no FCoE class is defined as a default QoS group. This allows that memory/buffer allocation to be used elsewhere if you are not using FCoE; if you are using FCoE, you must allocate it to the QoS group.
- N55K L3 daughter card (DON'T USE): when traffic passes up through the L3 daughter card, the CoS is lost. You must reclassify/remark, or voice, video, and signaling traffic could end up in the default queue. If you always set the CoS value - even if you are just calling a class-map from a policy-map where you had just classified based on CoS - you can steer clear of this issue (a hedged re-marking sketch appears at the end of this section).

Nexus 5K/7K FCoE treatment
- FCoE is CoS 3, but the Nexus 5K matches on both CoS 3 and the EtherType for FIP and FCoE: FCoE = 0x8906, FIP = 0x8914.
- There is a hardware override to ensure they make it to QoS-group 1. You can't get FCoE out of QoS-group 1, but you can misconfigure and put other things (like signaling) in qos-group 1.
- You can tune FCoE ONLY for distance. 300 m is the default:
    mtu 2158
    pause no-drop
  3 km example:
    mtu 2158
    pause no-drop buffer-size 152000 pause-threshold 103360 resume-threshold 83520
- Nexus 7K: FCoE is only supported on F-series line cards (an L2-only line card, not a full switch, matching on CoS values only). There are existing, non-changeable policy templates for FCoE.

Catalyst QoS vs. Nexus QoS
- Enable and trust: Catalyst must be manually turned on and trusted; Nexus is on and trusted by default.
- CoS exists only in an 802.1Q trunk header and is a 3-bit value:
  - 0 = Best effort
  - 1 = Scavenger
  - 2 = (no value listed)
  - 3 = FCoE (and voice/video signaling)
  - 4 = Video
  - 5 = Voice
  - 6 = Reserved - Layer 3 IGP / intra-network control
  - 7 = Reserved - network control over the LAN, e.g. Spanning Tree
- 12-class model: DSCP, at Layer 3, carried in the ToS byte as a 6-bit value, e.g. EF = voice.
- Routers primarily do egress QoS, done in software, with fewer queues; in switching it is done in hardware, and queues are typically implemented in hardware.
- Remember that QoS does not come into effect unless there is congestion on the network.
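Following up on the L3 daughter-card caveat above: one way to make sure the CoS marking survives is to re-stamp it per qos-group in the network-qos policy. A hedged sketch that extends the FABRIC-NQ policy from the earlier example (the class name and values are assumptions, and exactly where set cos is permitted varies by platform and release):

    class-map type network-qos VOICE-NQ
      match qos-group 4
    policy-map type network-qos FABRIC-NQ
      class type network-qos VOICE-NQ
        set cos 5                          (re-mark CoS so traffic coming back down from the L3 engine does not land in the default queue with CoS 0)
    system qos
      service-policy type network-qos FABRIC-NQ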