
Advanced vPC and Troubleshooting.

posted Nov 12, 2013, 10:21 AM by Rick McGee   [ updated Nov 12, 2013, 1:18 PM ]
vPC Reload Restore
    In previous versions of NX-OS, when a Nexus switch (let's say the secondary in the vPC) came back online after a power outage and the primary never came back up, the surviving switch kept its vPCs down. With Reload Restore, the secondary Nexus switch assumes the primary role for STP and LACP functions and brings its vPCs up. The default delay is 360 seconds (6 min), the time it waits for the primary to come back online.

In NX-OS 5.2(1) 
    vPC auto-recovery replaces Reload Restore
    It covers multiple failure scenarios
    E.g. a peer-link failure followed by the primary switch going down, or both vPC peers being reloaded and only one coming back up
How it works 
    If the peer-link is down on the secondary switch, three consecutive missed peer-keepalives trigger auto-recovery
    After a reload (role is NONE ESTABLISHED), if the auto-recovery timer (240 sec) expires while the peer-link and peer-keepalive are still down, auto-recovery kicks in:
        The switch assumes the primary role
        vPCs are brought up BYPASSING CONSISTENCY CHECKS
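A minimal sketch of enabling auto-recovery, assuming a hypothetical vPC domain 10 on NX-OS 5.2(1) or later. The syntax is as I remember it from the Nexus 7000 configuration guide, so verify it on your release; the reload-delay line only makes the 240-second timer above explicit.

```
feature vpc
! Hypothetical domain ID 10
vpc domain 10
  ! Replaces reload restore from NX-OS 5.2(1)
  auto-recovery
  ! Delay after a reload before auto-recovery acts (240 sec, as above)
  auto-recovery reload-delay 240
```

Configure it on both peers so that whichever switch survives can recover.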

vPC Keepalive link
    Heartbeat to prevent a dual-active scenario
    Keepalives are sent every second by default on UDP port 3200
    3 second "hold timeout" on peer-link loss: how long keepalives are ignored after the peer-link is lost
    5 second "keepalive timeout" (starts after the hold timeout expires): how long we wait for a keepalive before declaring the peer failed
    Use a dedicated link (a port-channel is recommended), although NX-OS doesn't enforce this; it only verifies IP connectivity
    The management interface can be used as the keepalive link, but don't connect the management interfaces together directly (only the active supervisor's management interface is up). Keep it in a dedicated VRF, connected through an intermediary switch
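A minimal sketch of a keepalive over a dedicated routed link in its own VRF. The VRF name vpc-keepalive, interface ethernet 1/48, domain ID 10 and the /30 addresses are all hypothetical, and the syntax is as I recall it from NX-OS, so check it against your platform. The timers are left at their defaults, which match the values above.

```
! Hypothetical dedicated VRF and point-to-point keepalive link
vrf context vpc-keepalive
interface ethernet 1/48
  no switchport
  ! Assign the VRF before the IP address (vrf member clears L3 config)
  vrf member vpc-keepalive
  ip address 192.168.100.1/30
  no shutdown
vpc domain 10
  ! Defaults: 1000 ms interval, 5 sec timeout, 3 sec hold-timeout, UDP 3200
  peer-keepalive destination 192.168.100.2 source 192.168.100.1 vrf vpc-keepalive
! Alternative via mgmt0 through an intermediary switch (hypothetical addresses):
!   peer-keepalive destination 10.0.0.2 source 10.0.0.1 vrf management
```

The peer mirrors this with the source and destination addresses swapped.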

vPC Keepalive flow chart with auto-recovery
If Peer-Link and Keepalive both fail (and the primary peer is still alive)
    Dual-Active situation
        There will be two primary switches sending independent BPDUs
            vPC port-channels on upstream/downstream switches will error-disable after 90 seconds
            If a Nexus 7K/5K is on the other end of the vPC, there is no errdisable, as NX-OS does not support EtherChannel Guard
    Always provision redundancy for the Keepalive link and make sure it doesn't share the data path with the peer-link
    This situation is really bad with VSS because of the shared control plane
    This issue becomes a moot point with auto-recovery
    Cisco Fabric Services
        Transport mechanism for control-plane messaging between vPC peers
            Consistency validation
            MAC address synchronization
            vPC member port status signaling
            IGMP snooping synchronization
            vPC status signaling
        vPC CFS messages are encapsulated in Ethernet frames and delivered between the switch peers via the peer-link
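One way to see what CFS is validating and synchronizing, assuming a hypothetical vPC number 20. These are standard NX-OS vPC show commands, though the output format varies by release.

```
! Peer-link, keepalive and per-vPC status at a glance
show vpc brief
! Global parameters compared between the peers over CFS
show vpc consistency-parameters global
! Consistency check for a single vPC (hypothetical vPC 20)
show vpc consistency-parameters vpc 20
! Keepalive status and timers
show vpc peer-keepalive
```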
Swapping Primary and Secondary roles
    Sometimes it's preferred for operational reasons to have a specific switch as the primary
    vPCs are down for ~1 minute when the primary changes to secondary
    To swap roles:
        Change the role priority
        Bounce the peer-link
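A sketch of the role swap, assuming hypothetical domain 10 and peer-link port-channel 1. The lower role priority wins the election (the default is 32667), and the role is not preemptive, which is why the peer-link has to be bounced; given the ~1 minute vPC outage, this belongs in a maintenance window.

```
! On the switch that should become primary
vpc domain 10
  role priority 100
! Force a new role election by bouncing the peer-link (hypothetical port-channel 1)
interface port-channel 1
  shutdown
  no shutdown
! Confirm the new roles (exec mode)
show vpc role
```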
When an outage occurs, instead of power cycling, take time to capture the following command outputs
    term len 0
    sh int
    sh vpc
    sh port-channel summary
    sh spanning-tree
    sh mac address-table
    sh routing vrf all
    sh ip arp vrf all
    sh tech-support details takes around 10 minutes to collect; sh tech-support brief might be a better option
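Rather than scrolling back through a console session, the outputs above can be redirected to files on bootflash. A sketch with hypothetical filenames; NX-OS show output supports > to overwrite a file and >> to append to it.

```
terminal length 0
! Hypothetical filenames on bootflash
show interface > bootflash:vpc-outage.txt
show vpc >> bootflash:vpc-outage.txt
show port-channel summary >> bootflash:vpc-outage.txt
show spanning-tree >> bootflash:vpc-outage.txt
show mac address-table >> bootflash:vpc-outage.txt
show routing vrf all >> bootflash:vpc-outage.txt
show ip arp vrf all >> bootflash:vpc-outage.txt
show tech-support brief > bootflash:vpc-tech-brief.txt
```

The files can then be copied off the box before any recovery action is taken.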
vPC config considerations
    The vPC domain ID must be unique for each Layer 2-adjacent vPC domain (the vPC system MAC is derived from it); otherwise issues may arise with multicast forwarding and LACP negotiation of cross-vPC links
    Set the logging level for vPC to 5
      This makes vPC operations easier to follow
    Use LACP for the peer-link (channel-group # mode active); this makes it more resilient to individual link failures (fiber/SFP going bad) or switch control-plane failures
    Use auto-recovery (if not available, use reload restore)
        This is useful in case of multiple failures and gives a more graceful recovery
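A minimal sketch pulling the considerations above together on one peer. The domain ID 10, ports ethernet 1/1-2 and port-channel 1 are hypothetical, the peer-keepalive is left out for brevity (the vPC will not form without it), and the syntax is as I recall it from NX-OS, so check it against your release.

```
feature lacp
feature vpc
! Domain ID must be unique among Layer 2-adjacent vPC domains (hypothetical 10)
vpc domain 10
  auto-recovery
! vPC logging at severity 5 (notifications)
logging level vpc 5
! Peer-link members negotiated with LACP (mode active), hypothetical ports
interface ethernet 1/1-2
  switchport
  switchport mode trunk
  channel-group 1 mode active
  no shutdown
interface port-channel 1
  switchport
  switchport mode trunk
  vpc peer-link
```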
vPC and spanning-tree operations
    Yes, you have no blocked ports, but it's still recommended to run spanning-tree
    STP runs on both switches
        There are two control planes, but only the primary switch drives STP for vPCs. Port changes are communicated to the secondary switch via CFS messages
    For non-vPC ports, the domain appears as two bridges
    The peer-link is part of STP; BPDU handling is modified so that the peer-link will not be blocked (similar to the MST implementation of the IST)
    Non-vPC ports are managed independently by the local STP process on each switch
STP upon vPC Primary failure
  1. Primary switch (STP Root) fails
  2. Secondary switch becomes the operational primary and STP root
  3. Neither the STP root port nor any STP port states for vPCs change, so forwarding continues
  4. Depending on control-plane load, it might take a few seconds for the operational primary to start sending BPDUs
    1. This might cause STP reconvergence on the connected switches; consider increasing the hello timer or, in very large deployments, using the peer-switch feature
When primary Switch Comes back online
  1. Primary comes back online
  2. Peer-Link comes back up
  3. vPC role is resolved as Operational-Secondary
  4. Primary switch has better STP priority and then becomes the STP root
  5. STP Root port will change from the secondary to primary and that will trigger a SYNC: all non-edge STP ports will be temporarily blocked
  6. Once SYNC is complete, ports resume forwarding

vPC Peer-Switch
    Both vPC switches originate BPDUs with the same preconfigured information. This keeps the BPDUs unchanged when the primary fails or recovers, so no extra SYNC is required
    Both the primary and secondary switches consider themselves root and send BPDUs all the time
    In peer-switch mode, the bridge ID comes from the vPC system MAC as opposed to the local MAC in normal mode
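A sketch of enabling peer-switch, assuming hypothetical VLANs 1-100 and domain 10. Both peers need the same configuration, including the same STP priority, because they present a single bridge ID built from the vPC system MAC.

```
vpc domain 10
  peer-switch
! Identical, lowest STP priority on both peers (hypothetical VLAN range)
spanning-tree vlan 1-100 priority 4096
```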
Peer-Link STP inconsistencies on Secondary Switch
    When a peer-link STP inconsistency is detected on the secondary switch, the peer-link is put in an "inconsistent" STP state (blocking)
    All VLANs of the affected MST instances are also blocked on all vPCs
    This behavior relies on STP Bridge Assurance on the peer-link (default) as a way to signal the inconsistency to the secondary peer
    Without BA on the peer-link, any inconsistency on the primary will lead to a peer-link FLAP (big NO NO)
PES/SPS and BPDU redirection
    The primary vPC peer controls the port states on the secondary peer by means of SPS (set-port-state) messages
    Changes in STP information are synchronized between the peers using PES (port-event-sync) messages
    These can be seen with "sh spanning-tree internal info vpc"; the counters should be stable unless a reconvergence event keeps happening
    BPDUs are sent to vPCs out of the primary switch. If the vPC leg connected to the primary is down, BPDUs are sent over the peer-link and sent out by the secondary
    Use "sh system internal frame traffic" to see the counters for internal STP traffic
    With "sh spanning-tree vlan 4" you will see two root ports
    "debug spanning-tree bpdu_tx tree 101"
    Use "debug logfile <file>" to send debug output to a file
    You can also view the history
BA is enabled by default on the peer-link; it is not recommended on vPCs unless the Peer-Switch feature is also operational
Dispute is enabled by default (for both RSTP and MST on vPC)
UDLD (normal mode) is recommended to take bad links out of channels (otherwise LACP takes 100 sec to detect them vs 20 sec with UDLD)
    Preferred: BA + UDLD + Dispute on all inter-switch links when using Peer-Switch; of course, all switches have to support this (Nexus 7K/5K)
    Without Peer-Switch, BA should be kept only on the peer-link (no BA or Loop Guard on vPCs; use UDLD + Dispute)
    Use Loop Guard + UDLD on all links to non-Nexus (Catalyst) switches
    UDLD aggressive mode could be used, but it is much more prone to false positives
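A sketch of these recommendations in NX-OS form, for the case without Peer-Switch. The interface numbers are hypothetical (ethernet 1/10 toward a Nexus, ethernet 1/20 toward a Catalyst, both shown as standalone ports), and Dispute needs no configuration since it is on by default.

```
feature udld
! Peer-link: Bridge Assurance comes from the network port type (default on the peer-link)
interface port-channel 1
  spanning-tree port type network
! Link toward another Nexus (hypothetical port): UDLD normal mode
interface ethernet 1/10
  udld enable
! Link toward a Catalyst switch (hypothetical port): Loop Guard + UDLD
interface ethernet 1/20
  spanning-tree guard loop
  udld enable
```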
Traffic Forwarding
Important to remember: when a frame from PC A arrives on the primary (left) switch it will be flooded, but frames received on the peer-link will not be flooded out of vPCs
This is called the vPC check; it applies to all L2, L3, unicast, multicast, broadcast and flooded traffic.
If the frame comes in on the left switch, it will stay on the left switch unless that switch's upstream port-channel link is down (orphaned link)
This is not supported; a routed cross-link must be added.
MAC address learning
The MAC address is not flooded over the peer-link; instead it is synced to the right switch via a CFS message, and the right switch updates its MAC address table
Always learn on lower vPC
Datapath Drops
    "sh hardware internal errors all"
        Number 1 command to look for hardware packet drops
        Run it several times to see if any counters increase
    To clear module counters: "clear statistics module-all device all"
1st hop redundancy
Both peers forward traffic sent to the virtual MAC
Troubleshooting HSRP
    sh hsrp brief
    sh mac address-table address
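A sketch of the HSRP side on one peer, with hypothetical VLAN 10, group 10 and addresses; the other peer would use its own SVI address with the same group and virtual IP. NX-OS defaults to HSRP version 1, whose virtual MAC for group 10 is 0000.0c07.ac0a, so that is the address the MAC lookup targets.

```
feature interface-vlan
feature hsrp
! Hypothetical SVI; the peer would use e.g. 10.1.10.3/24
interface vlan 10
  no shutdown
  ip address 10.1.10.2/24
  hsrp 10
    ip 10.1.10.1
! From exec mode: group state and the HSRPv1 virtual MAC for group 10
show hsrp brief
show mac address-table address 0000.0c07.ac0a
```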
Peer-gateway installs the MAC addresses of the left and right switches on each other, so they can respond to queries sent to either
If you need to talk to just the left router, you have to use a separate routed interface
Use "peer-gateway exclude-vlan <vlan-list>" to turn it off on certain VLANs
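A sketch of enabling peer-gateway with one VLAN excluded; domain 10 and VLAN 40 are hypothetical (for example, a backup routing VLAN carried across the peer-link).

```
vpc domain 10
  peer-gateway
  ! Turn peer-gateway off for a hypothetical VLAN 40
  peer-gateway exclude-vlan 40
```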
vPC multicast considerations
Goal is to allow the peer receiving the source traffic to forward it to receivers behind vPC without crossing the peer-link
    (vPC will drop such traffic otherwise)
What if the source is behind the vPC
Uses Proxy DR and will send the request to the same switch
When vPC is configured on a N7K-F248XP-25 (F2) there is no proxy-DR function due to hardware limitations. Packets will be bridged to DR over peer-link (vPC check is modified accordingly for L3 multicast packets on F2 linecards)
The peers exchange metrics over CFS for each new source
The peer with the better metric to the source (or the primary, on a tie) becomes the forwarder
    "sh ip pim internal vPC rpf"
For sources behind the vPC, both peers will forward, as they have no control over which one will get the traffic
    "sh ip pim internal vPC rpf"