In this first post on VSS I’m just dumping my notes from a breakout session on VSS at Networkers back in January, mostly for my own reference.
- Makes two switches look like one switch
- although in theory a VSS domain could contain many switches – only 2 are allowed today
- Requires a dedicated link between the switches called a VSL (Virtual Switch Link)
- Note: virtual SWITCHING system, this isn’t a router technology.
- One time conversion involving changes to rommon (or conf-reg?)
- The switch will find the VSL config before parsing the startup-config file fully
- Switches referred to as Switch1 and Switch2, nomenclature fixed at conversion
- One config
- Ports are renumbered like when you stack 3750s e.g. Te1/1/1 and Te2/4/4
- Control Plane -> only one box active (the other supervisor has state STANDBY_HOT)
- Data Plane -> both boxes active
- VSS has a considerably longer boot time
Deployment considerations and best practices
- Never ever just type `reload` (you will get a warning). Use `redundancy reload peer | self` or `redundancy force-switchover`. If you are on the console you’ll need to connect to the other sup. If you go ahead with the reload then both switches will reboot at the same time – probably something you never want to do in a redundant setup.
- The console will be disabled on switch 2, but cable it up in case of failover (like a 6500 with dual sup)
- Never ever use `write erase`. It will wipe the rommon var which sets VSS at startup. Use
- NSF is off by default – switch this on. It replicates the RIB to the standby chassis and greatly speeds up failover as forwarding to non directly attached routes can continue
    router ospf 1
     nsf
- Etherchannel, CEF forwarding and L3 ECMP (Equal Cost Multipath) have all been modified to always favour local links.
- In a DC the traffic isn’t very random so we may want a L4 EC hash algorithm
- Sup720 has 3-bit RBH (result bundle hash), Sup2T has 8-bit so the algorithm can be more even.
- Use unique domain IDs for each VSS pair. Unique across entire campus network.
- some MAC addresses as well as the system-id are derived from this
- h/w swap outs between domains could break things.
- avoid issues with sup swaps with `mac-address use-virtual`. This will require a reboot so build it into the boilerplate config
- Switch MAC addresses are taken from the active chassis but retained on failover
- Use out of band mac sync: `mac address-table synchronize`
- Always dual attach in and out of the VSS or you create a SPOF
- VSL is there principally for virtualisation and will only be used for data if there isn’t a local path
- If you understand dual sup SSO (Stateful SwitchOver) you can think of VSS as this, but with the redundant sup in its own chassis and with the line cards in the second chassis available to the active sup.
- SSO EOBC (100M Ethernet Out of Band Channel) replaced by VSL
- To be SSO adjacent (fully standby hot on second sup) requires certain conditions to be true
- We still need to run STP in the background in case a loop is accidentally introduced
- Mechanisms exist to prevent split brain
- LMP (Link Management Protocol), a bit like UDLD for the VSL
- RRP (Role Resolution Protocol), decide who is active (lowest MAC by default), never force a failover. This is what makes the boot time so slow.
- the split brain state (active-active) is a disaster – duplicate MAC addresses, router IDs etc.
- VSL is main defence against this so important for it to be as resilient as possible
- VSL ports must be 10G
- use at least one of the 10G ports on the Supervisor card since this boots before the line cards
- have a minimum of 2 x 10G links (can have up to 8)
- also use a 10G port on a line card (both 10G ports on the Supervisor share an ASIC)
- line card 10G ports must be VSL capable (note: the X6704 is not capable)
- VSL takes control and data traffic between the chassis
- the bandwidth of the VSL should be at least equal to the uplink bandwidth of each individual switch
- Don’t change the VSL hashing algorithm in production networks since you will cut off some live flows
- something about the QoS queues being different on the Sup 10G ports if you also use the Sup 1G ports – check my notes and write something sensible
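The domain and MAC-related recommendations above could be collected into a boilerplate fragment along these lines. Treat it as a sketch only: domain ID 100 is simply the value used in the conversion example in this post, the load-balance keyword is an assumption about what the installed supervisor and release accept, and none of it has been checked against a specific IOS version.

```
! Sketch only - verify each command against your IOS release
switch virtual domain 100
 ! use the virtual MAC range tied to the domain ID (needs a reload to take effect)
 mac-address use-virtual
!
! out-of-band MAC table sync between the chassis
mac address-table synchronize
!
! L4-aware EtherChannel hash (assumed keyword, check "port-channel load-balance ?")
port-channel load-balance src-dst-mixed-ip-port
```

Because the domain ID feeds into the derived MAC addresses, it is worth picking the ID per VSS pair before this goes into a template.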
This is a one time process which doesn’t need to be done simultaneously on each switch, but probably should be
    ! VSS Domain is globally significant
    switch virtual domain 100
     switch 1
     exit
    int po 1
     switch virtual link 1
     exit
    int ra ten 1/5/4-5
     channel-group 1 mode on
     exit
    switch convert mode virtual

    switch virtual domain 100
     switch 2
     exit
    int po 2
     switch virtual link 2
    int ra ten 2/5/4-5
     channel-group 2 mode on
    switch convert mode virtual
- This will reboot the switch and change config to tell the switch it is a VSS
- The switch will pre-parse the config for the VSL info so chatter can commence – on boot you can see which is ACTIVE or STANDBY
Ponder this: the port channels need different numbers as this will be one logical switch at the end.
    ! switch 1
    int Po 1
     no switchport
     no ip address
     switch virtual link 1
     mls qos trust cos
     no mls qos channel-consistency
    ! switch 2
    int Po 2
     no switchport
     no ip address
     switch virtual link 2
     mls qos trust cos
     no mls qos channel-consistency
`show switch virtual redundancy`
- which switch am I?
- is control plane active?
- fabric (data plane) will be ..

`show switch virtual role`
- active switch always first
VSL Failure recovery
There are three methods, and we can use more than one.
- Enhanced PAgP
- VSLP “Fast hello”
- IP-BFD (Bidirectional Forwarding Detection) (deprecated feature)
We are interested in the first two. We need to detect the failure, recover from it and then reload the previously active sup.
While in recovery mode, avoid config changes (don’t even type conf t). This marks the config as modified and will require manual intervention to bring the VSS back.
- 1. Enhanced PAgP
- been around the longest
- only on 3750 (12.2(46)SE), 4500, 6500 (with min software release)
- new TLV field in PAgP message with active switch ID
- sub-second convergence
- If the neighbour switches see two different active switch IDs they feed them back up the port channel, which triggers the recovery process
- 2. VSLP “Fast Hello”
- Virtual Switch Link Protocol
- dedicated L2 link between the two switches
- on all the time
- sub-second hello
- can be 100M link, no sync, just there as a heartbeat mechanism
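The two detection methods above might be configured roughly as follows. This is a sketch from memory of the 6500 VSS syntax rather than a tested config: domain 100 matches the conversion example in this post, while port channel 20 and interface Gi1/5/1 are hypothetical placeholders for an ePAgP-capable downstream port channel and a dedicated heartbeat link.

```
! Sketch only - interface and channel numbers are hypothetical
switch virtual domain 100
 ! 1. Enhanced PAgP: trust a port channel to an ePAgP-capable neighbour
 dual-active detection pagp
 dual-active detection pagp trust channel-group 20
 ! 2. VSLP fast hello
 dual-active detection fast-hello
!
! dedicated L2 heartbeat link between the two chassis
interface GigabitEthernet1/5/1
 dual-active fast-hello
```

As far as I recall, the trusted port channel has to be shut down while the trust setting is changed, so this is another one to build into the initial config rather than add later.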
To reload only one VSS member use one of these commands:
- `redundancy reload shelf <shelf-ID>`
- `redundancy force-switchover` (switch to standby and reload active)
- `redundancy reload peer` (reload standby)
Software upgrade considerations
- With VSS, the 6500 can be synced across different s/w releases so you can reboot one at a time
- a message translation mechanism exists but this is limited to compatible versions
- You have some time with 50% bandwidth but *no outage*
- If something is broken by the upgrade and we cannot connect, there is a rollback Timer (45 minutes by default)
- need to run `issu acceptversion` within that time to stop the timer
- if there is a problem use `issu abortversion` to bring the rollback forward
- no unique features are available until you do
- this allows you trial the existing features and make sure nothing broke before upgrading the second sup
- need to run
- s/w compatibility matrix on cisco.com
- 15.X train is the only way to get EFSU
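Put together, the upgrade steps above would run in roughly this order. This is a sketch, assuming the standard `issu` command set applies to your release: the switch/slot and image arguments are left as placeholders, and the rollback command is assumed to be `issu abortversion`.

```
! Sketch of the eFSU sequence - arguments are placeholders
! load the new image; the standby chassis reloads onto it
issu loadversion <active-switch/slot> <new-image> <standby-switch/slot> <new-image>
! switch over to the chassis running the new image; rollback timer starts
issu runversion
! confirm you can still get in: stops the rollback timer
issu acceptversion
! happy with the trial: upgrade the other sup
issu commitversion
! or, to back out before committing:
! issu abortversion
! check progress at any point
show issu state detail
```

The gap between `issu acceptversion` and `issu commitversion` is the trial window described above, when both chassis forward traffic but run different releases.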
ISSU History lesson
- ISSU available across platforms
- It is hitless, except on the 6500
- The 6500 can do ISSU in standalone (non-VSS) mode, but the line cards have to reload
- ISSU was renamed EFSU on the 6500 because of the hit
- same commands are used though
- pre SXI ‘Fast Software Upgrade’ is all we had, which resulted in an outage
- 12.2(33)SXI – brought in Enhanced FSU