
CML, VIRL or GNS3?

Cisco have confused me a little in the way they have brought their simulator to market. First it was VIRL and free, then CML was announced and the name change was blamed on marketing. Now we have both. Here is a bird's-eye view of each for reference. I've also included GNS3 as it is worth considering if you're in the market for an emulation tool.

VIRL

  • Part of the /dev/innovate program at Cisco
  • In Beta at the time of writing
  • Community supported
  • Runs on Linux KVM Hypervisor / OpenStack frontend
    • This can be installed bare metal or run via VMware Fusion / Workstation / Player etc.
  • Local hardware model (laptop / desktop)
  • Supports IOS, IOS-XE, IOS-XR and NX-OS
  • No Layer 2
  • Annual Licence either Academic or Personal ($80 or $200)
  • Capped at 15 nodes

CML

  • Corporate Edition
  • TAC Supported
  • Client / Server Model
  • IOSv only today
  • NX-OS support roadmapped
  • Heavy server requirements
    • Min: 4 Core CPU / 16 GB RAM for basic 15 node limit
    • 16 Core CPU / 128 GB RAM for full 50 nodes
  • Server runs Ubuntu image
  • Java Client (oh Cisco, why do you try my patience so?)
  • $13,000 Base Install (15 node limit)
  • 5% discount for 50 nodes, 10% for 100
  • More pricing detail and other useful information here

All-in-One Virtual Machine

  • Free
  • Limited version of VIRL
  • 3 Node Development Environment for the Open Network Environment Platform Kit (onePK)
  • Designed to allow would-be app writers to develop and test their apps

GNS3

  • Free
  • Community Supported via the GNS3 Jungle
  • Multi-Vendor – Cisco, Juniper, HP, Arista, Citrix and Brocade
  • L2 Supported using Cisco L2IOU Images (native on Linux / Solaris only, VMs on Windows / Mac)
    • Although features relying on L3 hardware such as L3 Etherchannel, ISL trunks, DHCP snooping, Private VLAN, SPAN/RSPAN/ERSPAN, Port-security, Voice VLANs, MLS QoS and QinQ won't work
  • Supports any Cisco platform available as a virtual machine using VirtualBox or Qemu

6500 VSS

In this first post on VSS I'm just dumping my notes from a breakout session at Networkers back in January, mostly for my own reference.

Summary

  • Makes two switches look like one switch
    • although in theory a VSS domain could contain many switches, only 2 are allowed today
  • Requires a dedicated link between the switches called a VSL (Virtual Switch Link)
  • Note: virtual SWITCHING system, this isn’t a router technology.
  • One-time conversion involving changes to rommon (or conf-reg?)
    • The switch will find the VSL config before parsing the startup-config file fully
  • Switches referred to as Switch1 and Switch2, nomenclature fixed at conversion
  • One config
  • Ports are renumbered like when you stack 3750s e.g. Te1/1/1 and Te2/4/4
  • Control Plane -> only one box active (the other supervisor has state STANDBY_HOT)
  • Data Plane -> both boxes active
  • VSS has a considerably longer boot time

Deployment considerations and best practices

  • Never ever just type reload (you will get a warning). Use redundancy reload peer | shelf or redundancy force-switchover. If you are on the console you'll need to connect to the other sup. If you go ahead with the reload then both switches will reboot at the same time – probably something you never want to do in a redundant setup.
    • The console will be disabled on switch 2, but cable it up in case of failover (like a 6500 with dual sup)
  • Never ever use write erase. It will wipe the rommon var which sets VSS at startup. Use erase nvram instead.
  • NSF is off by default – switch this on. It replicates the RIB to the standby chassis and greatly speeds up failover, as forwarding to non-directly-attached routes can continue:
router ospf 1
 nsf
  • Etherchannel, CEF forwarding and L3 ECMP (Equal Cost Multipath) have all been modified to always favour local links.
    • In a DC the traffic isn’t very random so we may want a L4 EC hash algorithm
    • Sup720 has a 3-bit RBH (Result Bundle Hash), Sup2T has 8-bit, so the algorithm can be more even.
  • Use unique domain IDs for each VSS pair – unique across the entire campus network.
    • some MAC addresses as well as the system-id are derived from this
    • h/w swap outs between domains could break things.
    • avoid issues with sup swaps with mac-address use-virtual. This will require a reboot so build it into the boilerplate config (see the sketch after this list)
    • Switch MAC addresses are taken from the active chassis but retained on failover
  • Use out of band mac sync: mac address-table synchronize
  • Always dual attach in and out of the VSS or you create a SPOF
    • VSL is there principally for virtualisation and will only be used for data if there isn’t a local path
  • If you understand dual sup SSO (Stateful SwitchOver) you can think of VSS as this, but with the redundant sup in its own chassis and with the line cards in the second chassis available to the active sup.
    • SSO EOBC (100M Ethernet Out of Band Channel) replaced by VSL
    • To be SSO adjacent (fully standby hot on second sup) requires certain conditions to be true
  • We still need to run STP in the background in case a loop is accidentally introduced
  • Mechanisms exist to prevent split brain
    • LMP (Link Management Protocol), a bit like UDLD for the VSL
    • RRP (Role Resolution Protocol), decides who is active (lowest MAC by default) and never forces a failover. This is what makes the boot time so slow.
  • VSL
    • the split brain state (active-active) is a disaster – duplicate MAC addresses, router IDs etc.
    • VSL is main defence against this so important for it to be as resilient as possible
    • VSL ports must be 10G
    • use at least one of the 10G ports on the Supervisor card since this boots before the line cards
    • have a minimum of 2 x 10G links (can have up to 8)
    • also use a 10G port on a line card (both 10G ports on the Supervisor share an ASIC)
      • line card 10G ports must be VSL capable (note: the X6704 is not capable)
    • VSL takes control and data traffic between the chassis
      • the bandwidth of the VSL should be at least equal to the uplink bandwidth of each individual switch
    • Don’t change the VSL hashing algorithm in production networks since you will cut off some live flows
    • something about the QoS queues being different on the Sup 10G ports if you also use the Sup 1G ports – I need to check my notes and write something sensible here
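
Pulling the recommendations in this list together, here is a minimal sketch of that boilerplate config – the domain ID and OSPF process number are illustrative:

switch virtual domain 100
! derive the switch MAC addresses from the domain ID (needs a reload)
 mac-address use-virtual
! out of band MAC sync between the two chassis
mac address-table synchronize
! NSF: replicate the RIB so forwarding continues through a switchover
router ospf 1
 nsf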

Sample Config

Conversion

This is a one-time process which doesn't need to be done simultaneously on each switch, but probably should be.

! VSS Domain is globally significant 

switch virtual domain 100
 switch 1
 exit
int po 1
 switch virtual link 1
 exit
int ra ten 1/5/4-5
 channel-group 1 mode on
 exit
switch convert mode virtual

switch virtual domain 100
 switch 2
 exit
int po 2
 switch virtual link 2
 exit
int ra ten 2/5/4-5
 channel-group 2 mode on
 exit
switch convert mode virtual
  • This will reboot the switch and change config to tell the switch it is a VSS
  • The switch will pre-parse the config for the VSL info so chatter can commence – on boot you can see which is ACTIVE or STANDBY

Ponder this: the port channels need different numbers as this will be one logical switch at the end.

VSL

! switch 1
int Po 1 
 no switchport
 no ip address
 switch virtual link 1
 mls trust cos
 no mls qos channel-consistency

! switch 2
int Po 2
 no switchport
 no ip address
 switch virtual link 2
 mls trust cos
 no mls qos channel-consistency

Verification

show switch virtual redundancy
 - which switch am I?
 - is control plane active?
 - fabric (data plane) will be ..
show switch virtual role
 - active switch always listed first

VSL Failure recovery

There are three methods, and we can use more than one:

  1. Enhanced PAgP
  2. VSLP “Fast hello”
  3. IP-BFD (Bidirectional Forwarding Detection) (deprecated feature)

We are interested in the first two. We need to detect the failure, recover from it and then reload the previously active sup. A config sketch of both methods follows the list below.

While in recovery mode, avoid config changes (don’t even type conf t). This marks the config as modified and will require manual intervention to bring the VSS back.

  • 1. Enhanced PAgP
    • been around the longest
    • only on the 3750 (12.2(46)SE), 4500 and 6500 (with minimum software releases)
    • new TLV field in PAgP message with active switch ID
    • sub-second convergence
    • If the neighbour sees two different switch IDs, it feeds them back up the port channel, triggering the recovery process
  • 2. VSLP “Fast Hello”
    • Virtual Switch Link Protocol
    • dedicated L2 link between the two switches
    • on all the time
    • sub-second hello
    • can be 100M link, no sync, just there as a heartbeat mechanism
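
For reference, a hedged sketch of what the two preferred detection methods look like in config – the domain ID, channel-group number and port are illustrative:

switch virtual domain 100
! method 1: enhanced PAgP via an MEC to a PAgP-capable neighbour
 dual-active detection pagp trust channel-group 20
! method 2: VSLP fast hello over a dedicated link
 dual-active detection fast-hello
! the heartbeat interface itself (repeat on the peer switch)
int gi 1/4/47
 dual-active fast-hello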

Reboots

To reload only one VSS member use one of these commands:
redundancy reload shelf <shelf-ID>
redundancy force-switchover (switch to standby and reload active)
redundancy reload peer (reload standby)

Software upgrade considerations

  • With VSS, the 6500s can be kept in sync across different s/w releases so you can reload one at a time
  • a message translation mechanism exists but this is limited to compatible versions
  • You have some time with 50% bandwidth but *no outage*
  • If something is broken by the upgrade and we cannot connect, there is a rollback timer (45 minutes by default)
    • need to run issu acceptversion within that time to stop the timer (see the command sketch after this list)
    • if there is a problem, use issu rejectversion to bring the rollback forward
    • none of the new version's unique features are available until you do issu commitversion
    • this allows you to trial the existing features and make sure nothing broke before upgrading the second sup
  • s/w compatibility matrix on cisco.com
  • 15.X train is the only way to get EFSU
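
As a reference, the EFSU flow looks roughly like this – slot numbers and image names are illustrative, so check the exact syntax against your release:

! load the new image onto the standby sup and reload it
issu loadversion 1/5 disk0:s72033-new.bin 2/5 slavedisk0:s72033-new.bin
! switchover: the standby boots the new image and becomes active
issu runversion
! stop the 45 minute rollback timer
issu acceptversion
! upgrade the remaining sup; new features unlocked from here
issu commitversion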

ISSU History lesson

  • ISSU available across platforms
  • It is hitless, except on the 6500
  • The 6500 can do ISSU in standalone (non-VSS) mode, but the line cards have to reload
    • ISSU was renamed EFSU on the 6500 because of the hit
    • same commands are used though
  • pre-SXI, 'Fast Software Upgrade' was all we had, which resulted in an outage
  • 12.2(33)SXI brought in Enhanced FSU

Notes on Multicast 2: Addresses

This is the second post in a series on IP Multicast. The first can be found here. In this post I'll be looking at the different addresses used to identify Multicast groups at L2 and L3. First, a key point:

Hosts are not assigned Multicast addresses. Sources send traffic to a multicast address which members of a Group subscribe to.

You could think of the IPv4 and ethernet broadcast addresses (both are just all binary 1s) as a special case multicast address which every host is forced to subscribe to. To send a broadcast packet, a host populates the destination address IP header field with 255.255.255.255. The TCP/IP stack will then encapsulate that packet into a frame with destination MAC address FFFF.FFFF.FFFF. No host will ever have this address so no switch will have an entry for it in its MAC address table, which means the traffic is flooded out of every port in the VLAN. As an aside, you can see how if a NIC fails hot, it is going to hammer all the hosts on that VLAN – which is a good reason to keep VLANs small.

As with unicast and broadcast traffic, addresses are needed at Layer 3 and Layer 2. With both IPv4 and IPv6 the Layer 2 address of a group is a function of its Layer 3 address.

So, to send a multicast packet, the host selects the Group address it wishes to direct traffic to and populates the destination address in the IP header with the Group address. The TCP/IP stack then encapsulates the packet into a frame with the appropriate destination MAC address, as derived from the Group address. This is why switches need to be multicast compatible to forward traffic efficiently. If they are not, the traffic is just flooded out of every port in the VLAN since the destination MAC address will be unknown – which looks just like broadcast traffic.

Layer 3 address – IPv4

Multicast addresses are from the IANA Class D range:
  • those starting with binary 1110
  • 224.0.0.0–239.255.255.255
Common network control plane multicast reserved addresses are within 224/8. Within that, 224.0.0.0/24 is reserved for multicasting on a local segment. Routers will not forward traffic to this range. Some examples follow but see RFC 1700 page 56 for further details:
  • 224.0.0.1 – all ipv4 systems on the segment
  • 224.0.0.2 – all ipv4 routers on the segment
  • 224.0.0.5 – all OSPFv2 routers on a multiaccess network
  • 224.0.0.6 – OSPFv2 designated routers on a multiaccess network
  • 224.0.0.9 – RIPv2 routers
  • 224.0.0.10 – EIGRP routers
  • 224.0.0.13 – PIM routers
  • 224.0.0.18 – VRRP routers
  • 224.0.0.19-21 – ISIS over IP
  • 224.0.0.22 – IGMPv3
  • 224.0.0.102 – HSRP
  • 224.0.0.251 – mDNS
224.0.1.0/24 is reserved by IANA and designated the Internetwork Control Block. It is used for traffic which must be routed through the public Internet.
  • 224.0.1.1 – Multicast NTP
  • 224.0.1.39 – Cisco multicast router AUTO-RP-ANNOUNCE
  • 224.0.1.40 – Cisco multicast router AUTO-RP-DISCOVERY
  • 224.0.1.41 – H.323 Gatekeeper discovery
239.0.0.0/8 is reserved for administrative scoping and, like RFC 1918 addresses, should be filtered at site border routers.
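On IOS that filtering can be done with a multicast boundary – a sketch, with the ACL number and interface purely illustrative:

! drop admin-scoped groups at the site edge, permit the rest of the Class D space
access-list 10 deny 239.0.0.0 0.255.255.255
access-list 10 permit 224.0.0.0 15.255.255.255
int gi 0/1
 ip multicast boundary 10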
A nice summary and further assigned address ranges can be found on the Wikipedia Multicast page.
Note: Although I used CIDR notation above, there are no subnets in the Class D range. With the leading 1110 fixed, 28 bits remain, hence there are 2^28 possible multicast groups.

Layer 3 addresses – IPv6

These are described fully in RFC 4291. They are the addresses starting with binary 11111111. Here are some common network control plane IPv6 multicast addresses:
  • FF02::1 – All IPv6 nodes on a segment
  • FF02::2 – All IPv6 routers on a segment
  • FF02::5 – All OSPF routers on a segment
  • FF02::6 – All OSPF designated routers on a segment
  • FF02::9 – All RIPng routers on a segment
  • FF02::A – All EIGRP routers on a segment
  • FF02::D – PIM routers on a segment
  • FF02::1:2 – DHCP Servers and Relay agents on a segment
  • FF05::1:3 – DHCP Server (site scope)
  • FF05::101 – NTP
  • FF0x::FB – mDNS, where x is the scope nibble

Layer 2 addresses – Ethernet

If the Group Member's NIC is multicast ready, it will derive a multicast MAC address from the IPv4 Layer 3 address using the reserved MAC prefix 0100.5E00.0000 and the lower 23 bits of the multicast IPv4 Group address. As an example, the MAC address used to forward traffic to 224.0.0.5 would be 0100.5E00.0005.
IANA owns the MAC address range 0100.5E00.0000 – 0100.5EFF.FFFF (see RFC 5342). From this it allocated 2^23 addresses for IPv4 multicast.
  • The eighth (Individual/Group) bit is set to 1 to indicate a multicast MAC address. See RFC 1112.
  • The 25th bit is always zero, leaving 00.0000 – 7F.FFFF for use.
This scheme does leave some scope for ambiguity, as we are using 2^23 Ethernet addresses to represent 2^28 Multicast Groups, so each MAC address could represent 2^5 = 32 Multicast Groups. If two such overlapping groups happen to exist on a LAN then all hosts in each group will receive traffic for both groups. Given that the chances of this happening are small, it is still a big improvement on broadcast traffic, which is always received by all hosts on the subnet.
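A worked example, with groups chosen arbitrarily to show the overlap: 239.138.1.5 is 11101111.10001010.00000001.00000101 in binary. Keeping only the lower 23 bits gives 0001010.00000001.00000101, so the frame is sent to 0100.5E0A.0105. The group 239.10.1.5 maps to exactly the same MAC address, because the two IP addresses differ only in a bit that is thrown away.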
If the Layer 3 protocol is IPv6, the low-order 32 bits of the Group address are OR'd with the MAC address 33:33:00:00:00:00. For example, 2000:BAAD:CAFE::2:1 would become 33:33:00:02:00:01.

There are a couple of benefits to this method of mapping L3 to L2 addresses. Firstly, we don't need ARP or any multicast equivalent. Secondly, at L2 a source host only needs to send frames to one MAC address and all hosts subscribed to the Group will receive the traffic – which is the whole point.

Summary

This post is intended as a quick round up of multicast addresses. In the next post in this series I’ll look at Multicast Groups.

Notes on Multicast 1: Introduction and Glossary

Introduction

This series of posts is intended as a summary of the basics behind IP Multicast. Unlike Unicast, which is one-to-one traffic, or broadcast, which is one-to-all traffic, multicast is one-to-subscribers traffic. If broadcast traffic is like hearing a Tannoy announcement at a public event, multicast is like carrying a walkie-talkie tuned to the correct frequency.

Multicast: Image source: http://en.wikipedia.org/wiki/File:Multicast.svg

In a routed network, multicast has the potential to save lots of bandwidth and host processing power. Since most of us don't spend a lot of time configuring it, multicast is often poorly understood or, once understood, quickly forgotten.

Special addresses are reserved for this process and dedicated routing protocols exist to build and maintain a ‘multicast tree’, with the source of the multicast transmission at the root and the subscriber hosts on the leaves.

Another reason to get into multicast is that there is no broadcast in IPv6, so we need to understand the basics if we are going to be ready to run an IPv6 network.

To offer multicast on a routed network you need the following:

  1. Addresses to identify multicast groups
  2. Group join and leave methods for hosts
  3. Efficient multicast routing protocol for delivering traffic to multicast group members
I’ll look at these in subsequent posts. For now, I’ll just list some of the common terms associated with IP Multicast.
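
Before the glossary, here is a rough IOS sketch of how those three pieces typically come together – the interface and RP address are purely illustrative:

! enable multicast routing globally
ip multicast-routing
! PIM builds and maintains the tree; IGMP join/leave handling comes with it
int gi 0/1
 ip pim sparse-mode
! sparse mode needs a Rendezvous Point (statically configured here)
ip pim rp-address 10.0.0.1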

Glossary

  • CGMP: Cisco Group Management Protocol (legacy). Distributes multicast sessions to relevant switch ports only. Runs on switches which cannot inspect the IP header or have no IGMP ASIC.
  • Densely Populated: Many multicast group members in relation to number of nodes on a network (found on LANs). Make use of implicit join / broadcast-and-prune tree management.
  • Downstream: Direction away from the Source, in which all multicast traffic should flow
  • G: Group receiving data
  • GLOP addressing: the middle two octets in 233.0.0.0/8 are determined by the organisation's 16-bit ASN (e.g. AS 5662 is 0x161E, giving 233.22.30.0/24). This gives 256 public, static Multicast addresses to every organisation with a unique ASN.
  • IGMP: Internet Group Management Protocol. How routers determine whether to deliver multicast sessions to a given subnet. Multicast routing protocol independent.
    • Querier: Router on a subnet (with lowest IP address) designated to forward multicast traffic to hosts, used by IGMPv2
    • DR: Designated Router: Router on a subnet (with highest IP address) used by IGMPv1 to manage groups.
    • Leave Group: IGMPv2-only message sent by a host to 224.0.0.2 if and only if it was the last host to respond to a Membership Query.
    • Membership Query: Message sent by Router to verify whether any hosts on an attached subnet are members of a given Group.
      • General Queries (like keepalives) sent every 60 seconds to 224.0.0.1. New router issues GQ to force election.
      • Specific Query sent in response to a Leave Group message (IGMPv2 only)
    • Unsolicited Membership Report: Group join request sent from host to router via group multicast address
    • Solicited Membership Report: Reports also sent in response to a Membership Query by all hosts after a random number of seconds, unless one has already been sent.
  • IGMP Snooping – mechanism for switch to inspect IP header for Group information
  • Loop avoidance: Packets not received on the upstream port (closest to source) are dropped.
  • MBone: regional multicast networks connected by IPinIP tunnels.
  • Multicast: transmitting data to multiple receivers
  • Multicast Forwarding Table: holds (S,G) pairs.
  • Multicast Router: Forwards packets away from the source to multiple destinations
    • Determine the upstream interface to the source
    • Determine the downstream interfaces for each (S,G) pair
    • Manage the (dynamic) tree as hosts join and leave the Group by grafting and pruning branches.
  • Multicast Routing Protocol: determines the upstream and downstream ports for any (S,G).
  • Multicast Storm: Situation where a forwarded packet returns to the incoming interface, causing a loop until the TTL expires.
  • Permanent Group: groups with a permanent multicast address
  • PIM: Protocol Independent Multicast
    • The two multicast routing protocols supported by Cisco are:
      • PIM-SM: Sparse Mode
      • PIM-DM: Dense Mode
  • RP: Rendezvous Point. Single router, used in shared trees, which the sources must register with.
  • RPF: Reverse Path Forwarding. Multicast Routing Protocols calculate the shortest path to the source, not destination.
  • RPM: Reverse Path Multicast. The process of forwarding multicast packets only out of interfaces leading to group members.
  • S: Source of data
  • Scoping: Limiting the reach of multicast traffic. 239.0.0.0/8 reserved for scoped IPs.
  • (S,G): how a router identifies a group
  • Shared Tree: Used instead of a source-based tree when >1 source exists in a Group. If all possible trees share a common, strategically placed router (the RP), the trees can all be rooted at the RP rather than at a source. The router uses (*, G) state to reflect that it has multiple sources. More efficient multicast forwarding table; used in sparse topologies.
  • Source-based tree: separate multicast tree per source. Uses (S,G) state. State scales as (number of Sources × number of Groups), so no good in environments with many groups and sources. More efficient forwarding; used in dense topologies.
  • Sparsely populated: few multicast group members relative to the number of hosts on a network (about ratio not numbers). Found on WANs. Make use of explicit join ‘graft’ messages from routers which have hosts wanting to join the group.
  • Transient Group: groups with a temporarily assigned address from the unreserved pool
  • Unicast Router: forwards traffic based on longest matched route on destination address. Source address not usually relevant.
  • Upstream: direction towards the Source, in which no multicast traffic should flow