Monthly Archives: July 2012

Notes on Multicast 2: Addresses

This is the second post in a series on IP Multicast. The first can be found here. In this post I’ll be looking at the different addresses used to identify Multicast groups at L2 and L3. First, a key point:

Hosts are not assigned Multicast addresses. Sources send traffic to a multicast address which members of a Group subscribe to.

You could think of the IPv4 and ethernet broadcast addresses (both are just all binary 1s) as a special case multicast address which every host is forced to subscribe to. To send a broadcast packet, a host populates the destination address IP header field with 255.255.255.255. The TCP/IP stack will then encapsulate that packet into a frame with destination MAC address FFFF.FFFF.FFFF. No host will ever have this address so no switch will have an entry for it in its MAC address table, which means the traffic is flooded out of every port in the VLAN. As an aside, you can see how if a NIC fails hot, it is going to hammer all the hosts on that VLAN – which is a good reason to keep VLANs small.

As with unicast and broadcast traffic, addresses are needed at Layer 3 and Layer 2. With both IPv4 and IPv6 the Layer 2 address of a group is a function of its Layer 3 address.

So, to send a multicast packet, the host selects the Group address it wishes to direct traffic to and populates the destination address in the IP header with the Group address. The TCP/IP stack then encapsulates the packet into a frame with the appropriate destination MAC address, as derived from the Group address. This is why switches need to be multicast compatible to forward traffic efficiently. If they are not, the traffic is just flooded out of every port in the VLAN since the destination MAC address will be unknown – which looks just like broadcast traffic.

Layer 3 address – IPv4

Multicast addresses are from IANA Class D range:
  • those starting with binary 110
  • 224.0.0.0–239.255.255.255
Common network control plane multicast reserved addresses are within 224/8. Within that, 224.0.0.0/24 is reserved for multicasting on a local segment. Routers will not forward traffic to this range. Some examples follow but see RFC 1700 page 56 for further details:
  • 224.0.0.1 – all ipv4 systems on the segment
  • 224.0.0.2 – all ipv4 routers on the segment
  • 224.0.0.5 – all OSPFv2 routers on a multiaccess network
  • 224.0.0.6 – OSPFv2 designated routers on a multiaccess network
  • 224.0.0.9 – RIPv2 routers
  • 224.0.0.10 – EIGRP routers
  • 224.0.0.13 – PIM routers
  • 224.0.0.18 – VRRP routers
  • 224.0.0.19-21 – ISIS over IP
  • 224.0.0.22 – IGMPv3
  • 224.0.0.102 – HSRP
  • 224.0.0.251 – mDNS
224.0.1.0/24 is reserved by IANA and designated the Internetwork Control Block. It is used for traffic which must be routed through the public Internet.
  • 224.0.1.1 – Multicast NTP
  • 224.0.1.39 – Cisco multicast router AUTO-RP-ANNOUNCE
  • 224.0.1.40 – Cisco multicast router AUTO-RP-DISCOVERY
  • 224.0.1.41 – H.323 Gatekeeper discovery
239.0.0.0/8 is reserved for administrative scoping and, like RFC1918 addresses, should be filtered at site border routers.
A nice summary and further assigned address ranges can be found on the Wikipedia Multicast page.
Note: Although I used CIDR notation above, there are no subnets in the Class D range. Hence there are 2^28 possible multicast groups.
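
As a quick sanity check of the ranges above, here is a minimal Python sketch using the standard-library ipaddress module. The helper name classify and the labels are my own, purely for illustration.

```python
import ipaddress

# The whole IPv4 multicast (Class D) range and the blocks discussed above.
MULTICAST = ipaddress.ip_network("224.0.0.0/4")
BLOCKS = {
    "local segment (not forwarded)": ipaddress.ip_network("224.0.0.0/24"),
    "internetwork control": ipaddress.ip_network("224.0.1.0/24"),
    "administratively scoped": ipaddress.ip_network("239.0.0.0/8"),
}


def classify(address: str) -> str:
    """Return a label for the multicast block an IPv4 address falls in."""
    ip = ipaddress.ip_address(address)
    if ip not in MULTICAST:
        return "not multicast"
    for label, block in BLOCKS.items():
        if ip in block:
            return label
    return "other multicast"


if __name__ == "__main__":
    # A /4 leaves 28 host bits, hence 2^28 possible groups.
    print(MULTICAST.num_addresses == 2 ** 28)
    for addr in ("224.0.0.5", "224.0.1.1", "239.1.2.3", "10.0.0.1"):
        print(addr, "->", classify(addr))
```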

Layer 3 addresses – IPv6

These are described fully in RFC 4291. They are the addresses starting with binary 11111111. Here are some common network control plane IPv6 multicast addresses:
  • FF02::1 – All IPv6 nodes on a segment
  • FF02::2 – All IPv6 routers on a segment
  • FF02::5 – All OSPF routers on a segment
  • FF02::6 – All OSPF designated routers on a segment
  • FF02::9 – All RIPng routers on a segment
  • FF02::A – All EIGRP routers on a segment
  • FF02::D – PIM routers on a segment
  • FF02::1:2 – DHCP Servers and Relay agents on a segment
  • FF05::1:3 – DHCP Server (site scope)
  • FF05::101 – NTP
  • FF0x::FB – mDNS, where x is the scope
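
To show how the first 16 bits of these addresses break down, here is a small Python sketch. It assumes the RFC 4291 layout (8 bits of 1s, then 4 flag bits, then a 4-bit scope); the function name and the scope table are mine and only cover the commonly seen scope values.

```python
import ipaddress

# Scope values defined in RFC 4291 (not an exhaustive list).
SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def multicast_flags_and_scope(address: str):
    """Split an IPv6 multicast address into its flags nibble and scope name."""
    ip = ipaddress.IPv6Address(address)
    if not ip.is_multicast:
        raise ValueError(f"{address} is not an IPv6 multicast address")
    value = int(ip)
    flags = (value >> 116) & 0xF   # third nibble of the address
    scope = (value >> 112) & 0xF   # fourth nibble of the address
    return flags, SCOPES.get(scope, "reserved or unassigned")


if __name__ == "__main__":
    for addr in ("FF02::5", "FF05::1:3", "FF0E::FB"):
        print(addr, multicast_flags_and_scope(addr))
```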

Layer 2 addresses – Ethernet

If the Group Member’s NIC is multicast ready, it will derive a multicast MAC address from the IPv4 layer 3 address using the reserved MAC address 0100.5E00.0000 and the lower 23 bits of the multicast IPv4 Group address. As an example, the MAC address used to forward traffic to 224.0.0.5 would be 0100.5E00.0005.
IANA owns the OUI 0100.5E00.0000 – 0100.5EFF.FFFF (see RFC 5342). From this they allocated 2^23 addresses to IPv4 multicast.
  • The eighth (Individual/Group) bit is set to 1 to indicate a multicast MAC address. See RFC 1112.
  • The 25th bit is always zero, leaving 00.0000 – 7F.FFFF for use.
This scheme does leave some scope for ambiguity as we are using 2^23 Ethernet addresses to represent 2^28 Multicast Groups. Each MAC address could represent 2^5 = 32 Multicast Groups. If two groups that map to the same MAC address happen to exist on a LAN then all hosts in each group will receive frames for both groups, and the unwanted traffic has to be discarded further up the stack. This is still an improvement on broadcast traffic, which is always received by all hosts on the subnet, especially given that the chances of it happening are small.
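
The IPv4 mapping and its ambiguity can be sketched in a few lines of Python. This is only an illustration of the 23-bit rule described above; the function name is my own and the output uses the dotted Cisco MAC notation from this post.

```python
import ipaddress


def ipv4_multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC (Cisco dotted format)."""
    ip = ipaddress.IPv4Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(ip) & 0x7FFFFF          # keep only the lower 23 bits
    mac = 0x01005E000000 | low23        # OR them into the reserved base address
    digits = f"{mac:012X}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


if __name__ == "__main__":
    print(ipv4_multicast_mac("224.0.0.5"))    # 0100.5E00.0005
    # These groups differ only in the 5 bits that are thrown away,
    # so they all land on the same MAC address.
    for group in ("224.1.1.1", "225.1.1.1", "239.129.1.1"):
        print(group, ipv4_multicast_mac(group))
```
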
If the Layer 3 protocol is IPv6, the low-order 32 bits of the Group address (its last two 16-bit groups) are OR’d with the MAC address 33:33:00:00:00:00. For example, FF02::1:2 would become 33:33:00:01:00:02.
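
Here is the IPv6 equivalent as a minimal Python sketch. It assumes the standard Ethernet mapping, in which the low-order 32 bits of the group address are placed after the 33:33 prefix; the function name is my own.

```python
import ipaddress


def ipv6_multicast_mac(group: str) -> str:
    """Map an IPv6 multicast group to its Ethernet MAC (colon format)."""
    ip = ipaddress.IPv6Address(group)
    if not ip.is_multicast:
        raise ValueError(f"{group} is not an IPv6 multicast address")
    low32 = int(ip) & 0xFFFFFFFF        # last two 16-bit groups of the address
    mac = 0x333300000000 | low32        # OR them into 33:33:00:00:00:00
    digits = f"{mac:012x}"
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


if __name__ == "__main__":
    for group in ("FF02::1", "FF02::1:2", "FF02::FB"):
        print(group, ipv6_multicast_mac(group))
```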

There are a couple of benefits to this method of mapping L3 to L2 addresses. Firstly we don’t need ARP or any multicast equivalent. Secondly, at L2 a source host only needs to send frames to one MAC address and all hosts subscribed to the Group will receive the traffic – which is the whole point.

Summary

This post is intended as a quick round up of multicast addresses. In the next post in this series I’ll look at Multicast Groups.

Notes on Multicast: 1 Introduction and Glossary

Introduction

This series of posts is intended as a summary of the basics behind IP Multicast. Unlike Unicast, which is one-to-one traffic, or broadcast, which is one-to-all traffic, multicast is one-to-subscribers traffic. If broadcast traffic is like hearing a Tannoy announcement at a public event, multicast is like carrying a walkie talkie tuned into the correct frequency.

Multicast: Image source: http://en.wikipedia.org/wiki/File:Multicast.svg

In a routed network this gives multicast the potential to save lots of bandwidth and host processing power. Since most of us don’t spend a lot of time configuring it, multicast is often poorly understood or, once understood, quickly forgotten.

Special addresses are reserved for this process and dedicated routing protocols exist to build and maintain a ‘multicast tree’, with the source of the multicast transmission at the root and the subscriber hosts on the leaves.

Another reason to get into multicast is that there is no broadcast in IPv6, so we need to understand the basics if we are going to be ready to run an IPv6 network.

To offer multicast on a routed network you need the following:

  1. Addresses to identify multicast groups
  2. Group join and leave methods for hosts
  3. Efficient multicast routing protocol for delivering traffic to multicast group members
I’ll look at these in subsequent posts. For now, I’ll just list some of the common terms associated with IP Multicast.

Glossary

  • CGMP: Cisco Group Management Protocol (legacy). Distributes multicast sessions to relevant switch ports only. Runs on switches which cannot inspect IP header or have no IGMP ASIC.
  • Densely Populated: Many multicast group members in relation to number of nodes on a network (found on LANs). Make use of implicit join / broadcast-and-prune tree management.
  • Downstream: Direction away from the Source, in which all multicast traffic should flow
  • G: Group receiving data
  • GLOP addressing: the middle two octets in 233.0.0.0/8 are determined by the organisation’s 16 bit ASN. This gives 256 public, static Multicast addresses (a /24) to every organisation with a unique ASN.
  • IGMP: Internet Group Management Protocol. How routers determine whether to deliver multicast sessions to a given subnet. Multicast routing protocol independent.
    • Querier: Router on a subnet (with lowest IP address) designated to send Membership Queries to hosts, used by IGMPv2
    • DR: Designated Router: Router on a subnet (with highest IP address) used by IGMPv1 to manage groups.
    • Leave Group:  IGMP2 only message sent by host to 224.0.0.2 iff it was the last host to respond to a Membership Query.
    • Membership Query: Message sent by Router to verify whether any hosts on an attached subnet are members of a given Group.
      • General Queries (like keepalives) sent every 60 seconds to 224.0.0.1. New router issues GQ to force election.
      • Specific Query sent in response to Leave Group message (IGMP2 only)
    • Unsolicited Membership Report: Group join request sent from host to router via group multicast address
    • Solicited Membership Report: Reports also sent in response to a Membership Query by all hosts after $RAND seconds unless one already sent.
  • IGMP Snooping – mechanism for switch to inspect IP header for Group information
  • Loop avoidance: Packets not received on the upstream port (closest to source) are dropped.
  • MBone: regional multicast networks connected by IPinIP tunnels.
  • Multicast: transmitting data to multiple receivers
  • Multicast Forwarding Table: holds (S,G) pairs.
  • Multicast Router: Forwards packets away from the source to multiple destinations
    • Determine the upstream interface to the source
    • Determine the downstream interfaces for each (S,G) pair
    • Manage the (dynamic) tree as hosts join and leave the Group by grafting and pruning branches.
  • Multicast Routing Protocol: determines the upstream and downstream ports for any (S,G).
  • Multicast Storm: Situation where a forwarded packet returns to the incoming interface, causing a loop until the TTL expires.
  • Permanent Group: groups with a permanent multicast address
  • PIM: Protocol Independent Multicast
    • The two multicast routing protocols supported by Cisco are:
      • PIM-SM: Sparse Mode
      • PIM-DM: Dense Mode
  • RP: Rendezvous Point. Single router used in shared trees which the sources must register with.
  • RPF: Reverse Path Forwarding. Multicast Routing Protocols calculate the shortest path to the source, not destination.
  • RPM: Reverse Path Multicast. The process of forwarding multicast packets only out of interfaces leading to group members.
  • S: Source of data
  • Scoping: Limiting the reach of multicast traffic. 239.0.0.0/8 reserved for scoped IPs.
  • (S,G): how a router identifies a group
  • Shared Tree: Used instead of source based tree when >1 source exists in a Group. If all possible trees share a common, strategically placed router (the RP), the trees can all be rooted via the RP rather than a host.  The router uses a (*, G) state to reflect that it has multiple sources. More efficient multicast forwarding table; used in sparse topologies.
  • Source based tree: separate multicast tree per source. Use (S,G) state. Scales at (SG * GN) so no good in environments with many groups and sources. More efficient forwarding; used in dense topologies.
  • Sparsely populated: few multicast group members relative to the number of hosts on a network (about ratio not numbers). Found on WANs. Make use of explicit join ‘graft’ messages from routers which have hosts wanting to join the group.
  • Transient Group: groups with a temporarily assigned address from the unreserved pool
  • Unicast Router: forwards traffic based on longest matched route on destination address. Source address not usually relevant.
  • Upstream: direction towards the Source, in which no multicast traffic should flow
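
To make the GLOP entry in the glossary concrete, here is a minimal Python sketch that turns a 16-bit ASN into its 233.x.y.0/24 block. The function name is my own; the ASNs in the usage lines are just sample values.

```python
import ipaddress


def glop_block(asn: int) -> ipaddress.IPv4Network:
    """Return the GLOP /24 for a 16-bit ASN: the ASN fills the middle two octets."""
    if not 0 < asn <= 0xFFFF:
        raise ValueError("GLOP addressing only covers 16-bit ASNs")
    high_octet = asn >> 8      # top 8 bits of the ASN
    low_octet = asn & 0xFF     # bottom 8 bits of the ASN
    return ipaddress.ip_network(f"233.{high_octet}.{low_octet}.0/24")


if __name__ == "__main__":
    for asn in (5662, 64496):
        block = glop_block(asn)
        print(asn, block, block.num_addresses)
```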

Janet6

I watched this presentation on Janet6 today. Since I work in ac.uk I’m always interested in their developments. Some key points:

  1. Exponential growth in day to day inbound traffic from external peers since records began in 2005. Peaked at 120Gbps last summer. Forecasted to reach 240Gbps in 2014, 0.5 Tbps by 2016.
  2. More and more hungry ‘big Science’ users are also driving the requirement for further bandwidth (LHC, Square Kilometer Array, Bioinformatics Genome Transfers, ITER Fusion Reactor, Radio Astronomy, Climate Data etc).
  3. To cope with this Janet6 will have a brand new network, to be deployed in parallel to SuperJanet5.
  4. Janet6 will use different PoP to Janet5 with all sites to be migrated by October 2013.
  5. For the first time Janet will operate the optical layer directly using DWDM and dedicated hardware to manage this.
  6. A new optical network is being procured with SSE Telecoms to be the provider.
  7. The Juniper routers will be upgraded to T4000s which provide 240Gbps per slot or 192 10Gbps ports per chassis.
  8. The lightpath service will now be run over the existing EoMPLS network with upgraded 100GE interconnects.
  9. A new non-CIR L2 service will be offered using the MPLS backbone in addition to this.
  10. Requirement for a second connection to Janet becoming more common for institutions. Details around how these are financed are still being worked out.

This guy’s opinion

It is always good to see proactive capacity management in action. Higher Education in the UK is often seen as being at the bleeding edge of networking. The bandwidth figures here would seem to corroborate that point of view – it is one of the reasons ac.uk is such a fun environment to work in. Certainly science departments at my place of work are demanding more and more bandwidth, as are our 30,000 users. The new backbone looks very beefy both at Layer 1 and above and that is a good thing.

Resilience is a word I’m hearing more and more at work and like many others it would seem, we are in the process of introducing a second connection to Janet (maybe we’re not bleeding edge there). I’ll be interested in how this develops. For us it was affordable because we are a PoP for several other institutions so we are paid to host the router. In tight economic times others may find the expense hard to justify if they have to go it alone, especially given the incredible reliability of Janet5. The only outages I’ve experienced have been as a result of our FWSMs flaking out or IOS upgrades on our border 6500.

Finally, I’m pleased that the lightpath service is getting some love. I wonder if Universities would start co-locating their datacentres and use lightpath as the DCI if the service could be fast and reliable enough?

BGP Troubleshooting

I came across a device with the global bgp routing table today, which is open to the world. You access it like this:

telnet route-server.bb.pipex.net

It is a great little resource for bgp troubleshooting. For example, I’ll see if the Google DNS server’s subnet (assuming /24) is in the global routing table:

route-server>show ip bgp 8.8.8.8 255.255.255.0 longer-prefixes 
BGP table version is 117876702, local router ID is 212.241.174.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
 r RIB-failure, S Stale, m multipath, b backup-path, x best-external
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network Next Hop Metric LocPrf Weight Path
*> 8.8.8.0/24 212.241.174.5 100 100 0 (65300 65306) 15169 i
* 212.241.174.5 100 100 0 (65300 65306) 15169 i
* 212.241.174.5 100 100 0 (65300 65306) 15169 i
* 212.241.174.5 100 100 0 (65300 65306) 15169 i

Sweet!

Diagram

Using BGP communities and local preference

At my current place of work we have a single ISP. We have two routed connections spread across two line cards, on a dual-sup Cisco 6500. However, with 30,000 Internet hungry users and the upcoming 6500 chassis replacement this no longer offers enough resilience. We’re going to move one of our links to a second router and do some simple load balancing using a combination of BGP communities and local preference.

I’ve mimicked our IPv4 unicast setup as far as I am able using Dynamips and RFC1918 addresses. I will document it here. We also run IPv6 and IPv4 multicast but the setup is the same.

Requirements

  1. We want both links to be active. If one site were to fail, we want to know the other site is ready to go.
  2. We don’t want routing loops. We do want to stay online if the uplink interface fails on either of our routers. Therefore we’ll run iBGP between our two Internet routers.
  3. iBGP peers use loopbacks. Our two Internet routers have multiple paths between them so we’ll peer on loopbacks. For this lab they only have one link. I’ll use OSPF so that all my lab routers know each other’s loopback addresses.
  4. eBGP peers use interface addresses. We don’t run an IGP with our ISP so we need static routes to peer on loopbacks. Now that each router will only have one link, this isn’t needed.

Network Diagram

We want to draw traffic to 10.1.0.0/16 via 3.3.3.3, and traffic to 10.2.0.0/16 via 4.4.4.4. Both routers will offer a default route to our IGP if they have one.

Diagram

Outbound traffic

Our ISP will offer us a default route. We will configure each Internet router so that if it has a valid route to 0/0, it will advertise it into our IGP. Therefore outbound traffic will flow via both Internet routers in normal operation.

Inbound traffic

We have two /16s and will configure each Internet router to draw traffic for just one of them in normal operation. Our ISP will accept two BGP communities from its customers. Any routes including 64496:1 will get a better (higher) local pref than routes with 64496:2. Although I don’t know exactly how they achieve this, I will mimic it by setting the local pref to 150 and 50 respectively.
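
Before the router config, here is a rough Python model of what the ISP side ends up doing. It is only a sketch of the "highest local pref wins" step, not of the full BGP best path algorithm. The community values are the ones used in the lab config further down; the class and function names, and the default of 100 for untagged routes, are my own assumptions for illustration.

```python
from dataclasses import dataclass

# Local pref the lab ISP assigns per community.
LOCAL_PREF_BY_COMMUNITY = {"64496:1": 150, "64496:2": 50}
DEFAULT_LOCAL_PREF = 100  # assumed value for routes with no matching community


@dataclass(frozen=True)
class Advert:
    prefix: str      # prefix being advertised
    via: str         # customer router that advertised it
    community: str   # community attached by the customer


def best_paths(adverts):
    """Pick, per prefix, the advertisement with the highest local pref."""
    best = {}
    for ad in adverts:
        pref = LOCAL_PREF_BY_COMMUNITY.get(ad.community, DEFAULT_LOCAL_PREF)
        if ad.prefix not in best or pref > best[ad.prefix][1]:
            best[ad.prefix] = (ad.via, pref)
    return best


if __name__ == "__main__":
    adverts = [
        Advert("10.1.0.0/16", "R3", "64496:1"),
        Advert("10.2.0.0/16", "R3", "64496:2"),
        Advert("10.1.0.0/16", "R4", "64496:2"),
        Advert("10.2.0.0/16", "R4", "64496:1"),
    ]
    for prefix, (via, pref) in best_paths(adverts).items():
        print(prefix, "via", via, "local pref", pref)
```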

Asymmetric routing then?

This configuration does not guarantee that the traffic will flow in and out the same way. However, this shouldn’t matter. My own experience is that turning on HSRP causes a similar issue locally [1] and that hasn’t broken any applications yet.

Configuration

I’ve attached full Router Configs to this post.

A basic BGP setup with no funny stuff

The four routers are peering with one another as per the diagram.

  • 3.3.3.3 and 4.4.4.4 are both offering the /16s with no tweaks.
  • 1.1.1.1 and 2.2.2.2 are offering 0.0.0.0/0 with no tweaks.

So we have resiliency in our Internet connection. However, we have no influence over which router our ISP will use as the next hop for our prefixes. Since BGP will choose one next hop for a given prefix by default, only one of our routers will get used for inbound traffic until we take further action.

Community spirit

Here is what we need to do on each enterprise router (3.3.3.3 & 4.4.4.4):

  1. Create an ACL or prefix list to match the subnets we wish to influence
  2. Create a route-map to add the BGP community we chose to each prefix
  3. Configure BGP so that BGP communities are offered and the route-map is applied to our eBGP peer.

Our ISP routers (1.1.1.1 & 2.2.2.2) will need the following:

  1. Create a community list for each BGP community we will accept
  2. Create a route-map to match on the appropriate list and then adjust the local pref
  3. Configure BGP to apply the route-map to the appropriate neighbor.
Here is the relevant config:
! R3
ip prefix-list 10_1 seq 5 permit 10.1.0.0/16
ip prefix-list 10_2 seq 5 permit 10.2.0.0/16
!
route-map TO_ISP permit 10
 match ip address prefix-list 10_1
 set community 64496:1
route-map TO_ISP permit 20
 match ip address prefix-list 10_2
 set community 64496:2
!
router bgp 64496
 bgp log-neighbor-changes
 neighbor 4.4.4.4 remote-as 64496
 neighbor 4.4.4.4 update-source Loopback3
 neighbor 192.168.13.1 remote-as 64511
 !
 address-family ipv4
  neighbor 4.4.4.4 activate
  neighbor 192.168.13.1 activate
  neighbor 192.168.13.1 send-community
  neighbor 192.168.13.1 soft-reconfiguration inbound
  neighbor 192.168.13.1 route-map TO_ISP out
  no auto-summary
  no synchronization
  network 10.1.0.0 mask 255.255.0.0
  network 10.2.0.0 mask 255.255.0.0
 exit-address-family

R4 is pretty well the same, apart from the route-map which reverses the community choice:

route-map TO_ISP permit 10
 match ip address prefix-list 10_1
 set community 64496:2
route-map TO_ISP permit 20
 match ip address prefix-list 10_2
 set community 64496:1 

R1 and R2  have virtually the same config:

! R1
ip community-list 1 permit 64496:1
ip community-list 2 permit 64496:2
!
route-map FROM_CUSTOMER permit 10
 match community 1
 set local-preference 150
route-map FROM_CUSTOMER permit 20
 match community 2
 set local-preference 50
!
router bgp 64511
 bgp log-neighbor-changes
 neighbor 2.2.2.2 remote-as 64511
 neighbor 2.2.2.2 update-source Loopback1
 neighbor 192.168.13.2 remote-as 64496
 !
 address-family ipv4
  neighbor 2.2.2.2 activate
  neighbor 192.168.13.2 activate
  neighbor 192.168.13.2 soft-reconfiguration inbound
  neighbor 192.168.13.2 route-map FROM_CUSTOMER in
  no auto-summary
  no synchronization
  network 0.0.0.0
 exit-address-family

Verification

First we verify on R1. Notice that it only has one route for 10.1/16 because R2’s best route for 10.1/16 is learned from R1 via iBGP – iBGP rules mean it won’t re-advertise the prefix.

R1#show ip bgp 10.0.0.0/8 longer-prefixes 
BGP table version is 5, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.0.0/16      192.168.13.2             0    150      0 64496 i
*>i10.2.0.0/16      192.168.24.2             0    150      0 64496 i
*                   192.168.13.2             0     50      0 64496 i

Then on R2 we notice a similar situation, although now we only have one route for 10.2/16:

R2#show ip bgp 10.0.0.0/8 longer-prefixes 
BGP table version is 7, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*>i10.1.0.0/16      192.168.13.2             0    150      0 64496 i
*                   192.168.24.2             0     50      0 64496 i
*> 10.2.0.0/16      192.168.24.2             0    150      0 64496 i

Thoughts

This method works well in our environment as our IPv4 allocation gives us a natural way of roughly dividing our traffic between our two Internet routers. This lab has also shown me one of the reasons ISPs like BGP communities. They are simple, extensible, scalable and flexible. In this case we can select our primary router – the ISP could easily overwrite our choice. Clearly in a production network the ISP would be a little more careful about which routes it accepts from its customers. Even so, the config is simple and powerful.


Appendix – the basic BGP setup

R1 and R2 each have a static route as follows:

 ip route 0.0.0.0 0.0.0.0 Serial1/5 192.168.[1|2]5.2

R1 learns both /16s from both R3 and R2. It also learns another route to 0/0 from R2:

R1#show ip bgp summary | b Neighbor
Neighbor        V          AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2.2.2.2         4      64511      21      20        4    0    0 00:15:52        3
192.168.13.2    4      64496      39      40        4    0    0 00:37:06        2

Since it has a local static route for 0/0 with an administrative distance of 1, the RTM prefers that to the iBGP route (administrative distance 200) so we only see two BGP routes.

R1#show ip route bgp
     10.0.0.0/16 is subnetted, 2 subnets
B       10.2.0.0 [20/0] via 192.168.13.2, 00:38:08
B       10.1.0.0 [20/0] via 192.168.13.2, 00:38:08

R2 is the same as R1.
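
The reason the BGP default never makes it into R1’s routing table comes down to administrative distance, which can be sketched like this. The values are the Cisco IOS defaults (they match the [20/0] and [200/0] seen in the output in this post); the function name is mine.

```python
# Default Cisco IOS administrative distances for the sources used in this lab.
ADMIN_DISTANCE = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "ospf": 110,
    "ibgp": 200,
}


def rib_winner(sources):
    """Return the route source the routing table would install: lowest distance wins."""
    return min(sources, key=lambda source: ADMIN_DISTANCE[source])


if __name__ == "__main__":
    # R1 has a static default and also learns one from R2 over iBGP.
    print(rib_winner(["ibgp", "static"]))
    # R4 learns a default from both its eBGP and iBGP neighbors.
    print(rib_winner(["ibgp", "ebgp"]))
```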

R3 and R4 learn a default route only from their eBGP neighbor. They learn the two /16s from each other, as well as the default route.

R4#show ip bgp summary | b Neighbor
Neighbor        V          AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
3.3.3.3         4      64496      25      24       11    0    0 00:21:03        3
192.168.24.1    4      64511      30      32       11    0    0 00:26:37        1 

Again, I’m only showing one router’s output.

R4#show ip route bgp
B*   0.0.0.0/0 [20/0] via 192.168.24.1, 00:22:36

Before I start mucking about with metrics, I’ll just shut s1/5 on R2. That should drop the static route from the RIB. This should cause 4.4.4.4 to start using 3.3.3.3 as its next hop for the Internets:

R4#show ip route bgp
B*   0.0.0.0/0 [20/0] via 192.168.24.1, 00:41:27
R4#
*Jul 16 11:37:02.595: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial1/2, changed state to down
*Jul 16 11:37:02.631: %BGP-5-ADJCHANGE: neighbor 192.168.24.1 Down Interface flap
R4#show ip route bgp
B*   0.0.0.0/0 [200/0] via 192.168.13.1, 00:00:06
Which it does.

[1] The difference is that with HSRP we find that both routers advertise the prefixes for incoming traffic but only one receives outgoing traffic.

Musing on naming schemes

I’ve just read another excellent post by the Networking Nerd and thought I’d blog a quick response. With our servers, we follow much the same nonsense as everyone else and give the server a name from popular culture / fiction / mythology / greek letters… My favourite are the mail servers named after female characters from Firefly. However, with the backbone network we wanted something simple, easy to remember (and print on cable labels, but that’s another blog post) and extensible. So we came up with this:

  • We assign letters from the alphabet to a function. For example, B is a core switch, C is a PE router, F is our point-of-presence switch in a building. Actually F came from FroDo or Front Door as there used to be a LOTR thing here but we got rid of it. There are a few others but you get the idea
  • Then we have up to four letters to denote location. If we had a router in Mordor, we might use MORD.
  • We append a number to each logical device in a location. Stacks or VSS / IRF / VPC type groups get an additional letter. So a VSS pair of PE routers in Mordor would be called CMORD1a and CMORD1b respectively. If there was a standalone POP switch it would be called FMORD1. Actually it wouldn’t as there is a different scheme for the FroDos but that isn’t really important here.

There is a published key to all this which is updated if we add a new device type. This system will break down at 26, but that won’t be a problem any time soon here. Further details about all this can be found on our team blog site.
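
For what it’s worth, the scheme is simple enough to express as a few lines of Python. This is just a sketch: the lookup table only contains the three function letters mentioned above and the function name is made up for this post.

```python
# Function letters mentioned in this post; the real key has a few more.
FUNCTION_LETTERS = {"core switch": "B", "PE router": "C", "PoP switch": "F"}


def device_name(function: str, location: str, number: int, member: str = "") -> str:
    """Build a name: function letter + up to four location letters + number (+ stack member)."""
    letter = FUNCTION_LETTERS[function]
    loc = location.upper()
    if not (1 <= len(loc) <= 4 and loc.isalpha()):
        raise ValueError("location must be one to four letters")
    if member and not (len(member) == 1 and member.isalpha()):
        raise ValueError("member must be a single letter")
    return f"{letter}{loc}{number}{member.lower()}"


if __name__ == "__main__":
    print(device_name("PE router", "MORD", 1, "a"))
    print(device_name("PE router", "MORD", 1, "b"))
    print(device_name("PoP switch", "MORD", 1))
```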

My thoughts on device nomenclature

I think any naming scheme should be simple, unambiguous, extensible and short. The one we use here evolved so has some limitations, but it has served us well.

Network Diagram

Adding a second Internet route to our MPLS VPNs.

At my current place of work we use MPLS VPNs for our wireless service. The design wasn’t mine, but it was done well and has enabled us to remove a couple of nasty trunked VLANs from our core. The only flaw was that it relied on a single 6500 chassis for Internet connectivity. We’re busy forklifting out our 6500 chassis to replace them with 6500Es as they are going end of support in November.  Since we were allowing 4 hours for the upgrade I wanted to add a second link to the Internets.

Network Diagram

The configuration is typical of an MPLS VPN with some subtleties around CE PE routing which aren’t important here. A requirement was that clients connected to different PE routers and clients on different WLANs should not have visibility of each other. This was achieved by using a different VRF for each direction of traffic. The outbound VRFs only import an RT carrying the default route. The inbound VRF imports routes from all outbound VRFs so the traffic can return.

Here are the important bits of config for the Internet VRF. Clearly you need iBGP, MPLS and LDP configured for this to work. We redistribute connected routes so that the client WLANs have a route for the gateway of last resort in their routing tables.

ip vrf WIRELESS-EGRESS
 rd 5.5.5.5:99
 route-target export 5.5.5.5:99
 route-target import 5.5.5.5:10
 route-target import 3.3.3.3:10
! and the rest
router bgp 65501
 ! <snip>
 address-family ipv4 vrf WIRELESS-EGRESS
  redistribute connected
  no synchronization
  network 0.0.0.0
 exit-address-family
!
ip route vrf WIRELESS-EGRESS 0.0.0.0 0.0.0.0 ser 1/2 192.168.25.1
!
int s 1/2
 ! apply the VRF first: ip vrf forwarding removes any address already on the interface
 ip vrf forwarding WIRELESS-EGRESS
 ip address 192.168.25.2 255.255.255.252
!

So on my second 6500 I added similar config. Although the VRF name and RD are locally significant, I’ve used different ones to keep things clear for myself:

ip vrf WIRELESS-EGRESS2
 rd 6.6.6.6:99
 route-target export 6.6.6.6:99
 route-target import 3.3.3.3:10
!
router bgp 65501
 ! <snip>
 address-family ipv4 vrf WIRELESS-EGRESS2
  redistribute connected
  redistribute static
  no synchronization
  network 0.0.0.0 route-map WIRELESS-EGRESS2-MAP
 exit-address-family
!
ip route vrf WIRELESS-EGRESS2 0.0.0.0 0.0.0.0 ser 1/2 192.168.26.1 tag 10
!
int s 1/2
 ! apply the VRF first: ip vrf forwarding removes any address already on the interface
 ip vrf forwarding WIRELESS-EGRESS2
 ip address 192.168.26.2 255.255.255.252

The route-map simply drops the local-pref to 50 on our second default route, so that the main route is used where available (we have only one application firewall). It also drops the weight to 0 so that hosts using the second 6500 as their gateway don’t default to the local route.

route-map WIRELESS-EGRESS2-MAP permit 10
 match tag 10
 set local-preference 50
 set weight 0

The Internet router, 2.2.2.2 on the diagram, has a couple of static routes to permit return traffic, with the route via our firewall preferred.

ip route 0.0.0.0 0.0.0.0 Serial1/6 192.168.26.1 210
ip route 0.0.0.0 0.0.0.0 Serial1/5 192.168.25.1 220

Finally we just need to import both Internet RTs at each client VRF. We redistribute connected routes into BGP too or else traffic wouldn’t get back. Here is a config sample:

router bgp 65501
 ! <SNIP>
 address-family ipv4 vrf WLAN1
  redistribute connected
  no synchronization
 exit-address-family
 !
ip vrf WLAN1
 rd 3.3.3.3:10
 route-target export 3.3.3.3:10
 route-target import 5.5.5.5:99
 route-target import 6.6.6.6:99
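
The import/export logic is easier to see as a toy model than in config. In this Python sketch a VRF can see another VRF’s routes only if it imports an RT the other one exports. The first three entries mirror the config shown in this post; WLAN2 is a hypothetical second client VRF that I’ve added purely to show the isolation between WLANs, and its RT of 5.5.5.5:10 is an assumption based on the import list of WIRELESS-EGRESS.

```python
# Route targets per VRF. WLAN2 is hypothetical, added for illustration only.
VRFS = {
    "WIRELESS-EGRESS": {"export": {"5.5.5.5:99"}, "import": {"5.5.5.5:10", "3.3.3.3:10"}},
    "WIRELESS-EGRESS2": {"export": {"6.6.6.6:99"}, "import": {"3.3.3.3:10"}},
    "WLAN1": {"export": {"3.3.3.3:10"}, "import": {"5.5.5.5:99", "6.6.6.6:99"}},
    "WLAN2": {"export": {"5.5.5.5:10"}, "import": {"5.5.5.5:99", "6.6.6.6:99"}},
}


def visible_from(vrf: str):
    """List the VRFs whose exported routes are imported by the given VRF."""
    wanted = VRFS[vrf]["import"]
    return sorted(
        other for other, config in VRFS.items()
        if other != vrf and config["export"] & wanted
    )


if __name__ == "__main__":
    for name in VRFS:
        print(name, "imports routes from", visible_from(name))
```

In this model the client VRFs only ever see the two egress VRFs, never each other, which is the requirement described at the start of this post.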

Let’s see if this has worked. First we look at the WLAN RIB before we add our second router:

R3#show ip route vrf WLAN1

Gateway of last resort is 5.5.5.5 to network 0.0.0.0
     192.168.25.0/30 is subnetted, 1 subnets
B       192.168.25.0 [200/0] via 5.5.5.5, 01:05:24
     10.0.0.0/24 is subnetted, 2 subnets
C       10.10.30.0 is directly connected, Loopback10
B*   0.0.0.0/0 [200/0] via 5.5.5.5, 01:05:24

Now we add the second router and wait for the neighborships to come up:

*Jul  3 17:51:46.079: %BGP-5-ADJCHANGE: neighbor 6.6.6.6 Up 
R3#show ip route vrf WLAN1
<SNIP>
Gateway of last resort is 6.6.6.6 to network 0.0.0.0

     192.168.25.0/30 is subnetted, 1 subnets
B       192.168.25.0 [200/0] via 5.5.5.5, 01:15:23
     192.168.26.0/30 is subnetted, 1 subnets
B       192.168.26.0 [200/0] via 6.6.6.6, 00:00:04
     10.0.0.0/24 is subnetted, 2 subnets
C       10.10.30.0 is directly connected, Loopback10
B*   0.0.0.0/0 [200/0] via 6.6.6.6, 00:00:04

If we shut any relevant interfaces on or reboot 6.6.6.6, then the previous state is restored.

*Jul  3 17:57:41.399: %BGP-5-ADJCHANGE: neighbor 6.6.6.6 Down Peer closed the session
R3#show ip route vrf WLAN1

Gateway of last resort is 5.5.5.5 to network 0.0.0.0

     192.168.25.0/30 is subnetted, 1 subnets
B       192.168.25.0 [200/0] via 5.5.5.5, 01:21:05
     10.0.0.0/24 is subnetted, 2 subnets
C       10.10.30.0 is directly connected, Loopback10
B*   0.0.0.0/0 [200/0] via 5.5.5.5, 00:00:02

Technical detail about our environment is deliberately very light as this post is only about adding the second route to the Internet.

Network Diagram

Using IPv6 in my Dynamips Lab

In this short post I will describe how I modified my IPv4 numbering scheme for IPv6.

Overview

I picked 2001 as the first quad nibble as I want to use Global addresses in my linknets. I can always use the Link Local addresses if needed. For the second quad nibble I take the two router numbers in ascending order (I leave the rest at zero as it saves a lot of time). I also decided to use /64s for my linknets.  Although this is wasteful I would like the option to use SLAAC. When statically configuring addresses I use the lowest two in the subnet. For example, the link between R1 and R2 would be 2001:12::/64, with R1 taking 2001:12::1 and R2 taking 2001:12::2.
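
As a sketch of the numbering rule, the following Python builds the linknet and the two static addresses from a pair of router numbers. It assumes single-digit router numbers, as in my lab; the function name and the first_group parameter are my own.

```python
import ipaddress


def v6_linknet(router_a: int, router_b: int, first_group: str = "2001"):
    """Return the /64 linknet for two routers and the address each one takes."""
    if not (1 <= router_a <= 9 and 1 <= router_b <= 9) or router_a == router_b:
        raise ValueError("expected two different single-digit router numbers")
    low, high = sorted((router_a, router_b))
    # The second group is simply the two router numbers in ascending order.
    net = ipaddress.IPv6Network(f"{first_group}:{low}{high}::/64")
    # The lowest two addresses in the subnet go to the two routers.
    return net, {low: net[1], high: net[2]}


if __name__ == "__main__":
    net, hosts = v6_linknet(2, 1)
    print(net)
    for router, address in hosts.items():
        print(f"R{router}", address)
```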

Network Diagram


Frame relay

The Frame Relay cloud is still available. For subinterfaces carrying IPv6 over Frame Relay, the only change is that I use 2002:XY::/64, where X and Y are the numbers of the two routers I’m connecting, in ascending order.

Summary

That’s it for this mini series on how I use Dynamips when labbing. Hopefully you will find it helpful. I welcome any comments or suggestions.

Network Diagram

Adding Frame Relay to my Dynamips Lab

In this post I will explain how I set up Frame Relay using Dynamips to provide greater flexibility.

Overview

I want to achieve a full mesh of routers on demand but be able to keep the number of interfaces in general use to a minimum. With that in mind I hooked up each router to a Frame Relay switch via serial 2/1. I configured a full mesh of DLCIs and used the router numbers to denote the circuit and direction.

For example, the DLCI linking R5 to R6 would be 506. The return DLCI from R6 to R5 would be 605. When configuring subinterfaces I name the subinterface after the DLCI. So in this example R5 uses serial 2/1.506 to connect to R6 via Frame Relay, and R6 uses serial 2/1.605 for the return circuit.
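The DLCI convention is easy to express in code. The sketch below is illustrative only: the helper names (`dlci`, `subinterface`, `full_mesh`) are hypothetical, and it assumes single-digit router numbers so that every DLCI stays a three-digit value.

```python
def dlci(src, dst):
    """DLCI for the circuit from router src to router dst, e.g. R5 -> R6 = 506."""
    if src == dst or not (1 <= src <= 9 and 1 <= dst <= 9):
        raise ValueError("expected two different router numbers in 1-9")
    return src * 100 + dst


def subinterface(src, dst, parent="serial 2/1"):
    """Subinterface name on src for the circuit towards dst."""
    return f"{parent}.{dlci(src, dst)}"


def full_mesh(routers):
    """Map every ordered (src, dst) pair to its DLCI."""
    return {(a, b): dlci(a, b) for a in routers for b in routers if a != b}


print(subinterface(5, 6))  # serial 2/1.506
print(subinterface(6, 5))  # serial 2/1.605
print(len(full_mesh(range(1, 7))))  # 30 directed circuits for six routers
```

Because each direction gets its own number, the DLCI alone tells you which router you are on and which one you are talking to.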

Network Diagram


So I can now connect any router to any other router with predictable configuration. I’ve kept the first /30 in the /24s I use for each router’s serial linknets, e.g. 192.168.12.0/30 was the linknet for R1/R2. I use the next /30 for the Frame Relay linknet. Although I don’t currently have a physical link between R5 and R6, I’ll still keep 192.168.56.0/30 back for future use. So when I dial up that Frame Relay circuit, I’ll use 192.168.56.4/30.
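As a rough sketch of that addressing rule, the following Python picks the first two /30s out of the per-pair /24. The function name `v4_linknets` is hypothetical, and the code assumes single-digit router numbers so the third octet is just the two numbers in ascending order.

```python
import ipaddress


def v4_linknets(a, b):
    """Return (serial linknet, Frame Relay linknet) for a pair of routers.

    Assumption: router numbers are single digits, so R5/R6 share
    192.168.56.0/24.
    """
    lo, hi = sorted((a, b))
    if not (1 <= lo < hi <= 9):
        raise ValueError("expected two different router numbers in 1-9")
    block = ipaddress.ip_network(f"192.168.{lo}{hi}.0/24")
    subnets = list(block.subnets(new_prefix=30))
    # First /30 is kept for the serial link, the next one is for Frame Relay.
    return subnets[0], subnets[1]


serial_net, fr_net = v4_linknets(5, 6)
print(serial_net)  # 192.168.56.0/30
print(fr_net)      # 192.168.56.4/30
print([str(h) for h in fr_net.hosts()])  # ['192.168.56.5', '192.168.56.6']
```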

Summary

Hopefully it is clear that a vast number of topologies are possible with the combination of the physical links and my new Frame Relay cloud. It surprised me how often I’ve been in the middle of a lab and suddenly needed a direct link between two of R3, R4, R5 and R6, so this has come in very useful. I stand by my decision to keep the available physical interfaces to a minimum though; because of it, I have usually been able to keep the whole topology in my head while configuring the routers for some funky designs. Sometimes less is more.

Network Topology

My Dynamips Lab

In this first post I will describe how I have configured my base lab in Dynamips. I intend to use this lab in my future posts and hope that by explaining how the lab works once, I will be able to limit later posts to discussing the technologies I’m learning about.

My general principles are:

  1. I don’t want to have to think about the topology when learning a new technology.
  2. Nomenclature should be predictable and extensible.
  3. The topology should be large enough to be useful and small enough to be usable on my laptop.

With that in mind I developed the following topology.

Transit networks or ‘linknets’

Network Topology

I’m aware of RFC 3330, specifically that 192.0.2.0/24 is reserved for examples and documentation but I found one /24 too limiting.

Client networks

I use either the 10/8 or 172.16/12 networks to mimic clients, either on Loopbacks or Ethernet ports. In any topology I’ll incorporate the router number in all its clients. If I’m doing something simple, I’ll probably use 172.16.N.0/24 for each client where 0<N<7. Or if I want a few more networks I may take 10 per router, so R1 would get 172.16.1[0-9].0/24. If I’m doing some crazy MPLS VPNs and want a lot of networks, I may use the 10/8 space and denote the router with the second octet and the VC with the third.
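The three client numbering options can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming routers R1 to R6 as in the text and a VC number that fits in one octet.

```python
import ipaddress


def _check_router(n):
    if not 1 <= n <= 6:
        raise ValueError("router number must satisfy 0 < N < 7")


def simple_client(n):
    """One client network per router: 172.16.N.0/24."""
    _check_router(n)
    return ipaddress.ip_network(f"172.16.{n}.0/24")


def ten_clients(n):
    """Ten client networks per router: 172.16.N0.0/24 to 172.16.N9.0/24."""
    _check_router(n)
    return [ipaddress.ip_network(f"172.16.{n}{i}.0/24") for i in range(10)]


def vpn_client(n, vc):
    """10/8 scheme: router number in the second octet, VC in the third."""
    _check_router(n)
    return ipaddress.ip_network(f"10.{n}.{vc}.0/24")


print(simple_client(3))      # 172.16.3.0/24
print(ten_clients(1)[0])     # 172.16.10.0/24
print(ten_clients(1)[-1])    # 172.16.19.0/24
print(vpn_client(2, 30))     # 10.2.30.0/24
```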

Summary

Hopefully you can start to picture some simple network designs using this topology and also see how a methodical approach to nomenclature and interface choice is of benefit. Next time I’ll add some frame relay to give us a few more options.

Appendix: Dynamips Config

    [[ROUTER R1]]
        f0/0 = S1 1
        s1/2 = R2 s1/1
        s1/3 = R3 s1/1
        s1/4 = R4 s1/1
        s1/5 = R5 s1/1
        s1/6 = R6 s1/1
        s2/1 = F1 1

    [[ROUTER R2]]
        f0/0 = S1 2
        s1/3 = R3 s1/2
        s1/4 = R4 s1/2
        s1/5 = R5 s1/2
        s1/6 = R6 s1/2
        s2/1 = F1 2

    [[ROUTER R3]]
        f0/0 = S1 3
        s2/1 = F1 3

    [[ROUTER R4]]
        f0/0 = S1 4
        s2/1 = F1 4

    [[ROUTER R5]]
        f0/0 = S1 5
        s2/1 = F1 5

    [[ROUTER R6]]
        f0/0 = S1 6
        s2/1 = F1 6

# for route advertisement
    [[ethsw S1]]
        1 = access 1
        2 = access 2
        3 = access 3
        4 = access 4
        5 = access 5
        6 = access 6