From Huh to Learning: September 2014

19 September 2014

Quirky Operational Situations or QoS

QoS is something that I never had to deal with in a work environment. The only time I ever have to deal with it is on the tests. Makes it something interesting to do. Honestly, though, it really ain't so bad on the routers. Switch QoS is something that is a little different.

Order is important in a policy-map. Classes are evaluated from the top down and the first match wins. Screw the order up and you can mess up what was intended. If a class that matches all TCP sits above a class matching HTTP, HTTP traffic will never reach the HTTP class.
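A minimal sketch of the ordering point, assuming two hypothetical class-maps (HTTP and TCP), made-up ACL names, and arbitrary bandwidth percentages. With HTTP listed first, web traffic is classified before the broader TCP class can claim it. Swap the two class statements and the HTTP class never sees a packet.

```
ip access-list extended ACL-HTTP
 permit tcp any any eq www
ip access-list extended ACL-TCP
 permit tcp any any
!
class-map match-all HTTP
 match access-group name ACL-HTTP
class-map match-all TCP
 match access-group name ACL-TCP
!
policy-map ORDER-MATTERS
 ! Most specific class first
 class HTTP
  bandwidth percent 20
 ! Broader class second; it only sees TCP that is not HTTP
 class TCP
  bandwidth percent 30
```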

A lot of tasks for INE and iPexpert (v4) seem to want to try to transfer images from one device to another. Good idea for testing to make sure that HTTP, FTP, ACLs, QoS, etc, is working. Problem is in GNS3, it is hard to do that. I just fire up SLA and run it. May not be the best for some aspects of testing but it works. It won't flood a link but it at least will match HTTP traffic. The commands that I use are:
ip sla 1
 tcp-connect 155.1.146.1 80
 threshold 1000
 timeout 1000
 frequency 1
ip sla schedule 1 life forever start-time now

To limit the size of the FIFO queues for traffic classes, use the queue-limit command. Starting in 15.2(2)T, the default for this was packets. In other words, a queue-limit 24 would mean that the queue had a size of 24 packets. This command can be used for queue sizes in other queuing methods as seen in a minute.

Turning on WFQ for the class-default class is done through the fair-queue command under that class in the policy map. Once turned on, you can modify the queue sizes with the queue-limit command.

Nested service policies have always gotten me and I don't know why. I can do nested loops in programming. Anyways, say you want the overall bandwidth of an interface to be shaped to a specific number and also want the classes to have certain attributes as well. What you do is create a parent policy that only has class-default as a class, apply the shaping to it, and then use the service-policy command to apply a child policy that has your classes broken out in it. Not really that hard and yet I still screw it up every time. Cisco only allows up to three policy maps to be in a nest. Say you shape with the first one (all traffic), shape even more for the second (TCP traffic), and the third is super specific (HTTP).
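A minimal two-level sketch of the parent/child idea, assuming hypothetical policy names (PARENT, CHILD), a made-up 10 Mbps shape rate, and an ACL-based HTTP class. The parent shapes everything in class-default and hands the shaped traffic to the child for per-class treatment.

```
ip access-list extended ACL-HTTP
 permit tcp any any eq www
class-map match-all HTTP
 match access-group name ACL-HTTP
!
policy-map CHILD
 class HTTP
  bandwidth percent 40
 class class-default
  fair-queue
!
policy-map PARENT
 class class-default
  ! Shape everything leaving the interface to 10 Mbps
  shape average 10000000
  ! Apply the per-class policy inside the shaped rate
  service-policy CHILD
!
interface GigabitEthernet0/0
 service-policy output PARENT
```

Only the parent gets applied to the interface; the child rides along inside it.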

As soon as the bandwidth command is given under a class in a policy map, CBWFQ is used. Each bandwidth command means that the class has a FIFO queue attached to it as well. If you configure a bandwidth command under the default class, you turn off fair-queuing. This turns it into a single FIFO queue. If all you have is class-default and you do a bandwidth statement, you have turned the entire interface into a FIFO queue.

If you try to mix bandwidth and bandwidth percent in the same policy-map, it won't work. All units must be the same across all the classes. Didn't do this, but figured that it was still a good thing to know.

When coming up with a LLQ policy, you need to always remember your Layer 2 overhead. LLQ accounts for that and can screw you up if you forget it.

A task was asking for dropping of packets using random-detect. I figured out how to get it on, but as for actually getting the dropping working, I didn't get that. Solution was to use random-detect precedence, which now makes sense. With random-detect precedence, you tell it what precedence, the minimum and maximum thresholds, and the mark probability denominator. But the question I had was how do you know the precedence to filter on and only get the packets you want? Another portion of this task was to mark all packets of a certain type with IP precedence 2. I took that to mean as they left the interface. They did it as the packets came in on the ingress interface and then did random-detect on the egress. Sneaky.
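A rough sketch of that mark-on-ingress, drop-on-egress approach. The class, ACL, and policy names, the interfaces, and the threshold values (10, 30, 5) are all assumptions for illustration, not the workbook's actual solution.

```
ip access-list extended ACL-WEB
 permit tcp any any eq www
class-map match-all WEB
 match access-group name ACL-WEB
!
policy-map MARK-IN
 class WEB
  set ip precedence 2
!
policy-map WRED-OUT
 class class-default
  fair-queue
  random-detect
  ! precedence 2: min threshold 10, max threshold 30, drop 1 in 5 at max
  random-detect precedence 2 10 30 5
!
interface GigabitEthernet0/0
 service-policy input MARK-IN
interface GigabitEthernet0/1
 service-policy output WRED-OUT
```

Because the marking happens before the packet reaches the egress queue, the WRED profile for precedence 2 only ever applies to the traffic that was marked.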

CBWFQ has three main drop policies
1) Tail-drop
-- default for user-defined classes
2) Congestive Discard for WFQ
3) Random Early Detection (RED)

To enable random drops on unclassified traffic, the configuration seems to need fair-queue to be enabled in the policy. Do that and then enable random-detect also.

When doing RED with ECN, you may need to enable TCP ECN. Do this with ip tcp ecn. And of course, that command doesn't seem to be available in 7200 Software (C7200-ADVENTERPRISEK9-M), Version 15.2(4)S3. Awesome.

Fun times configuring CIR and Bc values. Bc stands for Committed Burst. CIR is Committed Information Rate. AR is Access Rate (line speed). Tc is the Committed Time interval. Shapers release up to Bc bits every Tc milliseconds, essentially spacing the bursts Tc apart. Tc cannot be configured manually on a device but you can use it to come up with CIR and Bc. Bc = CIR * (Tc/1000) which means that the Bc is equal to the CIR times the Tc in seconds. The lower the Tc the more the processor will work. There is also Excess Burst (Be) which comes into play. The maximum Be value is equal to the AR minus the CIR all multiplied by the Tc in seconds or maxBe = (AR - CIR) * (Tc/1000).
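A worked example of the Bc formula, assuming a made-up CIR of 256000 bps, a target Tc of 10 ms, and a hypothetical policy name. The arithmetic is in the comments and the result is fed to shape average, which takes the CIR, then Bc, then Be.

```
policy-map SHAPE-256K
 class class-default
  ! CIR = 256000 bps, target Tc = 10 ms
  ! Bc = 256000 * (10/1000) = 2560 bits
  ! Be = 0, so no excess burst
  shape average 256000 2560 0
```

With a T1 access rate of 1544000 bps, the same inputs give maxBe = (1544000 - 256000) * (10/1000) = 12880 bits.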

Single-Rate Three Color Marker (srTCM) is a system that has always boggled my mind. Think I am starting to understand it now. Tokens are put into the bucket at the CIR, so one token arrives every 1/CIR seconds. The space between those tokens is Ti. The size of the bucket is Bc. Tokens that arrive while the Bc bucket is full spill over into the excess bucket of size Be. For a policer, a token is one byte of traffic. So when a packet comes in that is size X, it is checked first against the Bc bucket. If X is less than or equal to the number of tokens there, the packet is marked with the conform action and X tokens are deducted from the Bc bucket. Otherwise, if X is less than or equal to the tokens in the Be bucket, it is marked with the exceed action and X tokens are deducted from the Be bucket. Finally, it's marked with the violate action if it fails both checks.
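A minimal single-rate policer sketch showing where the three outcomes get configured. The policy name, the rate, the bucket sizes, and the chosen actions are assumptions for illustration.

```
policy-map POLICE-SRTCM
 class class-default
  ! One rate (CIR in bits per second), two buckets (Bc and Be in bytes)
  police cir 128000 bc 8000 be 8000
   conform-action transmit
   exceed-action set-prec-transmit 0
   violate-action drop
```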

To account for the sliding window content correctly, it is better to set the policer Bc to the shaper burst size plus the average packet size. This will also give you a little wiggle room.

Two-Rate Three Color Marker (trTCM) uses two different buckets, sized by Bc and Be, that fill with tokens at two different rates. One fills at the CIR and the other at the Peak Information Rate (PIR). There are still three possible outcomes like with srTCM. If the packet is greater than the tokens in the PIR bucket, it violates. If the packet is greater than the tokens in the CIR bucket, it exceeds and the PIR bucket is docked the size of the packet. Finally, if neither of those is true, the packet conforms and both the PIR and CIR buckets are docked the size of the packet. Cool. Think I am getting this now.
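The two-rate version only needs the pir keyword added to the policer. As a sketch, with a hypothetical policy name and made-up rates and bucket sizes:

```
policy-map POLICE-TRTCM
 class class-default
  ! CIR bucket sized by bc, PIR bucket sized by be (both in bytes)
  police cir 128000 bc 8000 pir 256000 be 16000
   conform-action transmit
   exceed-action set-prec-transmit 0
   violate-action drop
```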

QoS pre-classify can only be done on tunnel interfaces, virtual templates, and crypto maps. The command is unavailable on any other interface types. It can only be enabled for IP packets as well. This command allows the IOS to apply QoS before the data is encrypted and tunneled. Without this, the interface that has the service-policy applied to it will treat the tunnel as one flow and not respective flows. This command will cause the router to see the header before being put in a tunnel and then use that header for matching class-maps. Class-based traffic shaping doesn't work for DMVPN, so it looks like qos pre-classify is the way to go.
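A small sketch of where the command goes, assuming a hypothetical existing policy-map named WAN-OUT and made-up interface numbers. The command sits on the tunnel, while the service-policy stays on the physical interface.

```
interface Tunnel0
 ! Keep a copy of the original IP header for classification
 qos pre-classify
!
interface GigabitEthernet0/0
 ! The policy on the physical interface can now match the inner flows
 service-policy output WAN-OUT
```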

12 September 2014

iPexpert Workbook Topology and My Plan

iPexpert has released the topology that they are using for their study materials. You will find it below.

What I find interesting about this topology is the sheer scale of it. Rough count was about 35 routers and 8 switches. Looking at the interfaces, I am thinking that this was done in IOU/IOL. I am good with that. I cannot wait to get playing with it. Cannot wait for the workbooks to start rolling more and more from both vendors.

My plan for studying is the following:
1) Finish the ATC section of INE's material (hopefully this month)
2) Read the workbook vol 1 from iPexpert and work areas that are unknown or shaky
3) Work through Cisco Press's v5 book
4) Work through iPexpert's vol 2 workbook
5) Do the INE full labs
6) Re-take my bootcamp with iPexpert in December
7) After the test, continue on things I was weak, and keep doing full labs
8) Take the lab exam in January
9) After passing the test, get drunk

Well, enough of that. Time to move on to QoS.

IPv6: It's Just a Fad That is Becoming MainStream

IPv6 is something that I haven't gotten to play with at work. I have taken a class on it. I have played with it in the lab. I have studied it for tests. Production, though...NOPE!! I do like it. I remember when I first heard about it and someone was saying that it HAD to have IPSec because it was built in. Well, it is, but thankfully, it is not required. Could you imagine the implementation mess that would make.

When using autoconfiguration on an interface, add the default keyword to have the router install a default route pointing at the router it is learning its prefix from. Also, since the router handing out the prefix is essentially acting as a router, it is a good idea to turn on ipv6 unicast-routing there. Apparently Ethernet interfaces by default have router advertisements turned off. If you are wanting to do some autoconfiguration, you may want to use the command no ipv6 nd suppress-ra but you definitely need to turn on unicast routing for it to work.
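A minimal sketch of both sides, assuming a documentation prefix (2001:DB8:1:1::/64) and made-up interface names. The suppress-ra spelling is the one used in these notes; some releases spell it ipv6 nd ra suppress instead.

```
! Router handing out the prefix
ipv6 unicast-routing
interface GigabitEthernet0/0
 ipv6 address 2001:DB8:1:1::1/64
 no ipv6 nd suppress-ra
!
! Router learning its address and a default route
interface GigabitEthernet0/0
 ipv6 enable
 ipv6 address autoconfig default
```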

In IPv4, you could disable the split horizon per interface for RIP. For IPv6, it is done on the process level. That means you are turning off split horizon on all interfaces that that RIP process is configured on. That seems like a bad idea because what if you don't want it off on all interfaces. Also the EIGRPv6 split horizon is still per interface.

So with IPv6 EIGRP, there apparently isn't a leak map. Interesting. Fun trying to let some prefixes out and hide others then when doing a summary. Think about when doing that for a default route injection but still want to show some routes. Have to redistribute the route and use distribute-lists for the other routes to advertise.

So even though the command reference doesn't mention it, you can still use in and out keywords when doing a distribute-list prefix-list in IPv6 EIGRP. Messed me up since it wasn't shown.

Just read that right now, IPv6 EIGRP can only do equal-cost load balancing. That may be something to look at. Apparently you can tell EIGRPv6 to find paths with unequal metrics but it will say screw you and only balance across them equally.

Two things about summarizing routes in OSPF between areas. One, and this doesn't matter where the summarization is, remember to change the OSPF network type for loopbacks otherwise they come in as /128 or /32 depending on the protocol. Next, the area range command still throws me. It is saying pull from this area anything that matches this range. I don't know why I can't get that. Getting frustrating.

And here I was looking for how to turn on PIM on interfaces. Crap. When you enable multicast routing for IPv6 on a router, it does so for ALL interfaces. You have to turn it off on the interfaces you don't want with no ipv6 pim. Also replace ip igmp with ipv6 mld and you just about got it all. Other interesting note is that the documentation for PIM is broken on Cisco's website for the 15 code. I am having to use the 12.4T documentation for PIM.

IPv6 tunnelling has been moved to the Interface and Hardware Component configuration guide. Well, ain't that something. Figured it would have been kept in the IPv6 guide but that was too easy I guess.

So according to what I read, you have to do static routing with Automatic 6to4 Tunnelling. Since 6to4 is a multipoint technology, dynamic routing seems to be an issue. INE says that the common thing is to do a static for the full 2002::/16 network pointing to the created tunnel. Also since ISATAP tunnels cannot extract the destination automatically, static routes have to be used there as well.
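A sketch of an automatic 6to4 tunnel with the static route INE describes. The loopback address (150.1.1.1) and the subnet number are assumptions for illustration. The site prefix comes from writing the IPv4 source address in hex after 2002.

```
interface Loopback0
 ip address 150.1.1.1 255.255.255.255
!
interface Tunnel0
 ! 150.1.1.1 in hex is 9601:0101, so the site prefix is 2002:9601:0101::/48
 ipv6 address 2002:9601:0101:1::1/64
 tunnel source Loopback0
 tunnel mode ipv6ip 6to4
!
! Everything in 2002::/16 goes to the tunnel; the destination is pulled from the address
ipv6 route 2002::/16 Tunnel0
```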

Multicast: When everyone gets it but not everyone understands

Got to play with multicast. When broken down step by step like the INE workbook, it wasn't so bad. Hopefully I can maintain that understanding as I move forward. Now, I did notice something interesting before moving on. For the switch portion of this lab, I used IOU. I got snooping working but I could not get profiles or MVR. Really hoping that is not on the test since they are virtual as well.

A big gotcha for this multicast section is that I cannot open the PIM section of the 15 code configuration guide. Any release. To do the multicast section, I had to revert back to the 12.4T configuration guide. The 15 code guide just returns an HTTP 404.

Interesting. There is a command to check the RPF for a source. It is show ip rpf <source_ip>. It shows the following:
R5#show ip rpf 155.1.146.6
RPF information for ? (155.1.146.6)
RPF interface: GigabitEthernet0/0.45
RPF neighbor: ? (155.1.45.4)
RPF route/mask: 155.1.146.0/24
RPF type: unicast (eigrp 100)
Doing distance-preferred lookups across tables
RPF topology: ipv4 multicast base, originated from ipv4 unicast base

To debug RPF, you can disable the CEF switching on interfaces that multicast packets are received on and sent out of with no ip mfib cef output and no ip mfib cef input. Then do a debug of the process switched multicast packets and look for any RPF issues. This will let you know what interface the multicast packets are coming in on and what happened with them. If the packet is accepted (sent on), it will show that too and what interface it is sent out of. The no ip mroute-cache command is deprecated. Cisco is wanting us to use the no ip mfib commands now. Good to know.

When using ip mroute, be as specific as possible for the source. Too loose and you can mess up other multicast sources. It will cause a source to fail RPF by having it look to be coming from the interface in the mroute. Tighter is always better.

So in dynamic routing protocols, the static entries are king. In multicast, static RP entries lose to dynamic entries. Way to beat that is the use of the override option when configuring a static RP. Using override will, you guessed it, override the dynamic option and force the use of the static.
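As a sketch, using the RP address that shows up elsewhere in these notes (150.1.5.5) and a made-up group ACL:

```
! Static RP for all groups that wins over Auto-RP or BSR learned mappings
ip pim rp-address 150.1.5.5 override
!
! Same idea, limited to the groups in a standard ACL
access-list 10 permit 239.0.0.0 0.255.255.255
ip pim rp-address 150.1.5.5 10 override
```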

PIM Sparse mode uses tunnels between the sender's first-hop router and the RP. The first-hop router uses the tunnel to encapsulate Register messages toward the RP. The RP gets two tunnels. One for encapsulating and one for un-encapsulating. These tunnels can mess up configs too. If you paste a config into a router, and the multicast tunnel gets made before a manual tunnel, the manual tunnel will fail if they share the same number. The tunnels are not configurable and can be seen with show derived-config interface tunnel#. An example is below.
R6#show derived-config int tun0
Building configuration...

Derived configuration : 205 bytes
!
interface Tunnel0
description Pim Register Tunnel (Encap) for RP 150.1.5.5
ip unnumbered GigabitEthernet0/0.146
tunnel source GigabitEthernet0/0.146
tunnel destination 150.1.5.5
tunnel tos 192
end

To prevent the fallback from Sparse-mode to Dense-mode in PIM, use no ip pim dm-fallback. Good for when you have only a couple of groups that you want to be dense without any way of the others coming back to it. Helps keep traffic from going all over.

PIM Assert is used so that only one router on a broadcast segment sends multicast traffic. To determine the winner, the routers let each other know the AD and metric of the routing protocol to get to the source. Best AD wins, followed by best metric. If both are a tie, then highest IP wins. An ip mroute with no distance specified beats EIGRP in AD for example.

When assigning a router to be the RP for a network via AutoRP, I have to remember to enable it for PIM as well. And since two multicast addresses have to propagate throughout the network, sparse-dense-mode is the best choice for multicast. Just remember though that this can be a problem if a negative any (deny any) list is used for the RP anywhere in the network. This will cause all networks to become dense mode operational.

With the ip pim rp-announce-filter command, if you omit the rp-list option, all announcements with groups matching the group-list are matched. If you don't use the group-list option, then all updates from the RPs in the rp-list are matched. Good way to deny or allow all based on RP or group.

Now I get the use of ip pim autorp listener. If all the interfaces are in sparse mode, it allows the router to still flood 224.0.1.39 and 224.0.1.40 in dense mode. It makes sure that there is no fallback to dense mode for any other groups.
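A minimal sketch of the listener setup, assuming made-up interface names and a scope of 10. The candidate RP and mapping agent lines are shown on one router for brevity; they could just as well live on separate routers.

```
ip multicast-routing
ip pim autorp listener
!
interface GigabitEthernet0/0
 ip pim sparse-mode
!
! On the candidate RP and mapping agent
interface Loopback0
 ip pim sparse-mode
ip pim send-rp-announce Loopback0 scope 10
ip pim send-rp-discovery Loopback0 scope 10
```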

PIM NBMA mode only works with sparse mode. Good to know. And since Auto RP works only in dense mode, you have to make changes to the network. One way is to create a tunnel from spoke to spoke so that if the cRP is behind one and the mapping agent behind another, they can talk.

When setting the multicast boundary and wanting to filter AutoRP messages, you have to use a standard ACL. You cannot base it on the source or the RP. It looks at any incoming IGMP and PIM messages to see if it needs to drop or allow the traffic. Unicast PIM Register messages are not affected by this.

Where AutoRP floods information about who the RP is or who wants to be it, BSR just goes hop by hop. Should make boundaries easier to configure. No multicast out an interface and done. If only it was that easy. Also have to remember that BSR messages are subject to RPF. Yep. Another one of those bite ya in butt things.

The highest hash-value that you can apply to ip pim bsr-candidate is 31. Good to know if it comes up. Using that value with more than one RP will cause them to load balance. A you take one, I take one scenario.

Multicast stub routing is a way to help out smaller sites. It limits what PIM and IGMP information is sent across to the stub router to limit traffic. You configure it on the border router to the stub and then on the stub facing the clients, configure ip igmp helper-address <main router>. The stub router can be set completely to dense mode as well just to make sure that all traffic gets to the distro router and clients. No PIM adjacency is ever formed from the hub to the client router via the ip pim neighbor-filter <ACL> which denies and permits who can form a neighborship.

Yep. It's another Cisco-ism. Multiple ways to do things. You can do ip multicast boundary to filter multicast traffic of course. But there is also ip igmp access-group as well. This filters based on the multicast groups that receivers are trying to join. According to INE, it is the more common method. When doing the filter, you can use either standard or extended ACLs. Standard ACLs are used to filter IGMP v1, v2, and v3 receivers. Extended ACLs allow you to filter IGMPv3 reports.

To limit the aggregate number of multicast groups that are joined by receivers that are directly connected, use the ip igmp limit command. This can be done globally or per interface. Basically it limits the amount of mroute states created due to IGMP reports.

The designated querier, one that makes sure someone is still listening, is based on the lowest IP address. The PIM DR is based on the highest IP address. Can't overload one router on the segment. Share the wealth.

Periodic IGMP queries are sent based on the ip igmp query-interval command. If a non-designated router running multicast doesn't hear any membership queries based on the time in the command ip igmp querier-timeout, it will try to become the new designated querier. Without that command, the timeout is two times the query-interval of that same interface with the default being 60 seconds. To shorten leave times, you can configure the ip igmp query-max-response-time so that everyone knows to send responses in a timely manner. The ip igmp last-member-query-interval is how fast special IGMP Leave messages need to be seen to be counted. If nothing else, just put ip igmp immediate-leave with an ACL on the interface and that interface no longer cares about multicast once it gets a Leave message.

Steps to make a multicast helper map
1) Set up a multicast network between the two broadcast domains
2) Enable broadcast forwarding on the ingress router to the multicast network with ip forward-protocol
3) On the ingress router to the multicast network, at the interface connecting to the broadcast domain, enter the ip multicast helper-map broadcast command. The ACL for this command has to be extended for the UDP matching.
4) Enable broadcast forwarding on the multicast network egress router with ip forward-protocol as well as put ip multicast helper-map on the interface connected to the multicast network. Again the ACL has to be extended.
5) Enable directed-broadcast on the interface connected to the broadcast network on the egress router. Can also specify a different broadcast address with ip broadcast-address

You can test helper maps with DNS. Enable DNS name resolution on the first broadcast network and don't enter a DNS server. The router will eventually broadcast for 255.255.255.255 and if the ACL for the helper map is any any or specific for that entry, then it gets hit. You can also use an extended traceroute.

When enabling bidirectional PIM, make sure that you turn it on globally with ip pim bidir-enable. Also learned that the rp-candidate command is particular about its options. Contrary to what the documentation says about the placement of the group-list option in relation to the bidir option, the bidir option has to come last.

When setting up source specific multicast, you don't need an RP if that is all you are doing. The receiver (ip igmp join-group <group> source <source>) and everyone in between builds the shortest path to the source based on the source specified. SSM also uses either the default range (232.0.0.0/8) or can be given a range with an ACL. No shared trees are used for either range and (*,G) joins are dropped. IGMP version 3 only has to be enabled on the receiving interface. Not on all.
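A minimal SSM sketch using the default range. The group (232.1.1.1), the source (150.1.10.10), and the interface name are assumptions for illustration.

```
ip multicast-routing
! Treat 232.0.0.0/8 as SSM; no RP needed for this range
ip pim ssm default
!
! Receiver-facing interface (or the router acting as the receiver)
interface GigabitEthernet0/1
 ip pim sparse-mode
 ip igmp version 3
 ip igmp join-group 232.1.1.1 source 150.1.10.10
```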

To be able to exchange multicast traffic between two different ASs, you need to do the following:
1) Turn on PIM between the two ASs. PIM SM is most common. Limit BSR/AutoRP leaks.
2) Exchange route information using a routing protocol. BGP is most common since it has an extension for this.

When applying the multicast address-family to BGP, make sure to activate all neighbors. Without this, not everyone is going to know about multicast routes. You can also use pre-pending to manipulate routes. If allowed, redistribute the IGP into the multicast address-family for help with RPF checks and routing.

After some initial headaches with finding a Layer 2 image in IOU that does IGMP snooping, I just hit the same problem with MVR (Multicast VLAN Registration). Just beautiful. Anyways, on to MVR. There are four basic steps to configure MVR.
1) Enable it with mvr and mvr group <multicast-group>
2) Set the MVR VLAN with mvr vlan <vlan-id>. This is the VLAN that carries all the multicast traffic and spans all the switches. Feel free to define the mode here as well.
3) Tell the switch what the sending and receiving interfaces are with mvr type <source|receiver> at the interface level.
4) Optionally, create a static group join with mvr vlan <vlan-id> group <ip-address>. This is done on the receiving ports.
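The four steps above could look something like this on a Catalyst switch. The VLAN (100), the group (239.1.1.1), and the port numbers are made up, and since I could not run MVR in IOU this is untested.

```
! Steps 1 and 2: enable MVR, define the group, VLAN, and mode
mvr
mvr group 239.1.1.1
mvr vlan 100
mvr mode dynamic
!
! Step 3: mark the sending and receiving ports
interface FastEthernet0/1
 description Uplink toward the multicast source
 mvr type source
!
interface FastEthernet0/10
 description Subscriber port
 mvr type receiver
 ! Step 4 (optional): static join on the receiver port
 mvr vlan 100 group 239.1.1.1
```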

IGMP profiles are another one that I cannot do in IOU/IOL. That is fine. I think that I have it figured out. IGMP profiles are for when you want to permit and deny at the switch level. It looks a lot like named ACLs but with numbers instead of a name. I do like the hierarchical way of configuring things. You apply the profile to the interface with ip igmp filter #.
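A small sketch of a profile that blocks joins for one range of groups on a port. The profile number, the range, and the interface are assumptions.

```
ip igmp profile 1
 ! Block joins for this range of groups
 deny
 range 239.1.1.1 239.1.1.10
!
interface FastEthernet0/10
 ip igmp filter 1
```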

IPSec VPNs and DMVPNs

Finally got to do some DMVPN stuff and I have to say, it ain't so bad. It truly reminds me a lot of Frame Relay. I didn't mind FR. I kinda miss it. But then I am a little off when it comes to technology. :) Well, enough rambling. On to my notes.

To make IPSec tunnels be robust off two interfaces, use crypto map <name> local-address <interface>. This will allow a crypto connection to be made to one location on the device and not drop if an interface dies. Well, not exactly true. Traffic will be routed out the other interface so a couple of packets may disappear but it is automatic and not a manual process. You just have to be sure to use the address on the other side of the tunnel on the peer.

I never really had to create VPNs before. Trying to do this isn't really that bad. From what I understand, you create the ISAKMP policy, which includes the initial encryption, DH group, hash type, and authentication type. Outside of that, you configure the authentication type parameters, with pre-share being super easy. Next, define the IPSec transform-set with encryption and hash types along with a name. Also don't forget to set the mode to tunnel or transport. Configure ACLs to define what traffic is being encrypted with source and destination. Create a crypto map where you set the peer, the transform-set to use, and the traffic ACL. Finally apply that to an external interface and done. This will allow encrypted and unencrypted traffic out of the interface.
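Those steps, sketched end to end. The peer address (203.0.113.2), the pre-shared key, the names, and the protected subnets are all made up for illustration.

```
! ISAKMP policy: encryption, hash, authentication type, DH group
crypto isakmp policy 10
 encryption aes
 hash sha
 authentication pre-share
 group 5
! Authentication parameters for pre-share
crypto isakmp key CISCO address 203.0.113.2
!
! Transform-set with encryption and hash, plus the mode
crypto ipsec transform-set TS esp-aes esp-sha-hmac
 mode tunnel
!
! Traffic to encrypt, source to destination
ip access-list extended VPN-TRAFFIC
 permit ip 10.1.1.0 0.0.0.255 10.2.2.0 0.0.0.255
!
! Crypto map ties the peer, transform-set, and ACL together
crypto map VPN 10 ipsec-isakmp
 set peer 203.0.113.2
 set transform-set TS
 match address VPN-TRAFFIC
!
interface GigabitEthernet0/0
 crypto map VPN
```

The far side needs the mirror image: its peer is this router's address and its ACL has the source and destination swapped.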

When doing GRE over IPSec, have the ACL for the traffic match protocol GRE and apply the crypto map to the external interface and not the tunnel. GRE over IPSec allows the use of dynamic routing protocols. GRE over IPSec also means fewer SAs. More entries in the ACL for traffic to be encrypted means that there are more SAs needed. One for each entry. With GRE over IPSec, there is only one entry for each location and that means one SA per location. Also by default, the DF bit is not copied from the original IP header to the GRE payload to the ESP header. This may cause you to want to lower the MTU on the tunnel. You can also modify the MSS window in TCP with ip tcp adjust-mss <NEW_MSS> to help reduce packet size and decrease fragmentation.

With a crypto map you have to define peers but with a crypto profile, the peer is pulled from the tunnel. Really only seems that you are putting in the crypto commands to secure the information riding in the tunnel. All other transport information is handled by the tunnel. You also don't have to worry about an ACL to specify what traffic is encrypted. If it goes over the tunnel, it gets encrypted.
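A sketch of the profile approach on a GRE tunnel, assuming a hypothetical profile name, transform-set, and tunnel endpoints. There is no set peer and no ACL; the tunnel destination supplies the peer.

```
crypto ipsec transform-set TS esp-aes esp-sha-hmac
 mode transport
!
crypto ipsec profile TUNNEL-PROT
 set transform-set TS
!
interface Tunnel0
 tunnel source GigabitEthernet0/0
 tunnel destination 203.0.113.2
 ! Anything that goes over the tunnel gets encrypted
 tunnel protection ipsec profile TUNNEL-PROT
```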

I was wondering what the difference between a Virtual Tunnel Interface (VTI) and GRE IPSec tunnel was when I first started seeing the terms in the workbook. The difference is that the VTI is directly encapsulated in ESP and doesn't need another header to move it from place to place. It also has a smaller overhead. With GRE and IPSec, to help increase the payload and decrease the overhead, you have to set the IPSec mode to transport. For VTI, you can leave it as tunnel. Another big difference is that VTIs only carry IP. GRE IPSec tunnels can carry just about anything from IS-IS to your momma. Also interesting that the tunnel interface for a VTI accounts for the ESP header in its MTU auto-magically. Do a show interface tunnel# to see it. This also means you don't have to enter the MTU for the tunnel since it is smarter than the average tunnel. Wonder if a VTI also looks for picnic baskets. There is a thought. Still, setting the ip tcp adjust-mss is a good idea.

DMVPNs aren't really all that scary. At first, I was thinking that with the dynamic nature of them and the encryption, that it was super tough. They really aren't. Set the spokes to point to the NHS (hub) and agree on NHRP authentication, GRE tunnel key number, and whether or not multicast will be used. After that check the hub for registrations and voila. Use show dmvpn and show ip nhrp for verification.

By default multicast is not used within DMVPN tunnels. You have to turn it on with the ip nhrp map multicast command. On the spokes, you put in the "external" IP of the hub. At the hub, you add the word dynamic. Think of doing this like adding the word broadcast to that wonderful protocol called Frame Relay. Also multicast packets are not replicated to other spokes. No spoke to spoke there people.
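A bare-bones hub and spoke pair with multicast mapping turned on. The tunnel subnet (10.0.0.0/24), the hub's external address (198.51.100.1), the NHRP authentication string, and the key number are assumptions for illustration.

```
! Hub
interface Tunnel0
 ip address 10.0.0.1 255.255.255.0
 ip nhrp authentication NHRPKEY
 ! Replicate multicast to every spoke that registers
 ip nhrp map multicast dynamic
 ip nhrp network-id 1
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
 tunnel key 100
!
! Spoke
interface Tunnel0
 ip address 10.0.0.2 255.255.255.0
 ip nhrp authentication NHRPKEY
 ip nhrp map 10.0.0.1 198.51.100.1
 ! Send multicast to the hub's external address
 ip nhrp map multicast 198.51.100.1
 ip nhrp network-id 1
 ip nhrp nhs 10.0.0.1
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
 tunnel key 100
```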

So what I am gathering is that DMVPN has phases. Phase 1 is spoke to hub only and not good for a lot of traffic going spoke to spoke. Phases 2 and 3 are where traffic can go from spoke to spoke. Phase 2 is where traffic goes from spoke to spoke via asking the hub where each spoke is located. Route updates are sent to the spokes with the next-hop unchanged. Phase 3 is where the spokes respond to NHRP resolution requests.

When creating a DMVPN with IPSec and you want to make it so that the spokes only securely talk with the hub, set the crypto isakmp key to the external tunnel interface. For the hub, you can set it to 0.0.0.0. Gonna have to see if you can enter multiple entries for each spoke. Only problem with that is the hub has to be modified for each spoke added. PKI is the real way to go.

DMVPN uses crypto profiles rather than crypto maps to add security. If it used maps, it would require the set peer option. Using set peer removes the dynamic portion of DMVPN. Also setting the mode to transport instead of tunnel saves overhead by not encapsulating a header with another header. To make sure that the DMVPN is working, check the hub has ISAKMP and IPSec SAs for each spoke. Only after IPSec connections are made are the NHRP connections made and can be checked.

To do DMVPN Phase 1 with EIGRP, set up the hub like before but on the spokes configure a tunnel destination pointing at the hub and take out tunnel mode gre multipoint. This makes sure that the spokes don't talk with each other and that all traffic goes through the hub. The hub is king in Phase 1.

When doing DMVPN with OSPF, there are three choices. One is to make the hub point-to-multipoint and change the hello-interval to match the spokes. Two is to change the hub to point-to-multipoint and change the spokes' hello-interval to match the hub. Three is to make the hub and spokes all point-to-multipoint OSPF type. The third option cannot be used in Phase 1. The hub always has to be point-to-multipoint. By default tunnel interfaces are point-to-point. Again, sounds more and more like Frame Relay. Frame Relay, my old friend, you come around, again and again.

To do DMVPN Phase 2, two key criteria have to be met. One is that the hub doesn't do any summarization. When done, the next hop is changed to the hub so the spoke sends it there. That brings in key point 2, which is that the next hop cannot be changed. Again, if the next-hop is changed then, all traffic goes to the new next-hop. Other interesting fact with DMVPN Phase 2 is that adjacencies are only formed from the spokes to the hub. No spoke to spoke adjacencies are there. Again, sounds more and more like Frame Relay.

07 September 2014

MPLS or Major Protocol Looking Suite

I was running through the MPLS section of the INE workbook. Nothing too major at first and then the fun started. I have to say, I kinda like MPLS. It has this funny way of looking all sweet and innocent and then biting you on the buttocks with all the underlying complexity. Anyways, on to my lessons learned and things I found fun or interesting.

So when running VRF Lite, you can do a static route from one VRF to another through the use of ip route vrf and stating the interface the traffic will flow out of. This can get cumbersome in some situations but in others, it is fine for a few small fine tunings. Think small deployments. Large deployments and static routes just suck.

When configuring MPLS LDP and you want to use the physical interface IP as the transport IP, use the command mpls ldp discovery transport-address interface in config-if or subif mode. Or if nothing else you can just state the IP. Good for when not having the IGP send out the loopbacks or just to be different. :)

With MPLS, not only configure MPLS globally but it is good to also do it on each interface with mpls ip. If using OSPF and say you want it enabled on all the OSPF interfaces, use the command mpls ldp autoconfig under the OSPF process.
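Both ways of turning LDP on, as a sketch with a made-up interface and OSPF process number. The per-interface transport-address line is only there to show where that command lives.

```
mpls ip
mpls label protocol ldp
!
! Option 1: enable it interface by interface
interface GigabitEthernet0/0
 mpls ip
 ! Use this interface's own IP as the LDP transport address
 mpls ldp discovery transport-address interface
!
! Option 2: enable it on every interface running this OSPF process
router ospf 1
 mpls ldp autoconfig
```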

So I keep forgetting that when I set up MP-BGP, you have to source it from the loopback and it has to have a /32 subnet-mask. Crap. Also need to realize the routes are to be redistributed with either static or connected. Combine that with not sending communities and well I just jacked up that config horribly. I was right in my thinking, just not in the execution. I just need to get the steps correct now.

When using export maps, set the export route-targets based on the prefixes as wanted and then, if there is a default target, don't forget a final match-anything entry (a permit sequence with no match statement). Think of it as the catch-all.
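
Roughly what that looks like. The VRF name, prefix-list, and route-target values below are hypothetical.

```text
ip prefix-list LOOPBACKS seq 5 permit 150.1.0.0/16 ge 32
!
! Specific prefixes get their own route-target
route-map VPN_A_EXPORT permit 10
 match ip address prefix-list LOOPBACKS
 set extcommunity rt 100:66
!
! Catch-all: no match statement, so everything else lands here
route-map VPN_A_EXPORT permit 20
 set extcommunity rt 100:1
!
ip vrf VPN_A
 rd 100:1
 export map VPN_A_EXPORT
 route-target import 100:1
```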

So, you want OSPF routes traveling across the MP-BGP to be recognized on the far side as external. No worries. Configure the PEs in different OSPF domains. This will force the routes to be external. Also, if the routes are being blackholed, look at turning on capability vrf-lite, which will cause the router to ignore the down bit.
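
A sketch of both knobs, assuming a VRF called VPN_A and made-up process and domain IDs:

```text
! PE1
router ospf 100 vrf VPN_A
 domain-id 0.0.0.1
!
! PE2: different domain ID, so routes from PE1 come in as external
router ospf 100 vrf VPN_A
 domain-id 0.0.0.2
!
! Router running OSPF inside a VRF without being a real PE: skip the down-bit check
router ospf 1 vrf VPN_A
 capability vrf-lite
```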

For an OSPF sham-link, the interface is usually a loopback for the source and destination and has to be known in the VRF. You can advertise this into BGP directly on the PE, since it is a good idea to advertise these into the VRF with a method other than OSPF. Good ole network statement in BGP address-family for the vrf.
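
Here is how I would lay that out on one PE. The loopback number, addresses, AS number, and cost are illustrative only, and the far PE needs the mirror image.

```text
! Sham-link endpoint lives in the VRF
interface Loopback101
 ip vrf forwarding VPN_A
 ip address 150.1.101.1 255.255.255.255
!
! Advertise the endpoint through BGP, not OSPF
router bgp 100
 address-family ipv4 vrf VPN_A
  network 150.1.101.1 mask 255.255.255.255
!
! Source is the local loopback, destination is the remote PE's loopback
router ospf 100 vrf VPN_A
 area 0 sham-link 150.1.101.1 150.1.102.2 cost 10
```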

Site of Origin is a filtering technique for when a network is multi-homed. When using it, you don't have to use EIGRP route tags and BGP communities to filter routes. You configure it with a route-map that sets the extcommunity soo value to ASN:XX. That route map is applied at the interface level via ip vrf sitemap <ROUTE_MAP>. Now as the route leaves a router, it is tagged, and if the router sees it again or sees a route with the same SoO, it drops it. This is a lot like using route tags but with a lot less work. Don't apply it on the MPLS links. Apply it on the links to the customer site and the links for the backbone. That way all routes that enter and leave that interface are checked. Also, a router doesn't have to be running VRF to use the sitemap command.
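
The pieces fit together something like this. The route-map name, SoO value, and interface are placeholders of mine.

```text
route-map SOO_SITE1 permit 10
 set extcommunity soo 100:1
!
! Customer-facing interface, not the MPLS-facing one
interface GigabitEthernet0/1
 ip vrf sitemap SOO_SITE1
```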

There are two ways to apply a sitemap. One is on the PE interface facing the CE routers using the same SoO on every PE router. Doing this does not allow the MPLS core to be used as a backup between sites. The second way is to apply different SoO values at every PE router. I like the idea of using the same SoO value at each site because then you make sure that the route doesn't get put back in at the same location but still allow the use of all links.

With BGP SoO, you configure it per neighbor on the PE router. It is used to prevent routing loops in a multi-homed scenario. Hmm... What does this sound like? Maybe EIGRP SoO. You can do it via route-maps and neighbor route-map commands, but to me neighbor soo is much more elegant. You can use a different value on each PE to the CE and then the CEs can match their respective PEs to each other. Keeps the backdoor link a backdoor and allows full route table "respect". If you disrespect the route table, it will make loops that will drive you insane. And if you want to see how many prefixes are getting filtered via SoO, look in the neighbor information under BGP (show bgp vpnv4 unicast vrf <NAME> neighbors <NEIGHBOR>). It will list the SoO loop information.
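
A minimal sketch of the per-neighbor version on a PE. The AS numbers, neighbor address, VRF name, and SoO value are all assumptions.

```text
router bgp 100
 address-family ipv4 vrf VPN_A
  neighbor 10.1.1.2 remote-as 65001
  neighbor 10.1.1.2 activate
  neighbor 10.1.1.2 soo 100:1
```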

So you have this MPLS cloud connecting multiple sites to each other. Great. Awesome. How about Internet? There are two ways to get the routes to the users for Internet access. You can set the connection to the Internet as a VRF and import and export the routes using route-targets for one. That works pretty easily. Path 2 is to have the Internet connected to the global route table and then create static routes for the VRFs. The easiest way is to do it with the default route for the VRF pointed to the global table and interface. You may then need to do NAT on the addresses as they go out if they are private. I like the VRF method. Just seems easier and I am all about that.
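
For the static route path, a sketch might look like the following. The VRF name, next hops, and customer prefix are invented, and NAT is left out.

```text
! Default route for the VRF, with the next hop resolved in the global table
ip route vrf VPN_A 0.0.0.0 0.0.0.0 203.0.113.1 global
!
! Return traffic: global route pointing out the VRF-facing interface
ip route 10.1.0.0 255.255.0.0 GigabitEthernet0/1 10.1.1.2
```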

Think that MPLS MP-BGP is fast? Want it to go faster? There are three factors for how the speed is affected.
1) Time for an IGP update to be redistributed into BGP. This is more event-driven now in newer code, so it is nearly instantaneous.
2) Time to send updates to peers. This can be done faster by setting the neighbor advertisement-interval to 0. This will send the updates now and not wait for a batch job. For newer code, this is the default. Still, I like to be explicit.
3) Time needed to import MP-BGP VPNv4 prefixes into the local table. You can use bgp scan-time under the VPNv4 address-family but that is deprecated. Instead, you can use import path selection all under the vrf address-family in BGP.
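
Points 2 and 3 above could be sketched like this on a PE. The AS number, neighbor address, and VRF name are hypothetical.

```text
router bgp 100
 address-family vpnv4
  neighbor 150.1.2.2 activate
  ! Send updates right away instead of batching them
  neighbor 150.1.2.2 advertisement-interval 0
 !
 address-family ipv4 vrf VPN_A
  import path selection all
```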

04 September 2014

My BGP Experience and Thoughts and Overall Screaming at Myself

So after working the BGP labs from INE, I came across some gotchas, some oh crap moments, and some what the hell was I thinking. Below are those times so that I can relive them over and over.

BGP auto-summarization only happens if there is a network command with a classful subnet or if prefixes are redistributed into BGP. With the network statement, the aggregate is installed into the BGP table if there is a subnet of the prefix in the IGP table. Also, auto-summarized routes do not have the aggregator or atomic-aggregate attributes in them. This happens because of where the summarization is performed. It is done on the IGP prefixes and not the BGP prefixes.

Best-path selection excludes some prefixes based on certain criteria:
1. No valid next-hop. If you can't get to the next-hop, how can it be the best-path? Duh. That is why, for prefixes learned from eBGP neighbors, you may want to include next-hop-self toward your iBGP peers.
2. BGP synchronization is enabled and the prefix is not found in the IGP table. Basic synchronization there. Thank goodness it is off by default.
3. The received prefix has your AS number. Must have come from you so goodbye.

BGP best-path looks at the following list to determine what is a better way to send that prefix out.
1. Ignore paths that don't work. See above list.
2. Highest weight. (Time where more weight is a good thing).
3. Highest local preference
4. Locally originated prefixes. Come in by network, aggregate-address, or redistribution commands.
5. Shortest AS-Path.
6. Lowest Origin Type code where IGP < EGP < Incomplete
7. Lowest MED, so long as the first AS is the same
8. External BGP routes over Internal BGP routes
9. Smallest IGP metric to next-hop
10. Lowest BGP originating router-id
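
To tie a few of those steps to actual knobs, here is a sketch of where weight, local preference, AS-path length, and MED usually get set. The route-map names, AS numbers, neighbor address, and values are made up.

```text
! Inbound: affects how this router (weight) and this AS (local preference) pick a path
route-map FROM_AS54 permit 10
 set weight 200
 set local-preference 200
!
! Outbound: affects how the neighbor AS sees us (AS-path length, MED)
route-map TO_AS54 permit 10
 set as-path prepend 100 100
 set metric 50
!
router bgp 100
 neighbor 155.1.13.3 remote-as 54
 neighbor 155.1.13.3 route-map FROM_AS54 in
 neighbor 155.1.13.3 route-map TO_AS54 out
```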

By default, MED is only compared when the prefix is received from the same AS. If the prefix comes into you from both AS 54 and AS 55, then MED won't be used. But if both paths list AS 54 as the first AS, then it can be.

You can route based on the IGP or the MED. Using the IGP metric is "hot potato" routing, and using the MED is "cold potato" routing. With the IGP, you want that traffic away from you as soon as possible. With MED, you are willing to hold onto it a little longer.

If a path contains route reflector attributes (Originator-ID, Cluster-ID), the originator ID is substituted for the router ID in the path selection process. But if you make it this far down the list for best-path selection, then wow. That is one heck of a selection.

So, you have prefixes that are failing to show up in the route table. You check the BGP table and you see that there are RIB failures. Why? Well, there is a command to see why. Use show bgp ipv4 unicast rib-failure and it will output the following (or something close):

R1#show bgp ipv4 unicast rib-failure
Network            Next Hop            RIB-failure             RIB-NH Matches
150.1.77.0/24      155.1.13.3          Higher admin distance   n/a

So, I had a major brain fart. I had to aggregate some routes. No issue, right? Do aggregate-address under the BGP process and all is good? Nope. I made a stupid mistake and forgot about getting the routes into the BGP table first. Doh!!

There are routes coming into your router or your AS that have communities set. You want a clean slate. Under a route-map use set community none. I was trying to use a no set community no-export. That didn't work.
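
The clean slate version, as a sketch with a made-up route-map name, AS numbers, and neighbor address:

```text
route-map STRIP_COMMUNITIES permit 10
 set community none
!
router bgp 100
 neighbor 155.1.13.3 remote-as 54
 neighbor 155.1.13.3 route-map STRIP_COMMUNITIES in
```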

The advertise-map option of aggregate-address is for choosing, through a route-map, which component prefixes go into building the aggregate (for example, which ones contribute to the AS_SET).

I got so focused in setting community-lists that I forgot about the comm-list delete option for set in a route-map. Gotta read my command options more carefully in the documentation.
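
For when only one community has to go and the rest should stay, a sketch using comm-list delete. The list and route-map names are placeholders.

```text
! One community per entry when the list is used for deletion
ip community-list standard NO_EXPORT_ONLY permit no-export
!
route-map CLEAN_NO_EXPORT permit 10
 set comm-list NO_EXPORT_ONLY delete
```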

Conditional advertisements come from using an advertise-map with a neighbor statement. They can work either way: as long as this prefix exists, advertise that one (exist-map); or when this prefix doesn't exist, advertise that one (non-exist-map).
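
A sketch of the "when this doesn't exist, advertise that" flavor. The prefixes, names, AS number, and neighbor address are all assumptions.

```text
ip prefix-list PRIMARY_PREFIX seq 5 permit 10.1.0.0/16
ip prefix-list BACKUP_PREFIX seq 5 permit 10.2.0.0/16
!
! What to advertise
route-map ADVERTISE permit 10
 match ip address prefix-list BACKUP_PREFIX
!
! What to watch for in the BGP table
route-map WATCH permit 10
 match ip address prefix-list PRIMARY_PREFIX
!
router bgp 100
 neighbor 155.1.13.3 advertise-map ADVERTISE non-exist-map WATCH
```

Swap non-exist-map for exist-map to get the "as long as this exists" behavior.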

A BGP inject-map requires two route-maps. One says what is being injected, along with any attributes to set. This is done with a set ip address prefix-list. The second route-map includes the match for the aggregated address with a match ip address prefix-list and the second requirement of a match ip route-source prefix-list, which matches where the aggregation originated from. This is not the next-hop but the aggregation origination. This source can change when moving from iBGP to eBGP. Inject maps are great for deconstructing an aggregation.
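
Put together, the two route-maps might look like this. Every name, prefix, and address here is invented for the example.

```text
ip prefix-list AGGREGATE seq 5 permit 10.0.0.0/8
ip prefix-list AGG_SOURCE seq 5 permit 155.1.13.3/32
ip prefix-list SPECIFIC seq 5 permit 10.1.1.0/24
!
! Route-map 1: what gets injected
route-map INJECT permit 10
 set ip address prefix-list SPECIFIC
!
! Route-map 2: the aggregate that must exist, and who originated it
route-map EXIST permit 10
 match ip address prefix-list AGGREGATE
 match ip route-source prefix-list AGG_SOURCE
!
router bgp 100
 bgp inject-map INJECT exist-map EXIST copy-attributes
```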

Filtering with BGP is applied in two different orders depending on the direction, inbound or outbound. Inbound does route-map, filter-list, prefix-list/distribute-list. Outbound does prefix-list/distribute-list, filter-list, and route-map.

So with the no-prepend option of neighbor local-as, the old AS (the one given in the local-as command) is not prepended to the AS path of routes received from that neighbor. The new AS is the one configured with router bgp <AS>. This helps so that you can run a new AS number on your borders and run the old AS number internally until everyone is moved over. Without this, the internal routers would reject the route. This also only applies to inbound routes. Updates sent out to that neighbor still have both the old AS and the new AS prepended to them.

Using the replace-as option together with the no-prepend option of neighbor local-as lets the router replace the new AS number with the old one in updates sent to that neighbor. This allows the new AS number to be hidden from the outside world.
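
A sketch of a border router mid-migration, assuming AS 200 is the new AS, AS 100 is the old one, and the neighbor sits in AS 54. All three numbers and the address are made up.

```text
router bgp 200
 neighbor 155.1.13.3 remote-as 54
 ! Peer as old AS 100, don't prepend it inbound, and hide AS 200 outbound
 neighbor 155.1.13.3 local-as 100 no-prepend replace-as
```

The neighbor keeps its existing remote-as 100 statement for this router, so nothing has to change on the far side.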

The command bgp scan-time is used to adjust the amount of time that the BGP process allows before initiating a scan of the BGP routing information. INE used it in their lab talking about the amount of time that the BGP process will wait to process a conditional route advertisement. I was looking for the advertisement coming in and they were talking about it going out. The default is 60 seconds.

Outbound route filtering (ORF) has to be applied using neighbor prefix-list in. Needs to be done once the capability has been enabled.
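
Both ends of an ORF setup, as a sketch. The AS numbers, addresses, and prefix-list contents are assumptions.

```text
! Router that wants the filtering done upstream
ip prefix-list ORF_FILTER seq 5 permit 10.0.0.0/8 le 24
!
router bgp 100
 address-family ipv4
  neighbor 155.1.13.3 capability orf prefix-list send
  neighbor 155.1.13.3 prefix-list ORF_FILTER in
!
! Upstream neighbor: agrees to receive the filter and apply it outbound
router bgp 54
 address-family ipv4
  neighbor 155.1.13.1 capability orf prefix-list receive
```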

BGP ttl-security needs to be set the same on both sides of a link to make sure that the links stay secure and expect the same value. Also you can use it or ebgp-multihop but not both for the same neighbor.
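
A matching pair for directly connected eBGP peers. The AS numbers and addresses are placeholders.

```text
! R1
router bgp 100
 neighbor 155.1.13.3 remote-as 54
 neighbor 155.1.13.3 ttl-security hops 1
!
! R3
router bgp 54
 neighbor 155.1.13.1 remote-as 100
 neighbor 155.1.13.1 ttl-security hops 1
```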

The neighbor allowas-in is not in the BGP command-list. It is in the MPLS section instead. Makes sense when you think about it. The option is for discontinuous systems and that can happen easily with an MPLS backbone.

Back to Labbing

It felt so good to be labbing again. I missed it. Started back where I needed to with INE's workbook and watched videos from iPexpert. Kinda missed labbing. But I am hitting it now. I finished out the BGP section and I am working on MPLS. I have to go back and hit the Redistribution section but that can wait until the rest of the book is done. My goal is to finish the individual sections, go to Cisco Press's book, move to iPexpert and INE full labs, and then take the bootcamp in December. After that I plan to study as I can over Christmas and then take the test towards the end of January. Heck of a plan. We will see what happens.