Sunday, April 12, 2009

RIPv1 Holddown

Several of you wanted a more in-depth understanding of one of the primary activities of convergence: holddown. As you will remember from our class discussion, convergence involves four primary activities: update, invalid, holddown, and flush. Each one corresponds to a timer that a Cisco router uses to control the way RIP reacts to changes in its routing table.
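To keep the four timers straight, here is a small Python sketch that uses the Cisco default values for RIP (update 30, invalid 180, holddown 180, flush 240 seconds). The names RipTimers and route_state are my own, not anything from IOS. I am also assuming that the invalid and flush timers both count from the last update heard for the route, and I only model the case where the advertising router simply goes silent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RipTimers:
    """Cisco default RIP timers, in seconds."""
    update: int = 30     # how often the full routing table is advertised
    invalid: int = 180   # silence after which a route is marked invalid
    holddown: int = 180  # how long worse routes are suppressed
    flush: int = 240     # silence after which the route is removed


def route_state(seconds_since_last_update: int,
                timers: RipTimers = RipTimers()) -> str:
    """Classify a route by how long its advertiser has been silent.

    Assumption: invalid and flush both count from the last update heard.
    """
    if seconds_since_last_update < timers.invalid:
        return "valid"
    if seconds_since_last_update >= timers.flush:
        return "flushed"
    if seconds_since_last_update < timers.invalid + timers.holddown:
        return "holddown"
    return "invalid"  # holddown is over, route not yet flushed


if __name__ == "__main__":
    for t in (0, 179, 180, 239, 240):
        print(t, route_state(t))
```

On a Cisco router these four values are set, in the same order, with the timers basic command under router rip.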

The following discussion of Cisco’s implementation of RIP holddown comes from Robert Wright’s book, IP Routing Primer, ISBN 1-57870-108-2. I have found his discussion on holddowns to be most helpful. According to Wright, a router puts a route into holddown under one of three conditions.
1) The router that was advertising the route stops advertising it for 180 seconds (invalid period).
2) The router that advertised the original route sends a new advertisement for the same route with a metric greater than the metric stored in the routing table. This usually indicates that there is a routing loop, which causes the route to be immediately deleted and put into holddown instead of being forced to wait for the invalid timer to fire.
3) The router that was advertising the route sends a new advertisement for the route with an unreachable metric, otherwise known as poisoning the route.
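
The three conditions above can be written as a single check. This is only a sketch of Wright's list, not Cisco's code: the function name holddown_reason and its parameters are hypothetical, and I am assuming both metrics are hop counts as they would be stored in the routing table, with 16 meaning unreachable.

```python
from typing import Optional

RIP_INFINITY = 16      # RIP's "unreachable" metric
INVALID_SECONDS = 180  # Cisco default invalid timer


def holddown_reason(stored_metric: int,
                    advertised_metric: Optional[int],
                    seconds_since_last_update: int) -> Optional[str]:
    """Return why a route should enter holddown, or None if it should not.

    advertised_metric is the metric in a new advertisement from the same
    router that supplied the stored route, or None if nothing just arrived.
    """
    if advertised_metric is None:
        # Condition 1: the advertising router has been silent too long.
        if seconds_since_last_update >= INVALID_SECONDS:
            return "invalid timer expired"
        return None
    if advertised_metric >= RIP_INFINITY:
        # Condition 3: the route was poisoned.
        return "route poisoned"
    if advertised_metric > stored_metric:
        # Condition 2: same router, worse metric (possible routing loop).
        return "metric increased"
    return None


if __name__ == "__main__":
    print(holddown_reason(2, None, 180))
    print(holddown_reason(2, 5, 0))
    print(holddown_reason(2, 16, 0))
```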

Cisco’s RIP holddown, 180 seconds by default, applies to routes that have been marked as invalid but cannot yet be replaced by a new route with a higher (worse) metric. Holddowns prevent routes from changing too rapidly by allowing time for either the downed route to come back up or the network to stabilize before accepting a route to the same destination with a worse metric. The idea of the holddown period is that if the path you are using to reach a particular network goes down, you wait for some time before switching to another path. While a router has that route in holddown, it continues to forward any packets it receives that are destined for that particular network, and it advertises the route in its own updates with a cost of infinity.
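
Here is a rough Python model of that rule, for those of you who think better in code. It is a simplified sketch of the behavior described above, not IOS's actual logic. Route, offer, and the field names are made up for illustration, and refreshes from the current next hop are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

RIP_INFINITY = 16       # RIP's "unreachable" metric
HOLDDOWN_SECONDS = 180  # Cisco default holddown timer


@dataclass
class Route:
    next_hop: str
    metric: int                             # metric held before the route went down
    holddown_start: Optional[float] = None  # None means the route is healthy

    def in_holddown(self, now: float) -> bool:
        return (self.holddown_start is not None
                and now - self.holddown_start < HOLDDOWN_SECONDS)


def offer(route: Route, next_hop: str, metric: int, now: float) -> bool:
    """Offer a newly advertised path; return True if it replaced the route."""
    if metric >= RIP_INFINITY:
        return False                     # an unreachable path is never installed
    if route.holddown_start is None:
        accept = metric < route.metric   # healthy route: only a better path wins
    elif route.in_holddown(now):
        accept = metric <= route.metric  # holddown: worse paths are suppressed
    else:
        accept = True                    # holddown over: any reachable path will do
    if accept:
        route.next_hop = next_hop
        route.metric = metric
        route.holddown_start = None
    return accept


if __name__ == "__main__":
    route = Route("RouterB", 1, holddown_start=0.0)
    print(offer(route, "RouterC", 2, now=60.0), route.next_hop)
    print(offer(route, "RouterC", 2, now=180.0), route.next_hop)
```

In the example, the worse path through RouterC is refused 60 seconds into holddown and accepted once the 180 seconds have passed.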

According to Wright, one reason routing protocols behave this way is the assumption that temporary packet loss, caused by using routes to networks that might not be viable, is better than immediately accepting a less desirable route to the destination network. In a discussion of route flapping caused by congestion on a link between two routers, RouterA and RouterB, he gives us a pretty good understanding of the reasoning behind holddown. RouterA and RouterB are connected via a serial link, and 168.71.8.0 is a LAN directly connected to RouterB. When RouterA stops receiving RouterB’s advertisements for 168.71.8.0 for 180 seconds (the invalid timer expires), RouterA puts route 168.71.8.0 in holddown.

By allowing packets destined for 168.71.8.0 to be dropped instead of sending them via the less desirable path through RouterC, RouterA and RouterB give the hosts attached to them a chance to react to the dropped packets by sending fewer packets at a time, and perhaps even smaller packets. This, as he tells us, requires that either the applications in use or their underlying protocols keep track of packet loss and respond accordingly.

If RouterA instead accepted the less desirable route through RouterC as soon as the invalid timer expired for the downed route 168.71.8.0, and forwarded all traffic to 168.71.8.0 over it, congestion on the more desirable, original path would cease to be a problem.

The routing updates that had been getting dropped due to congestion would start arriving again, and RouterA would immediately go back to using its link to RouterB to reach 168.71.8.0. At this point, the problem would start all over again. This is route flapping: a route continuously switching between two different next-hop routers.

If the traffic pattern that caused this problem is more than just an anomaly, it will be necessary to either increase the speed of the link between RouterA and RouterB or permanently configure the hosts to send fewer packets at a time (and possibly smaller packets as well) to prevent it from happening again.
