IP reliability features are emerging in Internet routers, as Graceful Restart and Nonstop Routing gain momentum

January 8, 2003

IP Routing Gets a Restart

If Internet Protocol (IP) data services have any hope of replacing Frame Relay or Asynchronous Transfer Mode (ATM) services, they'll need to get a lot closer to the “five nines” of reliability carrier customers already demand of mission-critical services.

As a result, IP routing vendors have been working overtime on ways to make their routers more reliable. Cisco Systems Inc. (Nasdaq: CSCO) has introduced its Cisco Globally Resilient IP feature set for its IOS software, which incorporates Nonstop Forwarding and Graceful Restart, now under development as standards in the Internet Engineering Task Force (IETF) (see Cisco Intros Globally Resilient IP). Others, like Alcatel SA (NYSE: ALA; Paris: CGEP:PA) and Avici Systems Inc. (Nasdaq: AVCI; Frankfurt: BVC7), are marketing their Nonstop Routing approaches (see Alcatel Unveils New Routing Technology and Avici Touts Router Reliability).

As these features start to roll out on products, carriers are chiming in with their opinions. The approaches differ: With Nonstop Routing, a backup processor operates alongside the primary route processor; with Graceful Restart, the router relies on sophisticated software to get it back up and running after a failure.

“Graceful Restart is a good place to start,” says David Garbin, vice president of network strategy for Cable & Wireless (NYSE: CWP). “But what is really needed is recovery without a glitch. We’re very interested in the Nonstop Routing solutions that claim to do that.”

Router vendors Cisco, Juniper Networks Inc. (Nasdaq: JNPR), Laurel Networks Inc., and Redback Networks Inc. (Nasdaq: RBAK), to name a few, use Graceful Restart. Unlike Nonstop Routing, in which a backup processor receives and stores routing table updates, Graceful Restart helps routers relearn routing information more quickly once the route processor comes back online.

In Nonstop Routing, the backup processor takes over immediately without disturbing the routing session.
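As a rough illustration of the fully mirrored form of this idea, the sketch below applies every routing update to both processors so the backup can take over with a complete table. The class and method names (`RouteProcessor`, `NonstopRouter`, `receive_update`, `fail_primary`) are hypothetical and do not reflect any vendor's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RouteProcessor:
    """Hypothetical route processor holding a prefix -> next-hop table."""
    name: str
    routes: dict = field(default_factory=dict)


class NonstopRouter:
    """Toy model: every routing update is applied to both processors."""

    def __init__(self):
        self.primary = RouteProcessor("primary")
        self.backup = RouteProcessor("backup")
        self.active = self.primary

    def receive_update(self, prefix, next_hop):
        # Mirror the update so the backup never has to relearn routes.
        self.primary.routes[prefix] = next_hop
        self.backup.routes[prefix] = next_hop

    def fail_primary(self):
        # The backup takes over using the table it already holds.
        self.active = self.backup

    def lookup(self, prefix):
        return self.active.routes.get(prefix)


router = NonstopRouter()
router.receive_update("10.0.0.0/8", "peer-a")
router.fail_primary()
print(router.active.name, router.lookup("10.0.0.0/8"))  # backup peer-a
```

Because the backup already holds the route, the lookup after the failure succeeds without any exchange with peer routers.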

There are drawbacks to both approaches. In Nonstop Routing, the software errors that produce failures in the primary route processor are likely to occur in the backup processor as well. In addition, the primary and backup route processors must be synchronized very closely to have identical information. Because of these issues, most vendors don’t use a completely mirrored approach. Some vendors, like Avici, have equipment that doesn't keep an exact copy of the primary route processor protocols, but instead keeps enough information to continue routing. Equipment from other vendors stores the routing table updates in other parts of the router.

“There’s a tradeoff here,” says Adrian Brooks, an IP consultant with BTexact Technologies. “If you store an exact replica of the routing table on the backup processor, it’s going to have a faster recovery than if you store it in some other part of the router.”

Vendors including Alcatel, Avici, Caspian Networks, Charlotte’s Web Networks Ltd., Chiaro Networks, and Hyperchip Inc. all claim to support some variation of Nonstop Routing. Amber Networks, which was bought by Nokia Corp. (NYSE: NOK), was actually one of the first vendors to suggest such a feature. But their approaches all differ slightly from one another.

In the Graceful Restart approach, a failed router relies on peer routers, rather than backup processors. The router uses extensions to Border Gateway Protocol (BGP) to notify peer routers that they should continue forwarding packets to it, even though its route processor has gone down. When the failed router returns to service, the surrounding peer routers send all the routing information to it again so that it can build a new routing table.
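The sequence described above can be sketched as a toy model. This is a simplification under stated assumptions: the `Peer` and `RestartingRouter` classes and their methods are hypothetical, no real BGP messages are modeled, and the notification is collapsed into a single method call, whereas in the actual BGP mechanism the capability is advertised when the session is established.

```python
class Peer:
    """Hypothetical neighbor of a router that supports Graceful Restart."""

    def __init__(self, name, routes):
        self.name = name
        self.routes = dict(routes)      # routes this peer can re-send
        self.keep_forwarding = set()    # neighbors still used while restarting

    def notified_of_restart(self, router_name):
        # Keep sending packets to the restarting router instead of rerouting.
        self.keep_forwarding.add(router_name)

    def resend_routes(self, router):
        # Once the router is back, replay the routes so it can rebuild.
        router.routing_table.update(self.routes)
        self.keep_forwarding.discard(router.name)


class RestartingRouter:
    def __init__(self, name, peers):
        self.name = name
        self.peers = peers
        self.routing_table = {}

    def route_processor_fails(self):
        self.routing_table.clear()      # routing state is lost
        for peer in self.peers:
            peer.notified_of_restart(self.name)

    def return_to_service(self):
        for peer in self.peers:
            peer.resend_routes(self)


peers = [
    Peer("peer-a", {"10.0.0.0/8": "peer-a"}),
    Peer("peer-b", {"192.168.0.0/16": "peer-b"}),
]
router = RestartingRouter("r1", peers)
router.route_processor_fails()
still_forwarding = all("r1" in p.keep_forwarding for p in peers)
router.return_to_service()
print(still_forwarding, sorted(router.routing_table))
```

In this model the peers never stop forwarding to `r1`, and the routing table is rebuilt only from what the peers send back after the restart.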

But Graceful Restart has its own issues. While it drastically reduces a router’s overall downtime, it can cause routing loops or "black holes" in the network if routing information changes before the recovered router is able to complete its updates and convergence. As a result, the restart may need to be tuned as more deployment experience is gained.

Also, depending on the network topology, there could be routers in the network that don't understand Graceful Restart, increasing the exposure to routing loops or black holes.
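The black-hole risk can be illustrated with a minimal sketch, assuming the restarting router keeps forwarding on a frozen copy of its old table while the network around it changes. The function `forward`, the prefix, and the peer names are hypothetical.

```python
def forward(frozen_table, reachable_next_hops, prefix):
    """Return the next hop a packet takes, or None if it is black-holed."""
    next_hop = frozen_table.get(prefix)
    if next_hop is None or next_hop not in reachable_next_hops:
        return None
    return next_hop


# Forwarding entries preserved while the route processor restarts.
frozen_table = {"10.0.0.0/8": "peer-a"}

before_change = forward(frozen_table, {"peer-a", "peer-b"}, "10.0.0.0/8")
# The network changes mid-restart: peer-a is no longer a valid next hop.
after_change = forward(frozen_table, {"peer-b"}, "10.0.0.0/8")
print(before_change, after_change)  # peer-a None
```

Until the router finishes relearning its routes, the stale entry keeps pointing at a next hop that is no longer valid, so those packets are lost.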

C&W's Garbin says that Graceful Restart is an improvement over the traditional approach of rebooting a router, which can take anywhere from 20 minutes to a couple of hours. But he says that carriers are still hoping for more.

“Graceful restart is an improvement, but it’s not seamless failover,” says Garbin. “It significantly cuts down the reconvergence time, but we’d like to not have to reconverge at all.”
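For context on the reboot times Garbin cites, a quick calculation shows how little downtime "five nines" allows. This is plain arithmetic on a 365-day year, not a figure from any carrier; the function name is an illustrative assumption.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability):
    """Minutes of downtime per year permitted at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


budget = downtime_budget_minutes(0.99999)
years_of_budget = 20 / budget  # cost of a single 20-minute reboot
print(f"{budget:.2f} minutes per year")          # 5.26 minutes per year
print(f"{years_of_budget:.1f} years of budget")  # 3.8 years of budget
```

A single 20-minute reboot therefore uses up nearly four years of a five-nines downtime allowance.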

He adds that some packets are usually dropped during Graceful Restart. This may not be a problem for applications like email or even voice. But compressed video can’t tolerate even a hiccup, he says.

Other experts in the industry agree, and many routing players say they are working towards a reliability approach of their own.

“We don’t offer [Nonstop Routing] now,” says Steve Vogelsang, vice president of marketing and co-founder of Laurel Networks Inc. “But it’s definitely something on our roadmap.”

If you’d like more information on these and other IP reliability issues, check out the Light Reading Webinar scheduled for January 23, 2003. To register, click here: IP Reliability: Adding Five 9s Resiliency to IP Networks. — Marguerite Reardon, Senior Editor, Light Reading
