WorldCom's IP Outages: Whodunnit?

Data customers on the UUNet backbone owned by WorldCom Inc. (Nasdaq: WCOM), the largest Internet backbone in the world, experienced several intermittent outages this morning.

The outages spurred a flurry of debate and speculation among email posters on the North American Network Operators' Group (NANOG) mailing list.

WorldCom officials blame the problem on a train derailment in Ohio, 50 miles south of Toledo, that resulted in fiber cuts. Meanwhile, independent engineers pointed to a problem with Cisco Systems Inc. (Nasdaq: CSCO) routers, which Cisco officials later confirmed. But the bottom line may be: If there's a fiber cut or a router problem, isn't the network supposed to stay up anyway?

According to Linda Laughlin, a spokesperson for WorldCom, two separate fibers were cut. One was damaged when the train accident occurred at around 6:30 AM Central Time. The other fiber was cut later in the day, when cleanup crews were removing wreckage from the site. WorldCom immediately deployed crews to repair the fibers, says Laughlin.

But ISP network engineers on NANOG say that UUNet engineers are telling them a different story. They say the issue is linked to the Cisco routers deployed in UUNet’s network.

Cisco confirms there were problems with its routers in the UUNet network today. According to Martin McNealis, director of IP product management, there was a bug in an older version of Cisco's IOS routing software that appears only in certain instances when the IS-IS routing protocol is running. McNealis says Cisco discovered the problem well over a year ago and has fixed it in more recent versions of IOS. But he says UUNet was running an older version of the software that did not have the patch.

The bug caused memory corruption in several Cisco routers, wiping out entire routing tables and causing delays while routers rebooted and repopulated their routing tables. The problem continued all morning, affecting ISPs across the country from Boston to Memphis to San Francisco (www.lightreading.com was among those affected).

Richard Steenbergen, an independent network engineering consultant, says he experienced a similar situation with another interior (intra-domain) routing protocol, OSPF, which crashed several Cisco GSR 12000 routers at another large tier-one carrier a couple of years back. He says the bug and the series of events that triggered it would be unlikely to appear in testing.

Steenbergen blames Cisco’s apparent router instability on its IOS routing software.

"Because of its monolithic design and lack of protected memory space for individual components, IOS is notorious for bringing down the entire router if so much as a single error occurs," he says.

But Cisco's McNealis says that if the same problem occurred in any other router, such as one from Juniper Networks Inc. (Nasdaq: JNPR), it would have had the same effect.

"When you have a memory corruption problem and you lose the routing tables, it takes time for the routers to talk to each other," he says. "There may be variations in recovery time, but in a similar situation an outage would have also occurred in a Juniper router."

Officially, WorldCom is sticking to its story and has not issued any statement about a router problem. But McNealis says this is the first he has heard of a fiber cut.

— Marguerite Reardon, Senior Editor, Light Reading
http://www.lightreading.com
skeptic 12/4/2012 | 10:31:31 PM
re: WorldCom's IP Outages: Whodunnit? "When you have a memory corruption problem and you lose the routing tables, it takes time for the routers to talk to each other," he says. "There may be variations in recovery time, but in a similar situation an outage would have also occurred in a Juniper router."
-----------------

What he says is partially true. However, the
design of IOS (lack of protected memory) means
that software problems far removed from routing
itself can cause memory corruption in the
routing. All the "eggs" are in the same basket.



intranic 12/4/2012 | 10:31:31 PM
re: WorldCom's IP Outages: Whodunnit? if folks want to hear about gossip on nanog, they will listen there. this is a pretty lame story.
mfg_boy 12/4/2012 | 10:31:30 PM
re: WorldCom's IP Outages: Whodunnit? Amen! Testify the truth will set you free.
pooh-bear 12/4/2012 | 10:31:29 PM
re: WorldCom's IP Outages: Whodunnit? This is SHOCKING!

A very insightful piece of reporting.
skeptic 12/4/2012 | 10:31:28 PM
re: WorldCom's IP Outages: Whodunnit? Always the same debate, IP is the problem more specifically Layer 3 routing protocols, they just don't scale.
-----------------

In what way don't they scale in your opinion?

And as far as connection-oriented networking
goes, ATM had its chance to replace IP and
failed for any number of reasons. And PNNI
is every bit as horrible as IP routing protocols
are in terms of complexity.

Lichtverbindung 12/4/2012 | 10:31:28 PM
re: WorldCom's IP Outages: Whodunnit? Always the same debate, IP is the problem more specifically Layer 3 routing protocols, they just don't scale.

OSPF is bad, IS-IS is slightly better and BGP is the worst. Everybody knows it but what can you do ? Invent a better way to do it (connection oriented, scalable). Too difficult, too large of an embedded base of routers.

Let's keep inventing enhancements to BGP and IS-IS between ponytails at the IETF and patch IOS, JUNOS until the next crash.

This is IP, if you have mission-critical applications, use leased lines !!!

Lopez 12/4/2012 | 10:31:28 PM
re: WorldCom's IP Outages: Whodunnit? The problem of IP scalability is not a matter of routing technology. The problem is the addressing. If people would be willing to use/assign addresses in a hierarchical way, IP would scale pretty good. Unfortunately *because* the IP routing protocols scale good enough to be very relaxed in your addressing assignments, people get more and more used to not thinking about it. They assign IP addresses almost like they are MAC addresses. :) And when they run into deployment limitations, they simply call their vendor and yell "fix it".

This relying on the vendors to "fix their code", instead of ISPs fixing their own random address assignment strategies, that is what is broken.


ISP address allocation isn't broken, multihomed customers are what is broken.
Flower 12/4/2012 | 10:31:28 PM
re: WorldCom's IP Outages: Whodunnit? Yes, let's put leased lines between every pair of computers that want to talk together. That has excellent scaling properties !!

The problem of IP scalability is not a matter of routing technology. The problem is the addressing. If people would be willing to use/assign addresses in a hierarchical way, IP would scale pretty good. Unfortunately *because* the IP routing protocols scale good enough to be very relaxed in your addressing assignments, people get more and more used to not thinking about it. They assign IP addresses almost like they are MAC addresses. :) And when they run into deployment limitations, they simply call their vendor and yell "fix it".

This relying on the vendors to "fix their code", instead of ISPs fixing their own random address assignment strategies, that is what is broken.

Oh, btw, I guess all this fingerpointing is just gossip. Nothing more.
Lichtverbindung 12/4/2012 | 10:31:27 PM
re: WorldCom's IP Outages: Whodunnit? Flower,

You said :
Yes, let's put leased lines between every pair of computers that want to talk together. That has excellent scaling properties !!

I say :
Have you heard of cross-connects ? They are big, they are fat, they don't scale but guess what ...the revenue-generating customers and their mission-critical apps are using them.

If you run IT for a bank, you put your traffic on leased lines, or if you like risk on frame relay. You will NEVER put it on a public IP network. NFW.

You and me that do not have anything better to do with our lives than posting useless messages on LR boards are using routers. And if my message is lost, who cares ??

That's why ISPs have not, don't and will not make money ! IP is just the wrong technology !
dietaryfiber 12/4/2012 | 10:31:27 PM
re: WorldCom's IP Outages: Whodunnit? Imagine what it's going to be like managing the number of endpoints that exist in the PSTN. There are about 5x the number of endpoints there as there are on the Internet.

dietary fiber