Who Broke WorldCom's Backbone?

Just when things seemed like they couldn’t get any worse for WorldCom Inc. (OTC: WCOEQ), they did. Yesterday the carrier experienced a massive outage on its Internet backbone that affected thousands of customers, including many businesses in the U.S. and those outside the country attempting to access content hosted in the U.S.

The company issued a statement late yesterday attributing the problem to a “route table issue.” (See WorldCom: Feeling Better Now.) The outage was supposedly caused by a software and routing table update performed late Wednesday night. When the new software code went live at 8:00 a.m. on Thursday, traffic stopped flowing, according to Matrix Netsystems, a company that measures Internet traffic. Hundreds of routers at 53 points of presence across the country were affected.

Because the outages seemed to occur at peering points -- places in the network where one carrier's traffic is handed over to another carrier -- some people have speculated that the problem was related to border gateway protocol (BGP) routing. They suspect that WorldCom was delivered a bad batch of new software from its routing equipment supplier that caused the problems in the routing tables. “At about 8 a.m., WorldCom’s packet loss went from zero to 22 percent. That’s huge,” says Tom Ohlsson, vice president of marketing and business development for Matrix Netsystems. “They were not passing any traffic.” Most network architects notice performance degradation even with 1 to 2 percent packet loss, he adds.

Nobody is saying for sure which routers caused the outage, but the list of candidates is short. WorldCom has only two suppliers for routing gear: Cisco Systems Inc. (Nasdaq: CSCO) and Juniper Networks Inc. (Nasdaq: JNPR). Both companies have counted WorldCom and its Internet backbone subsidiary UUNet as major customers for years. Until its recent financial problems, WorldCom had consistently been a 10 percent customer for Juniper. Cisco’s relationship with UUNet is also strong. The service provider has been using Cisco's GSR core routers since they were first introduced in 1997. Back then, Cisco announced a $50 million contract with WorldCom to supply it with GSRs and 7500 edge routing gear.

Considering that the problem was likely caused by a software glitch in an upgrade to the operating system, many people have pointed to Cisco gear as the culprit. Cisco’s IOS software, the operating system used to run all of its networking gear, is made up of thousands of lines of code. For this reason, bugs in the software are common. Cisco confirmed that WorldCom suffered another major outage in April when a bug in one of its versions of IOS surfaced (see WorldCom's IP Outages: Whodunnit?).

“Odds are it’s probably a Cisco problem,” says Dave Passmore of Burton Group. “Historically, it seems that people have had more problems with Cisco software upgrades. It could be that there are just more of them deployed, so we hear more about it. I don’t know enough about this situation to really say one way or the other if it was Cisco or Juniper.”

Cisco would not comment for this story, but a Juniper spokesperson said that its equipment was not involved in the outage.

Whoever is at fault, the problem was widespread and could potentially cost WorldCom a sizable chunk of change as customers call in for refunds on their service-level agreements. In a statement issued last night, the carrier said that roughly 20 percent of its IP customers were hit by the outage. But the actual number is likely much higher considering that WorldCom also hosts Web servers. Providers AT&T Corp. (NYSE: T) and Sprint Corp. (NYSE: FON) both say their customers experienced delays yesterday in accessing Websites hosted by UUNet. Although these carriers tried to downplay the effects on them as well, Ohlsson of Matrix Netsystems says that the impact was significant.

“AT&T was hammered just as hard as WorldCom -- and AT&T didn’t do anything wrong,” he says.

He adds that carriers such as Avantal, a service provider in Mexico, suffered massive disruptions. This is because the Mexican carrier partners with UUNet to use its backbone to carry most of its Internet traffic across the U.S. Between 65 and 75 percent of all Internet traffic traverses UUNet’s backbone, says Ohlsson.

Businesses across the country were affected. Some had no Internet access, while others experienced delays for most of the day. Companies such as Verisign supposedly lost thousands of transactions yesterday, costing the company substantial business. Light Reading experienced sporadic problems accessing Web servers at its hosting provider, which was connected to the Internet through a UUNet connection.

A big question is whether the outage may have been precipitated by recent troubles at WorldCom. Some suspect that WorldCom’s network management groups are understaffed with overworked engineers. It also seems strange that the company would be attempting a major upgrade in the middle of the week.

“That’s the kind of thing that is usually done at midnight on a Saturday or Sunday night,” says one telecom engineer.

— Marguerite Reardon, Senior Editor, Light Reading
EzraTzvi3 12/4/2012 | 9:37:40 PM
re: Who Broke WorldCom's Backbone? This is classic:

“Odds are it’s probably a Cisco problem,” says Dave Passmore of Burton Group. “Historically, it seems that people have had more problems with Cisco software upgrades. It could be that there are just more of them deployed, so we hear more about it. I don’t know enough about this situation to really say one way or the other if it was Cisco or Juniper.”

Typical Maggie. Doesn't work hard enough to get a meaningful quote, so she just goes with whatever she gets, even if the source admits ignorance on the issue. Boy, LR is getting better and better.
wilecoyote 12/4/2012 | 9:37:40 PM
re: Who Broke WorldCom's Backbone? I am laughing right now. What a way to cruise into the weekend.

Remember the massive outage around 96 timeframe, caused by Stratacom gear? Cascade won a lot of business on that outage. JNPR must be lickin' its chops.

tester099 12/4/2012 | 9:37:31 PM
re: Who Broke WorldCom's Backbone? Blaming IOS is easy. But, let's see, wasn't Junos 5.5 released just a few days ago?


Paul Andrews 12/4/2012 | 9:37:30 PM
re: Who Broke WorldCom's Backbone? From what I am reading, this incident is essentially immaterial, or, at worst, falls in the category of things that happen every once in a while. Although it was a nuisance for some, and a juicy story for those who really get excited over lost packets and latency, the point of our modern networks is to be able to survive short-term "fiber cut" incidents and reroute the signals. I read this happened.

So there was a glitch. So what? Things don't always go as expected, especially when it comes to installing new software.

Now, if this is the first sign of chronic degradations in the networks' performance, I think we have a lot to be concerned about.

Until I see consistent signs of that kind of system failure, I would encourage everyone to stay calm and stay the course. The stuff works!
BobbyMax 12/4/2012 | 9:37:29 PM
re: Who Broke WorldCom's Backbone? Many carriers make a fundamental error in choosing their suppliers. Neither Cisco nor Juniper has carrier grade routers. They have not been certified by Bellcore/Telcordia. It is not clear if their products are Osmine certified.

It is common knowledge that products from Cisco are not carrier grade products. Another fundamental problem with Cisco is that it never publishes performance figures, perhaps because all its products except the ONS 15600 have been obtained through acquisitions.

The lesson to be learned is that carrier grade equipment cannot be bought from non-carrier grade equipment manufacturers.
Phiber_Phreak 12/4/2012 | 9:37:29 PM
re: Who Broke WorldCom's Backbone?
Rather than blaming Cisco or Juniper or whoever, why don't LR and WCOM 'fess up to the possibility that the outage could have been caused by human error, like a misconfiguration?

Happens to even the best router jocks (or their clue-impaired bosses).

Or doesn't that make as juicy a story?

This is nothing more than sensationalism on LR's part and (possibly) a coverup of their own oops on WCOM's part.


ps. That analyst's quote is a joke: Cisco did it. Well, maybe they didn't.
router123 12/4/2012 | 9:37:26 PM
re: Who Broke WorldCom's Backbone? Osmine certification does not mean that the product is stable. The certification is just an endorsement that a vendor's product works with the Telcordia OSS. Period.

So a lot of junk out there can still be Telcordia certified, if they have the big bucks to throw at the Telcordia hounds.

For one, I believe that Cisco has a larger and longer exposure to IP and a better process as far as product testing goes. Let's wait on the prognosis of the outage before speculating.

LR, I wish you would dig in and continue following up.
mha101 12/4/2012 | 9:37:26 PM
re: Who Broke WorldCom's Backbone? "Cisco’s IOS software, the operating system used to run all of its networking gear, is made up of thousands of lines of code. For this reason, bugs in the software are common."

I guess Juniper's JunOS is made of less than one thousand lines of code, or maybe a hundred lines or less, hence no chance of bugs!

HAHAHA - what a crappy article!
deer_in_the_light 12/4/2012 | 9:37:25 PM
re: Who Broke WorldCom's Backbone? We need more outages so that the tight-ass CFOs running the major RBOCs or IXCs realize that some investment is required to keep the network running.

Network crash GOOD
broadbandboy 12/4/2012 | 9:37:24 PM
re: Who Broke WorldCom's Backbone? I have a question for all the network experts that post on this board.

This is UUnet's second widespread outage this year (at least that we know about). Parts of their network, serving "thousands of customers," were affected.

What I want to know is, how do the ISPs account for these outages when calculating their claims of 3 or 4 or 5x9s reliability? Five nines means five minutes of downtime a year, right? If you are a UUnet customer that lost connectivity to the Internet for, say an hour, this single event sends that reliability metric right down the toilet.

But parts of UUnet continued to operate, so are they going to claim "the network was up" during this outage?

AT&T's IP net also crashed in the Chicago area recently. Are they still claiming 5x9s?

I am betting there is no objective standard for that bogus measurement, so every carrier can calculate it any which way they want.
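
For reference, the arithmetic behind the "nines" figures raised above can be sketched as follows. This is only an illustration under stated assumptions: a 365-day year, availability measured for a single customer connection, and hypothetical helper names that do not come from any carrier's SLA methodology.

```python
# Minimal sketch of "N nines" availability arithmetic.
# Assumptions (not from any carrier's SLA): a 365-day year and
# availability measured per customer connection.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for N nines (e.g. 5 means 99.999%)."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability


def availability_after_outage(outage_minutes: float) -> float:
    """Yearly availability (as a fraction) given total outage minutes."""
    return 1 - outage_minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes/year")
    one_hour = availability_after_outage(60)
    print(f"One 60-minute outage: {one_hour:.5%} availability")
```

Under these assumptions, five nines allows about 5.26 minutes of downtime a year, and a single one-hour outage leaves a customer at roughly 99.989 percent for the year, which is below even four nines. How a carrier averages that figure across customers or network segments is a separate question that this sketch does not address.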
