WorldCom Outage Only the Start

This month's glitch on the UUNet backbone that cut service to thousands of WorldCom Inc. (OTC: WCOEQ) customers and slowed Internet traffic around the globe has been described as the worst outage in recent memory (see Who Broke WorldCom's Backbone? and WorldCom: Feeling Better Now).

But some say it's only the beginning.

"I think that we haven’t even seen the tip of the iceberg yet," says Alex Yuriev, an independent technology consultant.

“It will happen again,” agrees Tom Ohlsson, the vice president of business development at Matrix NetSystems, Inc., an independent network monitoring firm. “It may not be WorldCom, but it will happen again.”

Massive layoffs and the dramatic drop in capital and operations spending are showing up in service degradation, analysts say. While some folk were almost expecting a large-scale breakdown of the UUNet backbone following WorldCom’s gigantic accounting and bankruptcy woes over the summer, other networks have been experiencing similar problems, they warn -- problems that are getting worse (see WorldCom Workers Get the Shaft, Carrier Spending Hopes Dim, Whither WorldCom's Network?, AT&T: WorldCom Shutdown No Problem and WorldCom's at $7.1 Billion and Counting).

“This was very similar to the AT&T outage about a month ago. I think that the cash crunch has a significant effect on this,” Yuriev says.

According to Ohlsson, carriers have continued to take care of their core networks despite the capex crunch. But they have let their peering relationships, whereby network links are shared with other operators, slip a bit, and it’s showing in the performance of the Internet. Peering relationships take daily maintenance, he says, and they're not always getting it in today's environment.

Not true, says WorldCom spokeswoman Jennifer Baker. “At WorldCom, obviously the core of our network is very important, but [peering] is equally important,” she asserts.

Meanwhile, argument continues about what exactly caused WorldCom's outage. Initial reports indicated that the outages that started at 8 a.m. on October 3 and continued through most of the day were caused by faulty software loaded onto UUNet edge routers during a routine software upgrade. But Baker says the problem occurred when a technician who was repairing a gateway or border router in St. Louis made a configuration change that caused a routing instability in the network. This caused intermittent outages that lasted between 15 and 30 minutes each, she says, claiming that the problems were resolved by 2 p.m.

There's been much speculation about the supplier whose equipment was at fault. WorldCom buys most of its routers from Cisco Systems Inc. (Nasdaq: CSCO) and Juniper Networks Inc. (Nasdaq: JNPR), but the carrier won't say which of the vendors provided the routers in question. “We’re working with that vendor to make sure the product is more stable,” Baker says.

She also concedes that WorldCom has changed its operating procedures.

That’s a good thing, says Yuriev, who argues that WorldCom’s procedures, or lack thereof, should get most of the heat for what happened. The carrier was trying to do two things at the same time, he says, likely with inadequate communication among the departments doing the work. He points out that while some of his clients that use WorldCom local loop services on the East Coast received notices of maintenance on transport nodes, others were told at the same time that UUNet would be performing a software upgrade.

Baker says the software upgrade happened within the normal maintenance window, which is very late at night or very early in the morning, and she says it did not overlap with the router repair. She wouldn’t comment on Yuriev’s speculations.

WorldCom’s troubles over the outage are probably far from over. Cutting service to a large number of customers for up to six hours is likely to have violated a slew of service-level agreements (SLAs), and customers will be asking for compensation. A service breach on this scale might prompt some to terminate their contracts altogether.

“We will honor any SLA agreements we have with customers,” Baker says, admitting that 20 percent of WorldCom’s U.S.-based customers were affected by the outage.

Bottom line? According to Ohlsson, the situation in which thousands of Internet users were deprived of a fast and flawless connection to the Web drives home the need to diversify and host with multiple providers. “It’s not like the sky is falling… but we do feel that it’s going to get worse before it gets better.”

— Eugénie Larson, Reporter, Light Reading
Consultant 12/4/2012 | 9:33:14 PM
re: WorldCom Outage Only the Start A lot of recent Lightreading articles seem to be a bit sensationalistic. This article is complete conjecture. The journalist, who probably has not worked in the telecom industry, did not do her homework. The FCC publishes outage reports on its web site and there are industry experts such as Andy Snow who did dissertations on the long-term trends in network outages. There is no evidence that that work was reviewed.

Finally, major network outages on the UUNET scale have occurred in the past and hence to claim based on one data point that this is the emergence of a trend is flimsy. Software upgrades triggered a massive AT&T frame relay outage several years ago. Indeed, software upgrades are one of the most likely events to trigger a network outage.

Lightreading should issue fewer articles and do more research per article. In addition, there is excessive reliance on the Yankee Group and other consulting organizations that have vested interests in pushing Ethernet and other technologies. The result is Ethernet cheerleading in other articles.

- A former AT&T industry analyst.
eyesright 12/4/2012 | 9:33:12 PM
re: WorldCom Outage Only the Start Consultant,

Say, you aren't accusing Lightreading of - gasp! - yellow journalism, are you?

Whoops, excuse me, Jerry Springer just started on cable....
67scout 12/4/2012 | 9:33:11 PM
re: WorldCom Outage Only the Start I wouldn't go so far as to call this hype. I've had PUCs on my case for a 3-minute outage on a 9-1-1 network. Any outage that takes a day to fix and interrupts service for a good portion of the day is not a minor incident. We're seeing the effect of what happens when loads hit the street without proper or complete in-house testing due to the lack of staffing, or getting a load out just to meet the end of the quarter revenue. Even if a problem occurs, the fix is to back out the load, go back to the lab and figure out what happened, not to spend the day trying to figure it out on a live network. I guess we're seeing the effects of improper planning, and the lack of a good MOP (Method of Procedure). This is sad!

An Old Telco Guy.
Consultant 12/4/2012 | 9:32:21 PM
re: WorldCom Outage Only the Start It is hype to take a single data point and use it to predict a trend.

I might add that it was Worldcom that aided Verizon during 9/11 by lending Verizon Worldcom's metro ring in NYC. Verizon lost connectivity between two key telecom hotels and it was Worldcom that made capacity available on its OC-48.
