
Server Glitch Crashes T-Mobile Network

Michelle Donegan
LR Mobile News Analysis
4/22/2009

T-Mobile Deutschland GmbH suffered a massive network outage Tuesday that left all of its 40 million customers unable to make calls or send texts for four hours.

The service collapse was caused by a software problem that crashed the Home Location Register (HLR) servers supplied by Nokia Siemens Networks, which, with refreshing accountability, has issued an apology.

"We apologize to all T-Mobile customers for the service disruption," says a Nokia Siemens Networks (NSN) spokesman. "We're working closely with T-Mobile to identify the software problem and why the redundancy built into the system didn't kick in."

The HLR database holds a mobile operator's subscriber information -- such as location, preferences, and account status -- which is critical for routing calls. If this database goes down, calls cannot be completed.
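To make that dependency concrete, here is a minimal sketch of the lookup a switch relies on before it can route a call. Everything in it is a hypothetical simplification: the `HlrRecord` fields, the `route_call` function, the `HlrUnavailable` exception, and the in-memory dictionary standing in for the database are illustrative assumptions, not how NSN's or T-Mobile's systems work. A real HLR is queried over SS7 signaling rather than called as a local function.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class HlrRecord:
    # Hypothetical, heavily simplified subscriber record.
    msisdn: str                    # subscriber's phone number
    serving_switch: Optional[str]  # last known location (serving switch), None if unknown
    active: bool                   # account status


class HlrUnavailable(Exception):
    """Raised when the subscriber database cannot be reached (hypothetical)."""


def route_call(hlr: Optional[Dict[str, HlrRecord]], called: str) -> str:
    """Return the switch a call should be routed to, or raise if it cannot be completed."""
    if hlr is None:
        # No HLR means no location data, so no call can be routed at all.
        raise HlrUnavailable("HLR down: call cannot be completed")
    record = hlr.get(called)
    if record is None or not record.active:
        raise LookupError(f"{called}: unknown or barred subscriber")
    if record.serving_switch is None:
        raise LookupError(f"{called}: location unknown")
    return record.serving_switch


if __name__ == "__main__":
    # Hypothetical subscriber and switch names, for illustration only.
    hlr = {"+4917000000001": HlrRecord("+4917000000001", "MSC-A", True)}
    print(route_call(hlr, "+4917000000001"))  # MSC-A
    try:
        route_call(None, "+4917000000001")
    except HlrUnavailable as err:
        print(err)  # HLR down: call cannot be completed
```

Passing `None` for the database models the outage case: every lookup fails before any routing decision can be made, which is why a single HLR failure can stop calls across the whole network.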

In T-Mobile's calamitous outage yesterday, the subscriber locations were not being updated on the central HLR database, according to an NSN spokesman. All the HLR servers had to be taken offline and brought back up one at a time, which explains why the outage lasted as long as four hours.

The HLR servers that went wrong yesterday were installed by T-Mobile as part of a major upgrade project to streamline its subscriber databases. The servers in question are part of the product line Nokia Siemens acquired when it bought Apertio Ltd. for $206 million in January 2008. (See T-Mobile Picks NSN, Nokia Siemens Snaps Up IMS Vendor, NSN Completes Apertio Buy, and LR Names 2008 Leading Lights Winners.)

Counting the costs
A T-Mobile spokeswoman would not comment on how much yesterday's network outage would cost the operator in lost revenues.

But Germany-based telecom executive Chris Larmour, chief marketing officer at Actix Ltd., which provides automated network status management systems to operators such as Vodafone Group plc (NYSE: VOD), estimates it could be about $100 million.

Larmour believes the severity of T-Mobile's network outage yesterday could have been limited.

"Some of this could have been avoided," says Larmour (who, incidentally, is not a T-Mobile customer). "It went wrong and no one was able to manage it. It took them four hours to figure it out. It will take them months to get back to normal."

The outage could be costly, not just for T-Mobile in terms of lost revenues, damaged market perception, and service reputation, but also for NSN if T-Mobile decides to seek compensation.

After a similar network outage back in 2004 at Bouygues Telecom in France, the operator sued its HLR supplier, Tekelec, for $81 million in damages. (See Bouygues Sues Tekelec Over Outage.)

— Michelle Donegan, European Editor, Unstrung

menexis,
User Rank: Light Beer
12/5/2012 | 4:06:29 PM
re: Server Glitch Crashes T-Mobile Network


Four hours is a bit too long for an outage, if you ask me. By professional standards, even an hour of outage is too long in this business. You would think they would have some sort of contingency plan in place, given that they are dealing with 40 million mobile customers. The damages could be well over $100 million.

Michelle Donegan,
User Rank: Light Beer
12/5/2012 | 4:06:28 PM
re: Server Glitch Crashes T-Mobile Network
Yesterday wasn't a good day for Deutsche Telekom/T-Mobile, that's for sure -- with a profits warning and network failure all in one day.

Michelle
greatwall7,
User Rank: Light Beer
12/5/2012 | 4:06:19 PM
re: Server Glitch Crashes T-Mobile Network
1) Apertio was a small company with limited staff, which NSN acquired in 2008, and so far the expertise on this system has not percolated down to the regular NSN sales, engineering, and support staff. This leads to dependencies on a few people and to support and troubleshooting delays.

2) At 20k TPS, Apertio ranks lowest in comparison with competitor products, and the front-end (FE) hardware (server + SS7 modules) is not carrier-class, making it prone to failures.
LECOTEJR,
User Rank: Light Beer
12/5/2012 | 4:06:13 PM
re: Server Glitch Crashes T-Mobile Network


Michelle,

Please check your facts "before" submitting articles. Tekelec, Inc. does not manufacture HLRs. Nor was the outage caused by an HLR!

menexis,
User Rank: Light Beer
12/5/2012 | 4:06:12 PM
re: Server Glitch Crashes T-Mobile Network


Thanks for clarifying! But can you elaborate?

BigIrv,
User Rank: Light Beer
12/5/2012 | 4:06:12 PM
re: Server Glitch Crashes T-Mobile Network


LECOTEJR, please enlighten us and share what actually caused the outage.

vsomanv,
User Rank: Moderator
12/5/2012 | 4:06:10 PM
re: Server Glitch Crashes T-Mobile Network
1) I guess the post is right in its sense. Tekelec did have a virtual HLR application running on top of its Eagle STP platform. They had it back in the 2003-04 timeframe, and I personally evaluated it at one point. I'm not sure whether they continue to market and sell it, but it holds true for a 2005 story. So I guess all is well, unless someone would like to differ.

2) LECOTEJR, could you clarify and expand on your concerns? A bare statement that the facts weren't verified doesn't make a convincing case on its own (at least not for me).
3) One point of contention could be the payout: $81 million vs. $100 million ;-)
4) ...and do they still get to sell to that operator as of today?
vsomanv,
User Rank: Moderator
12/5/2012 | 4:06:09 PM
re: Server Glitch Crashes T-Mobile Network
--> But what is the RCA (root cause analysis) for the HLR crash? The article says the redundant system did not kick in. It is quite strange that a redundant system that was in place did not kick in, but that is what happened. If so, it points to a situation where the redundant system, or some processes in it, had stopped operating some time earlier. Now comes the real culprit: the NMS or management capability of the HLR, that is, the ability of the HLR to send the right information at the right time to the Network Management System. If such a process had died and the information was never sent, the NOC personnel would have had no clue that the redundant system was actually dead in some way. And when the network needed the redundant system to kick in, it failed miserably.

The bottom line (assuming my assumptions are right): the HLR should have notified the EMS/NMS, and in turn the ops folks, that the redundant unit had died... much, much earlier.
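The commenter's point, that a silently failed standby should raise an alarm long before it is needed, can be illustrated with a minimal heartbeat watchdog. This is a hypothetical sketch: the `StandbyWatchdog` class, the 30-second timeout, the node names, and the alarm callback are assumptions for illustration, not features of NSN's HLR or of any particular EMS/NMS.

```python
import time
from typing import Callable, Dict, List, Set


class StandbyWatchdog:
    """Alarm when a redundant node stops sending heartbeats (hypothetical sketch)."""

    def __init__(self, timeout_s: float, alarm: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.alarm = alarm            # e.g. forwards to the NMS / NOC console
        self.clock = clock            # injectable so the logic can be tested
        self.last_seen: Dict[str, float] = {}
        self.alarmed: Set[str] = set()

    def heartbeat(self, node: str) -> None:
        # Each node, active or standby, reports in periodically.
        self.last_seen[node] = self.clock()
        self.alarmed.discard(node)    # node is alive again; re-arm its alarm

    def check(self) -> List[str]:
        # Run periodically; returns nodes presumed dead and alarms once per outage.
        now = self.clock()
        dead = [n for n, t in self.last_seen.items() if now - t > self.timeout_s]
        for node in dead:
            if node not in self.alarmed:
                self.alarm(f"{node}: no heartbeat for over {self.timeout_s:.0f}s")
                self.alarmed.add(node)
        return dead


if __name__ == "__main__":
    fake_now = [0.0]
    alarms: List[str] = []
    wd = StandbyWatchdog(30.0, alarms.append, clock=lambda: fake_now[0])
    wd.heartbeat("hlr-active")
    wd.heartbeat("hlr-standby")
    fake_now[0] = 20.0
    wd.heartbeat("hlr-active")        # the standby has silently stopped reporting
    fake_now[0] = 45.0
    print(wd.check())                 # ['hlr-standby']
    print(alarms)                     # ['hlr-standby: no heartbeat for over 30s']
```

The standby is flagged while the active node is still healthy, so operators learn about the failure before a failover depends on it rather than during one.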