
Server Glitch Crashes T-Mobile Network

Michelle Donegan
LR Mobile News Analysis
4/22/2009

T-Mobile Deutschland GmbH suffered a massive network outage Tuesday that left all of its 40 million customers unable to make calls or send texts for four hours.

The service collapse was caused by a software problem that crashed the Home Location Register (HLR) servers supplied by Nokia Siemens Networks. The vendor, with refreshing accountability, has issued an apology.

"We apologize to all T-Mobile customers for the service disruption," says a Nokia Siemens Networks (NSN) spokesman. "We're working closely with T-Mobile to identify the software problem and why the redundancy built into the system didn't kick in."

The HLR database holds all of a mobile operator's subscriber information -- such as location, preferences, account status -- which is critical for routing calls. If this database goes down, calls cannot be completed.
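To illustrate why calls depend on this lookup, here is a minimal Python sketch. It is a toy model only, not how any vendor's HLR works: the class names, fields, phone number, and `route_call` function are all hypothetical, invented for this illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SubscriberRecord:
    # Hypothetical fields loosely mirroring the examples in the text
    location: str         # the switch currently serving the handset
    account_active: bool  # simplified account status


class HLRUnavailable(Exception):
    """Raised when the subscriber database cannot be reached."""


class ToyHLR:
    """A toy in-memory stand-in for a subscriber database."""

    def __init__(self) -> None:
        self.records: Dict[str, SubscriberRecord] = {}
        self.online = True

    def lookup(self, number: str) -> Optional[SubscriberRecord]:
        if not self.online:
            raise HLRUnavailable("HLR is down")
        return self.records.get(number)


def route_call(hlr: ToyHLR, callee: str) -> str:
    """Return a routing decision for a call to `callee`."""
    try:
        record = hlr.lookup(callee)
    except HLRUnavailable:
        # Without the database, the network cannot find the callee.
        return "failed: HLR unavailable"
    if record is None or not record.account_active:
        return "rejected: unknown or inactive subscriber"
    return f"routed to {record.location}"


hlr = ToyHLR()
# Made-up number, used purely for illustration
hlr.records["+490000000001"] = SubscriberRecord("switch-A", True)
print(route_call(hlr, "+490000000001"))

hlr.online = False  # simulate the database going down
print(route_call(hlr, "+490000000001"))
```

In this sketch the same call succeeds while the database is reachable and fails once it is marked offline, which is the dependency the paragraph above describes.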

In T-Mobile's calamitous outage yesterday, the subscriber locations were not being updated on the central HLR database, according to an NSN spokesman. All the HLR servers had to be taken offline and brought back up one at a time, which explains why the outage lasted as long as four hours.

The HLR servers that failed yesterday were installed by T-Mobile as part of a major upgrade project to streamline its subscriber databases. The servers in question are part of the product line Nokia Siemens acquired when it bought Apertio Ltd. for $206 million in January 2008. (See T-Mobile Picks NSN, Nokia Siemens Snaps Up IMS Vendor, NSN Completes Apertio Buy, and LR Names 2008 Leading Lights Winners.)

Counting the costs
A T-Mobile spokeswoman would not comment on how much yesterday's network outage would cost the operator in lost revenues.

But Germany-based telecom executive Chris Larmour, chief marketing officer at Actix Ltd., which provides automated network status management systems to operators such as Vodafone Group plc (NYSE: VOD), estimates it could be about $100 million.

Larmour believes the severity of T-Mobile's network outage yesterday could have been limited.

"Some of this could have been avoided," says Larmour (who, incidentally, is not a T-Mobile customer). "It went wrong and no one was able to manage it. It took them four hours to figure it out. It will take them months to get back to normal."

The outage could be costly, not just for T-Mobile in terms of lost revenues, damaged market perception, and service reputation, but also for NSN if T-Mobile decides to seek compensation.

After a similar network outage back in 2004 at Bouygues Telecom in France, the operator sued its HLR supplier, Tekelec, for $81 million in damages. (See Bouygues Sues Tekelec Over Outage.)

— Michelle Donegan, European Editor, Unstrung

menexis,
User Rank: Light Beer
12/5/2012 | 4:06:29 PM
re: Server Glitch Crashes T-Mobile Network
For an outage to last as long as four hours is a bit too long if you ask me. By professional standards, even an hour of outage is too long in this business. You would think they would have some sort of contingency in place, given that they are dealing with 40 million mobile customers. The damages could be well over 100 million.

Michelle Donegan,
User Rank: Light Beer
12/5/2012 | 4:06:28 PM
re: Server Glitch Crashes T-Mobile Network
Yesterday wasn't a good day for Deutsche Telekom/T-Mobile, that's for sure -- with a profits warning and network failure all in one day.

Michelle
greatwall7,
User Rank: Light Beer
12/5/2012 | 4:06:19 PM
re: Server Glitch Crashes T-Mobile Network
1) Apertio was a small company with limited staff, which NSN acquired in 2008, and so far the expertise on this system has not percolated down to the regular NSN sales, engineering, and support staff, leading to dependencies on a few people and to support and troubleshooting delays.

2) Apertio, with 20k TPS, ranks lowest in comparison with competitor products, and the HW of the FE (server + SS7 modules) is not carrier class, making it prone to failures.
LECOTEJR,
User Rank: Light Beer
12/5/2012 | 4:06:13 PM
re: Server Glitch Crashes T-Mobile Network
Michelle,

Please check your facts before submitting articles. Tekelec, Inc. does not manufacture HLRs. The outage was not caused by an HLR either!

menexis,
User Rank: Light Beer
12/5/2012 | 4:06:12 PM
re: Server Glitch Crashes T-Mobile Network
Thanks for clarifying! But can you elaborate?

BigIrv,
User Rank: Light Beer
12/5/2012 | 4:06:12 PM
re: Server Glitch Crashes T-Mobile Network
LECOTEJR, please enlighten us and share what actually caused the outage.

vsomanv,
User Rank: Moderator
12/5/2012 | 4:06:10 PM
re: Server Glitch Crashes T-Mobile Network
1) I guess the post is right in that sense. Tekelec did have a Virtual HLR application resident on top of its Eagle STP platform. They had it back in 2003-04, and I personally evaluated it at one point. Not sure if they continue to market and sell it. However, it stands true for a 2005 story. So I guess all is well, unless someone would like to differ.

2) Lecotejr, could you clarify and amplify your concerns? Such a statement about not verifying facts does not make for a right jurisdiction (at least not for me).
3) A contention could be on the payout: 81 million vs 100 million ;-)
4) ... and do they get to sell to that operator as of today?
vsomanv,
User Rank: Moderator
12/5/2012 | 4:06:09 PM
re: Server Glitch Crashes T-Mobile Network
But what is the RCA for the HLR crash? The article quotes that the redundant system did not kick in. It is quite strange that a redundant system that was in place did not kick in, but that is what happened. If so, it points to a situation where the redundant system, or some processes in it, had ceased to operate some time earlier. Now comes the real culprit: the NMS or management capability of the HLR, or the capability of the HLR to send the right info at the right time to the Network Management System. If such a process had died and the info was not sent, the NOC personnel would have no clue that the redundant system was actually dead in some way. And when the network needed the redundant system to kick in, it failed miserably.

The bottom line (assuming that my assumptions are right): the HLR should have notified the EMS/NMS, and in turn the ops folk, that the redundant fellow had died, much earlier.