A four-hour outage is too long, if you ask me. By professional standards, even an hour of outage is too long in this business. You would think they would have some sort of contingency in place, given that they are dealing with 40 million mobile customers. The damages could be well over 100 million.
re: Server Glitch Crashes T-Mobile Network
Yesterday wasn't a good day for Deutsche Telekom/T-Mobile, that's for sure -- with a profits warning and network failure all in one day.
re: Server Glitch Crashes T-Mobile Network
1) Apertio was a small company with limited staff, which NSN acquired in 2008, and so far the expertise on this system has not percolated down to the regular NSN sales, engineering and support staff. That leads to dependencies on a few people and to delays in support and troubleshooting.
2) Apertio, at 20k TPS, ranks lowest in comparison with competitor products, and the hardware of the FE (server + SS7 modules) is not carrier class, making it prone to failures.
re: Server Glitch Crashes T-Mobile Network
1) I guess the post is right in its sense. Tekelec did have a Virtual HLR application resident on top of its Eagle STP platform. They had it way back in 2003-04, and I personally evaluated it at one point. Not sure if they continue to market and sell it. However, it holds true for a 2005 story. So I guess all is well - unless someone would like to differ.
2) Lecotejr - Could you clarify and amplify your concerns? A statement about not verifying facts, with nothing to back it up, does not make for a right jurisdiction (at least not for me).
3) A contention could be on the payout - 81 million vs 100 million ;-)
4) .. and do they get to sell to that operator as of today....
re: Server Glitch Crashes T-Mobile Network
But what is the RCA for the HLR crash? The article says that the redundant system did not kick in. It is quite strange that a redundant system which was in place did not kick in, but that is what happened. If so, it points to a situation where the redundant system, or some processes in it, had ceased to operate some time earlier. Now comes the real culprit - the NMS or management capability of the HLR, or the capability of the HLR to send the right info at the right time to the Network Management System. If such a process had died and the info was not sent, the NOC personnel would have had no clue that the redundant system was actually dead in some way. And when the network needed the redundant system to kick in, it failed miserably.
The bottom line (assuming that my assumptions are right) - the HLR should have notified the EMS/NMS, and in turn the ops folk, that the redundant fellow had died... much, much earlier.
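To make the point above concrete, here is a minimal sketch of the kind of check being described: the standby reports a heartbeat, and an alarm goes to the NMS as soon as the heartbeat goes stale, long before a failover is ever needed. Everything here is hypothetical and for illustration only (the `StandbyMonitor` class, the `check_standby` function, the 30-second timeout); it is not how any actual HLR or NMS product works.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StandbyMonitor:
    """Tracks the last heartbeat seen from the redundant (standby) node."""
    timeout_s: float       # assumed threshold; a real value would be operator-defined
    last_heartbeat: float  # timestamp (seconds) of the most recent heartbeat

    def record_heartbeat(self, now: float) -> None:
        # Called whenever the standby reports that it is alive.
        self.last_heartbeat = now

    def is_stale(self, now: float) -> bool:
        # The standby counts as dead once it has been silent longer than the timeout.
        return now - self.last_heartbeat > self.timeout_s


def check_standby(monitor: StandbyMonitor, now: float,
                  raise_alarm: Callable[[str], None]) -> bool:
    """Raise an alarm towards the NMS if the standby has gone silent.

    Returns True if an alarm was raised, False otherwise.
    """
    if monitor.is_stale(now):
        silent_for = now - monitor.last_heartbeat
        raise_alarm(f"standby HLR silent for {silent_for:.0f}s")
        return True
    return False
```

Run periodically, a check like this would let the NOC know that the redundant system is dead while the primary is still healthy, instead of finding out at failover time.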