When networks go down, even for a brief moment, thousands of dollars are in jeopardy. Can service providers afford to sacrifice carrier-grade reliability and risk downtime?
I've been examining the technical requirements for carrier-grade reliability in telecom networks quite a bit in recent weeks. One recent blog post discussed the differences between "high availability" and "carrier grade," while a subsequent post outlined some of the key challenges that make it so hard to implement true six-nines (99.9999%) reliability in a telecom platform.
Given both the technical challenges and the licensing costs associated with delivering carrier-grade reliability, sooner or later a smart finance person is going to ask whether the benefits actually outweigh the expense. Isn't high availability good enough? Why go to the additional expense of carrier-grade reliability? So it's interesting to look at some numbers that unequivocally tell us the real question is not "Can you afford to implement carrier-grade reliability?" but rather "How could you possibly afford not to?"
Let's start with the numbers that no one can dispute. The traditional standard for telecom network reliability, measured at the service level, is six-nines or 99.9999%, which translates to an average of about 32 seconds of downtime per year, per service.
As service providers plan for the progressive deployment of NFV in their networks, they inevitably consider using standard enterprise-class virtualization software in their NFV infrastructure. The standard reliability guarantee for such enterprise solutions is three-nines (99.9%), which implies about 526 minutes (nearly nine hours) of downtime per year.
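For readers who want to check the conversion from "nines" to downtime, here is a minimal Python sketch. The function name and the choice of a 365.25-day average year are my own assumptions for illustration; they are not taken from any standard or product.

```python
# Assumption: an average year of 365.25 days, i.e. 525,960 minutes.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the expected minutes of downtime per year for a given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


# Hypothetical variable names, used only for this example.
three_nines_minutes = downtime_minutes_per_year(99.9)
six_nines_seconds = downtime_minutes_per_year(99.9999) * 60

print(f"Three-nines: {three_nines_minutes:.0f} minutes of downtime per year")
print(f"Six-nines:   {six_nines_seconds:.0f} seconds of downtime per year")
```

Under that assumption, the two results round to 526 minutes and 32 seconds, matching the figures quoted above.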
So as a service provider, what will 526 minutes actually cost you, compared to 32 seconds?
One report estimates the cost of downtime as $11,000 per minute, per server. This represents the revenue that's lost as a result of service level agreements (SLAs) with customers (mostly high-value, enterprise users). If each server is down for 526 minutes per year, that's an annual cost of $5,786,000 per server. We'll call that $6 million per server to keep the math easy.
Now, how many servers do you have in your data centers, delivering the services that your customers depend on? If you have 1,000 servers then your total annual cost of downtime is 1,000 x $6 million, or $6 billion.
And just to make it worse, that $6 billion represents only the revenue that's lost as a result of customer SLAs. It doesn't include the long-term revenue impact caused by some of those customers switching to other service providers who are promising them the service uptime that they need.
To complete the analysis, the corresponding revenue loss for service outages in a network with true six-nines carrier-grade infrastructure will be about $5,800 per server, or a total of roughly $6 million if we again assume 1,000 servers in the data center. That's not negligible, but it's only one-thousandth the cost of the first scenario, and customers are much less likely to switch providers if they experience such low average downtime.
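The whole comparison can be reproduced in a few lines. The sketch below is a back-of-the-envelope model, not a billing calculation: it assumes the $11,000 per minute, per server estimate cited above, the hypothetical 1,000-server data center, and an average 365.25-day year. The function and constant names are mine.

```python
# Assumptions taken from the article's scenario, plus an average-length year.
MINUTES_PER_YEAR = 365.25 * 24 * 60   # 525,960 minutes
COST_PER_MINUTE = 11_000              # dollars per minute, per server (cited estimate)
SERVERS = 1_000                       # hypothetical installation size


def annual_downtime_cost(availability_percent: float,
                         servers: int = SERVERS,
                         cost_per_minute: float = COST_PER_MINUTE) -> float:
    """Estimate yearly SLA revenue loss across all servers at a given availability."""
    downtime_minutes = (1 - availability_percent / 100) * MINUTES_PER_YEAR
    return downtime_minutes * cost_per_minute * servers


enterprise_cost = annual_downtime_cost(99.9)        # three-nines
carrier_grade_cost = annual_downtime_cost(99.9999)  # six-nines

print(f"Three-nines: ${enterprise_cost:,.0f} per year")
print(f"Six-nines:   ${carrier_grade_cost:,.0f} per year")
print(f"Ratio:       {enterprise_cost / carrier_grade_cost:,.0f}x")
```

Before rounding, this gives roughly $5.8 billion against $5.8 million, a factor of 1,000. Changing `SERVERS` or `COST_PER_MINUTE` scales both totals equally, so the ratio between the two scenarios depends only on the availability figures.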
So, putting ourselves in a service provider's shoes and assuming our hypothetical 1,000-server installation, the trade-off is clear. We can base our NFV infrastructure on enterprise-class software designed for IT applications, incurring $6 billion in lost revenue because of our SLAs, while risking customer defections because they can't tolerate the downtime. Or we can implement carrier-grade infrastructure that delivers the reliability that customers have been conditioned to expect, in which case downtime should only cost us $6 million and we can expect to retain our high-value customers.
What do you think? Are these numbers realistic? We’d be delighted to hear from readers with more visibility into the true cost of downtime.
Charlie Ashton is director of business development at Wind River.