
Healthcare.gov Outage Scars Verizon Terremark

Verizon Communications had a chance to be one of the heroes in the federal government's attempt to fix the Healthcare.gov fiasco. Instead, a major connectivity failure has Verizon looking as sheepish as everyone else associated with the troubled rollout of the "Obamacare" health insurance marketplace.

An outage at a datacenter operated by Verizon Terremark is being blamed for the disruption that rendered parts of the Healthcare.gov site useless for much of Sunday and early Monday morning. The US Department of Health and Human Services (HHS) issued a statement Monday morning saying the issue had been resolved.

The outage also affected other Verizon Terremark customers, though none with as high a profile -- or as infamous a reputation -- as the Healthcare.gov site.

Only last week, Verizon's Enterprise Solutions division reportedly was asked to be part of an expanded team effort to help fix technical problems that have dogged the healthcare exchange site since its launch.

There is nothing to indicate that Verizon Terremark will suffer any more glitches, and no one is calling for one of the biggest names in telecom to be yanked from the project. However, the occurrence of this one problem while the company was on the national stage is troubling in itself. It comes at a time when carriers like Verizon are arguing that their cloud and datacenter performance and reliability are what separate them from other datacenter operators.

This outage is a reminder that things can go wrong even at datacenters run by storied telcos. Nobody's perfect, as the saying goes. Many large corporate enterprises probably like to think their traffic is more sensitive and mission-critical than anything the federal government is dealing with, and this matter of the Healthcare.gov outage is something they might just bring up as they talk to Verizon Terremark about hosting and managing their traffic.

— Dan O'Shea, Managing Editor, Light Reading

Liz Greenberg 10/29/2013 | 5:15:03 PM
Re: Stuff happens? Seven, we are on the same page on this one.  There are a lot of places for foul-ups to happen even in the best executed scenario but from all that I read in the papers here it sounds like the political committee approach to engineering solutions failed.  If something like this is "not anticipated in the app design or deployment" then it is a failure to understand the basic design requirements from the get go...sorry but either the specification would have been wrong or the implementation of the spec. 
brookseven 10/29/2013 | 4:11:52 PM
Re: Stuff happens?  



I have nothing direct, but datacenters, networks, equipment can all have outages.  It is a best practice to make your application resilient to those.  It is easy to NOT do that and harder to do it.

My best guess is that the datacenter failure (and thank goodness Sandy was last year right?) was not anticipated in the app design or deployment and thus it was not resilient to that failure.

Wonder how it will go when they have a route flap (Level 3 I am looking at you here....) :)


DOShea 10/29/2013 | 3:37:55 PM
Re: Stuff happens? I think there is some stuff here that we haven't heard explained yet. HHS seems to be holding Verizon's feet to the fire, but Verizon isn't saying much about it. Not sure why they are taking the heat in this particular instance, and no one else is, but either way it's embarrassing for Verizon.
Liz Greenberg 10/29/2013 | 2:54:22 PM
Re: Stuff happens? Seven if it was at the web app level then you are right and I am off track on this.  To me when I read about what happened it seems like it was on multiple levels. Given the expected user levels I have to wonder if the system was under-engineered and therefore has multiple points of potential failure.  Any other insights to share?
brookseven 10/29/2013 | 2:13:02 PM
Re: Stuff happens? Liz,

I think you are off track in one thing.  I doubt the fault tolerance issue is actually a Verizon issue at all.  The normal way this is handled is at the web app itself which is why I wonder why Verizon is taking the hit at all.

Let me put it this way, if you don't have multiple instances that have failover capability built in with maintained session ids then you got problems on any server, anywhere.
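
A rough sketch of that idea, purely illustrative and not a description of how Healthcare.gov or Verizon Terremark actually work: two application instances share one session store, and a router falls back to the next instance when one is down. The names `SessionStore`, `Instance`, and `route` are made up for this example, and the in-memory dict stands in for what would normally be a replicated store living outside any single datacenter.

```python
class SessionStore:
    """Hypothetical shared session store, keyed by session id.

    Assumption: in a real deployment this would be an external,
    replicated service, not a dict inside one process.
    """

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        # Return the session dict, creating an empty one on first use.
        return self._sessions.setdefault(session_id, {})


class Instance:
    """Hypothetical application instance, e.g. one per datacenter."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.healthy = True

    def handle(self, session_id, item):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        session = self.store.get(session_id)
        session.setdefault("items", []).append(item)
        # Return a copy so callers cannot mutate stored state.
        return self.name, list(session["items"])


def route(instances, session_id, item):
    """Try each instance in order and fail over on connection errors."""
    for instance in instances:
        try:
            return instance.handle(session_id, item)
        except ConnectionError:
            continue
    raise RuntimeError("no healthy instance available")
```

Because the session lives in the shared store and not inside one instance, a request that lands on the second instance after the first one fails still sees everything the user entered earlier under the same session id.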


Liz Greenberg 10/29/2013 | 1:02:51 PM
Re: Stuff happens? Carol and Brookseven you are both on the right track.  Back in the "good old days" of telecom, a lot of effort was placed on reliability, redundancy, fault tolerance, disaster survivability, etc.  People have gotten very relaxed about this because they are so used to their mobile phones working only most of the time.  The reality is that the old attitude is required for data centers, cloud computing and the networks that serve them.  I am not saying "cost be damned" but basically IT folks have known for years how to prevent these issues and Verizon should have implemented these techniques from the beginning. They are their own worst enemies.
Carol Wilson 10/29/2013 | 12:50:42 PM
Re: Stuff happens? Seven, I totally agree -- twice in one week, I must be slipping. 

You actually said more clearly what I was thinking. Risk is always there when you depend on technology and at the same time, there are always ways to mitigate that risk. 
brookseven 10/29/2013 | 12:13:31 PM
Re: Stuff happens?  


That stuff can happen cloud or not and really it is not a technology fault directly.  It is an implementation issue.  The challenge is that "cloud" based means less control so IT folks cannot repair failures directly.

But there are ways of building applications and systems with better Disaster Recovery (DR) and fault tolerance.  For example, if it is that big and important why are you on 1 vendor? 

I agree with you that this will not help perception, but I personally believe the fault lies with the folks driving the bus not the technology.


Carol Wilson 10/29/2013 | 11:13:05 AM
Re: Stuff happens? The problem is that when stuff happens and your data, applications, etc. are in the cloud, it can bring your business to a halt. No CIO or CTO wants to have to explain that to the CEO or the investors. 

So while the reality is that stuff happens to businesses and their networks all the time, the sensitivity toward new approaches to doing things will remain high as long as the perception is that they are riskier. And moving to the cloud is new. Among its many benefits is actually the ability to mitigate some risks and enable disaster recovery to happen more quickly.

But for now, the perception is that cloud itself is a risk and this outage won't help that perception.
wanlord 10/29/2013 | 9:29:19 AM
Re: Stuff happens? The Internet community in general has had countless issues since it was created: growing pains, outages, slowness, etc. That is the nature of a connected network managed by separate entities with different rules, policies, standards, and backhoes! That being said, I never heard any political party calling to shut the Internet down because of a data center glitch, although I am sure Al Gore will get the blame for this....

