Virtualizing the network means running on commodity hardware, which will fail much more often than today's systems.

August 19, 2013

NFV Requires Redefined Reliability: AlcaLu

As service providers move toward virtual networks and virtualized network functions, the way they define and deliver reliability is going to change dramatically -- and that's just one of the many "big questions" the industry needs to be addressing, says Alcatel-Lucent (NYSE: ALU)'s Dor Skuler, vice president and general manager of the CloudBand Business Unit.

The traditional five-nines reliability of telecom networks -- that is, 99.999 percent availability -- assumes hardware is down an average of three to five minutes a year, Skuler says. And traditional, purpose-built telecom hardware meets that standard. But the shift to software-defined networking (SDN) and network functions virtualization (NFV) is predicated on using commercial off-the-shelf (COTS) hardware, which is not built to the same standards.
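
The five-nines figure can be checked with simple arithmetic. The sketch below is illustrative and does not come from Alcatel-Lucent; the function name and the 365.25-day average year are assumptions. It converts an availability fraction into the downtime it permits per year, which for 99.999 percent works out to a little over five minutes, at the top of the range Skuler cites.

```python
# Illustrative arithmetic only; the names here are assumptions, not from the article.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap days


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    levels = [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]
    for label, availability in levels:
        minutes = downtime_minutes_per_year(availability)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")
```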

"As we move to commodity hardware, we should expect failures all the time at the hardware layer, so maybe we need to design services a little differently and measure service uptime, instead," Skuler says. "It may seem like semantics -- service downtime as opposed to hardware downtime -- but it's based on building the resilience into the software."

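One way to see how service uptime can exceed hardware uptime is a simple redundancy model. The sketch below is an illustration under stated assumptions, not CloudBand's method: it assumes replicas fail independently, that the service stays up as long as any one replica is up, and that failover is instantaneous. The 99 percent server availability is a hypothetical figure chosen for the example.

```python
# Simplified redundancy model; all names and figures are illustrative assumptions.
def service_availability(server_availability: float, replicas: int) -> float:
    """Availability of a service that is up while at least one replica is up."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # The service is down only when every replica is down at the same time.
    return 1.0 - (1.0 - server_availability) ** replicas


def replicas_needed(server_availability: float, target: float, max_replicas: int = 100) -> int:
    """Smallest replica count whose combined availability meets the target."""
    if not (0.0 < server_availability < 1.0 and 0.0 < target < 1.0):
        raise ValueError("availabilities must be strictly between 0 and 1")
    allowed_unavailability = 1.0 - target
    for count in range(1, max_replicas + 1):
        if (1.0 - server_availability) ** count <= allowed_unavailability:
            return count
    raise ValueError("target not reachable within max_replicas")


if __name__ == "__main__":
    # Hypothetical commodity server that is up 99 percent of the time.
    count = replicas_needed(server_availability=0.99, target=0.99999)
    print(f"replicas needed for five nines: {count}")
```

Under these assumptions, three servers that are each available 99 percent of the time are enough to clear five nines at the service level. Real deployments see correlated failures and non-zero failover times, so the practical number would be higher.
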
Virtualization will require different tools to address resiliency, including trouble alerts, root cause analysis, and recovery, Skuler points out, because in a virtual network the elements that make up a service will not be in one place but in many different places. Detecting trouble won't be as simple as seeing a green light turn red on a specific piece of hardware or an element management system. As Skuler continued:

"How do you deploy apps in an NFV world when it is many virtual machines in multiple tiers, multiple locations -- how do you upgrade them, how do you patch them, how do you recover from failure, how do you do automatic scaling and scale in when you don't need scaling anymore? How do you automate the network, because this whole process has to be automated? Those are the big questions we need to ask ourselves."

With CloudBand, its carrier-cloud architecture and platform for NFV deployment, Alcatel-Lucent is developing what it thinks are some of the answers to this and other big questions around virtualization, built on five basic principles of what NFV should do: orchestrate distributed data centers, manage application lifecycles, leverage the network, automate cloud "nodes," and be open and multi-vendor. (See: AlcaLu Unveils Its Carrier Cloud Play, and Analyst: AlcaLu Could Have a Cloud Hit.)

Industry analysts generally credit AlcaLu's CloudBand approach with being at the head of the pack for NFV answers. CIMI Corp. President Tom Nolle calls it "a credible approach" in his blog on CloudBand, and Caroline Chappell, senior analyst with Heavy Reading, describes Alcatel-Lucent as having "more concrete solutions" than its competitors.

Skuler points to a 30-month process AlcaLu undertook to develop CloudBand, which predates the NFV efforts that carriers initiated in 2012, and to the company's determination to build something from scratch that was carrier-specific rather than a retrofit of an enterprise solution.

The solution includes CloudNodes, which are converged stacks of hardware and software that can be distributed but managed as a single resource; software that automates that management process; and a simple process for turning up nodes in the needed configuration via a web interface.

"Our customers can have a CloudNode up and running in two and a half hours, it doesn't take months of professional services and customization," Skuler says. "Once they are up and running all the lifecycle is automated -- if the hardware fails or the server fails or they want to add another node to the cluster, all of those things are completely automated."

The CloudBand Management system, which handles management functions, works through open APIs -- OpenStack or CloudStack APIs -- so that other nodes, not just AlcaLu's CloudNodes, can be part of the solution. (See: Alcatel-Lucent Expands CloudBand.)

Earlier this summer, Alcatel-Lucent rather quietly announced -- via Twitter -- that it had created an ecosystem, opening up CloudBand to the community so that anybody in the industry can use a CloudBand system installed internally at its data center. This is one indication of the company's determination to maintain openness and avoid vendor or technology lock-in, Skuler says.

"Anybody in the industry, including our competitors, is more than invited to use a CloudBand we installed internally in our data center and use our tools. They can actually work on top of a living distributed system. Then when we go to customers we can show it working on something that is similar to what service providers have, as opposed to running it on Amazon or their own six servers."

CloudBand is already in use by Telefónica SA (NYSE: TEF) and Deutsche Telekom AG (NYSE: DT), and Alcatel-Lucent has announced partnerships with companies such as Radware Ltd. (Nasdaq: RDWR) for load balancers and Metaswitch Networks for virtual IMS and virtual SBCs, among others.

Release 1.6 of CloudBand came out this summer, and Skuler says his team is producing a new release about every three months now, so stay tuned for what's next in the fall.

— Carol Wilson, Editor-at-Large, Light Reading
