NFV (Network Functions Virtualization)

NFV Requires Redefined Reliability: AlcaLu

As service providers move toward virtual networks and virtualized network functions, the way they define and deliver reliability is going to change dramatically -- and that's just one of the many "big questions" the industry needs to be addressing, says Alcatel-Lucent (NYSE: ALU)'s Dor Skuler, vice president and general manager of the CloudBand Business Unit.

The traditional five-nines reliability of telecom networks -- that is, 99.999 percent availability -- assumes hardware is down for an average of just three to five minutes a year, Skuler says. And traditional, purpose-built telecom hardware meets that standard. But the shift to software-defined networking (SDN) and network functions virtualization (NFV) is predicated on using commercial off-the-shelf (COTS) hardware, which is not built to the same standards.
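As a quick sanity check on those figures, the downtime budget implied by an availability target is simple arithmetic (the numbers below are illustrative, not from the article):

```python
# Yearly downtime budget implied by an availability target.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # ~525,960 minutes

def downtime_minutes(availability: float) -> float:
    """Minutes of allowable downtime per year at the given availability."""
    return MINUTES_PER_YEAR * (1.0 - availability)

for label, avail in [("three nines", 0.999),
                     ("four nines", 0.9999),
                     ("five nines", 0.99999)]:
    print(f"{label}: {downtime_minutes(avail):.1f} min/year")
# Five nines works out to roughly 5.3 minutes a year, in line with
# the three-to-five-minute figure Skuler cites.
```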

"As we move to commodity hardware, we should expect failures all the time at the hardware layer, so maybe we need to design services a little differently and measure service uptime, instead," Skuler says. "It may seem like semantics -- service downtime as opposed to hardware downtime -- but it's based on building the resilience into the software."
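Skuler's "resilience in the software" argument rests on basic probability. As a minimal sketch (the availability figures below are illustrative assumptions, not from the article): if instances fail independently, running the same function on several unreliable commodity servers in parallel can deliver far better service availability than any single box.

```python
def parallel_availability(per_instance: float, replicas: int) -> float:
    """Availability of a service that stays up as long as any one replica
    is up, assuming replicas fail independently."""
    return 1.0 - (1.0 - per_instance) ** replicas

# Commodity servers at 99% each -- far below telecom-grade five nines:
print(round(parallel_availability(0.99, 1), 6))  # one box: still two nines
print(round(parallel_availability(0.99, 3), 6))  # three boxes: ~six nines
```

The independence assumption is the catch, of course: replicas sharing a rack, power feed, or software bug fail together, which is why Skuler emphasizes distributing elements across locations.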

Virtualization will require different tools to address resiliency, including trouble alerts, root cause analysis, and recovery, Skuler points out, because in a virtual network the elements that make up a service will not be in one place, but in many different places. Detecting trouble won't be as simple as seeing a green light turn red on a specific piece of hardware or an element management system. As Skuler continued:

How do you deploy apps in an NFV world when it is many virtual machines in multiple tiers, multiple locations? How do you upgrade them, how do you patch them, how do you recover from failure, how do you do automatic scaling and scale in when you don't need scaling anymore? How do you automate the network, because this whole process has to be automated? Those are the big questions we need to ask ourselves.
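The scale-out/scale-in question can be sketched as a simple control loop. The thresholds and function names below are invented for illustration; they are not CloudBand's API:

```python
def desired_replicas(current: int, cpu_load: float,
                     scale_out_at: float = 0.75, scale_in_at: float = 0.30,
                     min_replicas: int = 2, max_replicas: int = 10) -> int:
    """Decide how many VNF instances to run given average CPU load (0..1).

    Scale out when busy, scale back in when idle, but never drop below a
    floor that preserves redundancy for failure recovery.
    """
    if cpu_load > scale_out_at and current < max_replicas:
        return current + 1
    if cpu_load < scale_in_at and current > min_replicas:
        return current - 1
    return current

print(desired_replicas(3, 0.90))  # busy -> 4 (scale out)
print(desired_replicas(3, 0.10))  # idle -> 2 (scale in)
print(desired_replicas(2, 0.10))  # idle but at redundancy floor -> stay at 2
```

A real orchestrator would drive a loop like this from monitoring data and would also handle the placement, patching, and recovery questions Skuler lists, but the "automate everything" principle is the same.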

With CloudBand, its carrier-cloud architecture and platform for NFV deployment, Alcatel-Lucent is developing what it thinks are some of the answers to these and other big questions around virtualization, built on five basic principles of what NFV should do: orchestrate distributed data centers, manage application lifecycles, leverage the network, automate cloud "nodes," and be open and multi-vendor. (See: AlcaLu Unveils Its Carrier Cloud Play, and Analyst: AlcaLu Could Have a Cloud Hit.)

Industry analysts are generally crediting AlcaLu's CloudBand approach with being at the head of the pack for NFV answers. CIMI Corp. President Tom Nolle calls it "a credible approach" in his blog on CloudBand, and Caroline Chappell, senior analyst with Heavy Reading, describes Alcatel-Lucent as having "more concrete solutions" than its competitors.

Skuler points to the 30-month process AlcaLu undertook to develop CloudBand, which predates the NFV efforts initiated by carriers in 2012, and the company's determination to build something carrier-specific from scratch, not a retrofit of an enterprise solution.

The solution includes CloudNodes, which are converged stacks of hardware and software that can be distributed but managed as a single resource; software that automates that management process; and a simple process for turning up nodes in the needed configuration via a web interface.

"Our customers can have a CloudNode up and running in two and a half hours; it doesn't take months of professional services and customization," Skuler says. "Once they are up and running, the whole lifecycle is automated -- if the hardware fails or the server fails or they want to add another node to the cluster, all of those things are completely automated."

The CloudBand Management system, which handles management functions, works through open APIs -- OpenStack or CloudStack APIs -- so that other nodes, not just AlcaLu's CloudNodes, can be part of the solution. (See: Alcatel-Lucent Expands CloudBand.)

Earlier this summer, Alcatel-Lucent rather quietly announced -- via Twitter -- that it had created an ecosystem, opening up CloudBand to the community so that anybody in the industry can use a CloudBand system installed internally at its data center. This is one indication of the company's determination to maintain openness and avoid vendor or technology lock-in, Skuler says.

Anybody in the industry, including our competitors, is more than invited to use a CloudBand we installed internally in our data center and use our tools. They can actually work on top of a living distributed system. Then when we go to customers we can show it working on something that is similar to what service providers have, as opposed to running it on Amazon or their own six servers.

CloudBand is already in use by Telefónica SA (NYSE: TEF) and Deutsche Telekom AG (NYSE: DT), and Alcatel-Lucent has announced partnerships with companies such as Radware Ltd. (Nasdaq: RDWR) for load balancers and Metaswitch Networks for virtual IMS and virtual SBCs, among others.

Release 1.6 of CloudBand came out this summer, and Skuler says his team is producing a new release about every three months now, so stay tuned for what's next in the fall.

— Carol Wilson, Editor-at-Large, Light Reading

Dor Skuler 8/26/2013 | 4:21:58 AM
Thanks for the feedback and for joining the journey. Really impressed to see the depth of the posts, and happy the discussion with Carol helped jump-start it.

For anyone working on a VNF: we actually opened the CloudBand ecosystem to all. You're welcome to use the system, which is distributed, to test out how to make your VNF service reliable using the cloud system and the lifecycle management tools we expose.

TomNolle 8/22/2013 | 2:44:01 PM
Re: economics? I think not.. No, I think your nirvana is probably the right answer!  IMHO, SDN is an inherent two-layer structure, with the higher "connection" layer implemented as a virtual overlay and the bottom layer implemented using whatever transport control processes make sense, but NOT directly coupled to the applications.  I think policy control is what creates the linkage.  Traffic management is a network-level function and applications need to control only connectivity.

You're right about CloudNFV; the demo use case is Metaswitch Clearwater IMS, and we're also looking at some integration-partner activity that would implement EPC and mobility management plus MRF, PCC, and CDN as a single "as-a-service" NFV function set.  The latter group would be a combination of signaling-plane (MME) and data-plane (SDN) activities.  That's not part of our current demo plan, but an add-on activity in 2014 likely with some new players to help.
Dredgie 8/22/2013 | 1:21:06 PM
Re: economics? I think not.. I think one of CloudNFV's initial NFV use case PoC implementations of an IMS core in the cloud, through Project Clearwater, is a good example of a virtualized control plane. Naturally you can't simply stick a piece of software designed to run on 5x9's hardware on 3x9's hardware. It must be architected for deployment within a clustered environment with features such as high availability databases for short-term state maintenance, such as (in this example) Infinispan. 

To Tom's point, I believe we are going to also (finally!) see control plane separation in the core switching and routing components -- perhaps starting with virtualized routers in the form of route discovery (interior and exterior) and routing information bases (RIBs) that inject traffic-engineered entries into a hardware-based forwarding plane / forwarding information base (FIB) using techniques such as those defined in the IETF ForCES WG. A future path to SDN / Applications Based Network Operations (ABNO, to steal a phrase from an IETF draft) includes a north-bound I2RS interface to enable granular control of the RIB by client services.

Down the stack (L2-L0), we can look to virtualized Path Computation Elements (PCE) with stateful traffic engineering databases (TED), which also take control-plane responsibilities out of the core transmission equipment and into a data center environment. This might leverage a combination of existing protocols (e.g., NETCONF) and protocol extensions (e.g., PCEP) in combination with 'legacy' signaling techniques (RSVP) for path establishment, and perhaps PCEP extensions (again) or OpenFlow for traffic steering -- an arrangement that needs only interaction with newer edge hardware devices, while the core switches remain as they are today. Or maybe that's all just my nirvana! :-)
TomNolle 8/22/2013 | 12:11:29 PM
Re: economics? I think not.. I do think that what could be called "transport control" functions, implemented IMHO as NFV services, would be appropriate and could be made reliable.  I have some limited NFV implementation experience and control-plane stuff seems feasible.  In the data plane, the biggest benefits would come toward the edge, though.
TTDCorp 8/22/2013 | 12:02:50 PM
Re: economics? I think not.. Great clarification for me, thank you.  I can clearly see the edge network support.  I can see the OSS/BSS platforms riding an IT infrastructure on SDN and NFV.  I could not see a core infrastructure deployed as SDN with NFV elements.  As SDN applications form, I too fall victim to not knowing the definitions and uses.
TomNolle 8/22/2013 | 10:29:31 AM
Re: economics? I think not.. I think there's another dimension to the reliability/availability story that your comment implies.  Remember that the stated goal of the NFV effort, from its original white paper, was to replace "network functions" with virtual equivalents, but its target functions were more in the middle-box or edge router area.  I don't personally think that virtual routing/switching is appropriate for transport missions, and in the surveys I've done of operators they're not targeting those missions for NFV, at least at this point.

That said, as you start to look at SDN or NFV as an element in a reliable service you have to look at how that particular element's software component and hosting would impact your reliability/availability.

TTDCorp 8/22/2013 | 9:38:27 AM
Re: economics? I think not.. Tom, I think you are linking an item that needs to be discussed.  Carrier networks are driven to five 9's by the SLAs they have delivered in transport/transmission to Google, Facebook, Twitter, etc.  What is not five 9's for just about anyone in the transition to all-IP networks is the application software and the application hardware.

We have enterprise OTT free with some aspect of subscription services in advanced telecommunications (voice is an aspect, but collaboration, video, and social media are the premise), which have adopted HD codec support and are starting to assist SBCs in homogenizing interoperability.  Thus the OTT/UCC/competitive VoIP/hosted platforms and enterprise VoIP hardware providers are starting to expose higher-quality sessions, when operating.  It is the "when operating = five 9's" part where software and hardware platforms are way behind.

On the carrier side, the platforms are there but the applications are not.  I see the service provider/subscriber definition as a volatile state, as a result of this transition period.
MarkC73 8/20/2013 | 6:32:43 AM
Re: economics? I think not.. I'm looking forward to seeing more details about this at this year's NA SReXperts.  I've been pretty preoccupied recently, so I haven't kept up with what the major players have been doing.  As far as reliability, true, people may be more tolerant about internet services, but I think in order to make a dent in the marketplace, vendors will need to differentiate themselves to the carriers that are going to be implementing them.  I see reliability playing a significant role in that.  The other thing that will need to be worked out is overall interoperability. As major vendors take different approaches and make different partnerships, I wonder how quickly standards that are detailed enough can be made and agreed upon before it turns messy.
TomNolle 8/19/2013 | 8:00:11 PM
Re: economics? I think not.. Personally, I think the Internet, Skype, Google Voice, etc. are proving an important point, which is that people accept best efforts for nearly everything.  I think "five-nines" is a goal that you achieve when somebody pays for you to achieve it, and in the case of voice I don't think that they will.  Yes, I know we have more reliable TDM voice service today for wireline, but that's not the point.  We used to have five-nines data services until the Internet came along too, and they'd now cost us about nine grand a month, so we don't have them any more.

I think we will engineer reliability into NFV services, as much reliability as the market will pay incrementally to get.  But I also don't think that voice services are a strong candidate for NFV.  Remember that the NFV Call for Action focused on middlebox functions that were more likely used for data services.
Dredgie 8/19/2013 | 7:42:17 PM
Re: economics? I think not.. Flicking this Q. to Tom as the NFV oracle. Obviously kudos to AlcaLu for doing it -- not just sitting around -- but maybe I'm a little hung up on the initial "big questions... the industry needs to be addressing" statement. This is exactly the area the RELAV WG is looking at, right? Yes, I know it's not public, but it's not like this stuff is being ignored.

Totally agree with the equipment reliability vs. service chain resiliency argument, though. I think even for a real-time service like voice (with video, we are used to drop-outs), when architected appropriately with duplicated VNFs across VMs, hypervisors, and geo locations (with appropriate E.412-like overload prevention, etc.), we may get better than 5x9s in terms of 'complete' outages, though I can foresee getting more 'blips' as an NSC recovers or adapts to loads. But we should also consider that mobile QoE is the new accepted norm. Call set-up fails -- we hit the 'call contact' button again and don't think twice about it.

OK - fine - we curse the carrier, then hit the 'call contact' button again :-)
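The service-chain point above can be put in numbers with a simple independence model (the availability figures are illustrative assumptions, not from the thread): a chain of VNFs in series multiplies availabilities, so each hop needs replication to keep the end-to-end figure high.

```python
def replicated_vnf(per_replica: float, replicas: int) -> float:
    """Availability of one VNF duplicated across VMs/sites (any replica serves),
    assuming independent failures."""
    return 1.0 - (1.0 - per_replica) ** replicas

def chain_availability(vnf_availabilities: list[float]) -> float:
    """A network service chain is up only if every VNF in it is up (series)."""
    result = 1.0
    for a in vnf_availabilities:
        result *= a
    return result

# A five-VNF chain built from 99.9%-available parts: in series, the
# end-to-end figure sags to about 99.5%...
unreplicated = chain_availability([0.999] * 5)
# ...but duplicating each VNF across two sites pulls it back past five nines.
duplicated = chain_availability([replicated_vnf(0.999, 2)] * 5)
print(round(unreplicated, 5), round(duplicated, 6))  # -> 0.99501 0.999995
```

This is the trade Dredgie describes: fewer complete outages, at the cost of short blips while the chain fails over.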