NFV (Network Functions Virtualization)

Does NFV Have a Packet Processing Problem?

Is there a packet performance problem lurking behind NFV that could make the deployment of virtual functions in a typical IT-based cloud environment that much more complicated?

Metaswitch Networks thinks so, and the vendor is hoping to draw attention to what it sees as performance problems that could challenge the implementation of NFV, according to the company's CTO Martin Taylor.

By raising the issue and also laying out software and hardware fixes, Metaswitch wants to get telecom service providers talking about it and sharing their thoughts on which approach they prefer.

Taylor first spoke to me of his company's concerns back in November. Now he's brought them to light in his Metaswitch blog, which you can read here.

Specifically, Metaswitch found in its testing that network-intensive functions, such as those associated with user plane functionality -- versus control plane -- suffer performance issues when they are implemented in a typical IT-based cloud environment because that type of infrastructure is limited in its ability to handle heavy network workloads.

Many of its telecom service provider customers aren't aware of those limitations, Taylor says, and may be in for a nasty shock when they try to virtualize some functions on an OpenStack environment, for example.

"When we talk to telcos about what they are doing with OpenStack and the cloud environment for NFV, we find that they are obsessed with orchestration," Taylor told me back in November. "But our experience shows that there are these pretty serious performance issues lurking under the covers. Telcos might get a nasty shock when trying to do some virtualized network functions."


Metaswitch encountered these issues in the process of running its Perimeta session border controller, which handles both control plane and user plane functions. While the SBC runs fine on bare metal, its performance in a virtualized environment suffers because of the inefficiency of the data path between the physical network and the virtual machines. That path is provided by Open vSwitch software, which grew up in the IT environment and can't handle the million packets per second workload of a network SBC, according to Taylor.
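To put that million-packets-per-second figure in perspective, a rough budget calculation shows how little CPU time each packet gets. The 2.5 GHz clock speed here is an assumed typical server core, not a figure from the article:

```python
# Back-of-envelope per-packet CPU budget at SBC-class packet rates.
# The 2.5 GHz clock is an assumed typical server core speed.
PACKET_RATE_PPS = 1_000_000      # ~1M packets/sec, per the article
CORE_CLOCK_HZ = 2_500_000_000    # assumed 2.5 GHz core

ns_per_packet = 1e9 / PACKET_RATE_PPS          # time budget per packet
cycles_per_packet = CORE_CLOCK_HZ / PACKET_RATE_PPS  # cycle budget per packet

print(f"{ns_per_packet:.0f} ns per packet")         # 1000 ns
print(f"{cycles_per_packet:.0f} cycles per packet")  # 2500 cycles
```

At roughly 2,500 cycles per packet, even one extra copy or context switch in a software data path eats a large fraction of the budget, which is why the path through the vSwitch matters so much.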

The hardware fix essentially bypasses the vSwitch by establishing a data path between the physical network and the VM using Ethernet NIC (network interface card) technology called Single Root Input/Output Virtualization (SR-IOV). This approach is probably the most efficient, but it doesn't support overlay-based network virtualization, which is commonly used in migrating massive telecom networks to virtualization.
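On Linux hosts, SR-IOV virtual functions are enabled through the kernel's standard sysfs interface; each virtual function then appears as its own PCI device that can be handed straight to a VM. A minimal sketch (the device name "eth0" is hypothetical; the sysfs path layout is the stock kernel interface for SR-IOV-capable NICs):

```python
# Minimal sketch of enabling SR-IOV virtual functions via the standard
# Linux sysfs interface. "eth0" is a hypothetical device name.
from pathlib import Path

def set_num_vfs(device_dir: Path, num_vfs: int) -> None:
    """Request num_vfs virtual functions from an SR-IOV-capable NIC."""
    vf_file = device_dir / "sriov_numvfs"
    # The kernel only accepts a change from 0, so reset first.
    vf_file.write_text("0")
    vf_file.write_text(str(num_vfs))

# Usage on a real system (requires root):
# set_num_vfs(Path("/sys/class/net/eth0/device"), 4)
```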

The software fix, which does support overlays but is less efficient than the hardware option, uses a commercially accelerated vSwitch to create a software-based data path between a conventional Ethernet NIC and the virtual machines, Taylor explains. Shared memory can be used to reduce or eliminate packet copying, making processing more efficient.
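The copy-avoidance idea can be illustrated in miniature: if producer and consumer share one buffer, the consumer can read packet data in place through a view rather than copying it out. This is an illustrative sketch only, not any particular vSwitch's API:

```python
# Illustrative sketch of zero-copy packet handoff through shared memory.
# A real accelerated vSwitch does this between host and guest with ring
# buffers and DMA; here one bytearray stands in for the shared region.
shared_region = bytearray(2048)      # stands in for a shared memory page
view = memoryview(shared_region)     # zero-copy window onto the region

def write_packet(offset: int, payload: bytes):
    """Producer side: place a packet into the shared region."""
    view[offset:offset + len(payload)] = payload
    return offset, len(payload)      # descriptor: where the packet lives

def read_packet(offset: int, length: int) -> memoryview:
    """Consumer side: return a view of the packet -- no bytes are copied."""
    return view[offset:offset + length]

desc = write_packet(0, b"\x45\x00SIP INVITE")
pkt = read_packet(*desc)
assert pkt.obj is shared_region      # same underlying buffer, no copy made
```

Only the small (offset, length) descriptor crosses between producer and consumer; the packet bytes never move, which is the efficiency the accelerated-vSwitch approach is after.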

"We need the telcos to start to home in on what are their preferred options," he said. "We want the market to choose one."

— Carol Wilson, Editor-at-Large, Light Reading

dwx 1/6/2015 | 11:33:59 AM
Options I think just about everyone is aware of the processing limitations... 

SR-IOV is the key to using the Intel DPDK and fast processing, and right now there isn't good support in the hypervisor software for virtualizing the functions, so you do end up with a 1:1 mapping of virtual NIC to physical NIC, basically bypassing the virtualization layer. There are NIC options where the NIC HW itself is virtualized and presents itself as multiple NICs even to the hypervisor, but the 1:1 mapping of physical NIC to VM is still not ideal for spinning up instances. I think things will get there eventually and you will see OVS along with Juniper's Contrail vRouter and probably ALU's Nuage support the ability to present SR-IOV to an underlying VM.

There are also acceleration options like Netronome's FlowNIC, which is a network processor on a PCI card that already supports accelerating OVS. We may see more of that come around due to the limitations of packet processing with general CPUs. General CPUs also use a lot more power than dedicated packet hardware.
cnwedit 1/6/2015 | 11:50:19 AM
Re: Options Thanks for the feedback. One of the reasons I didn't write this up earlier was that my initial inquiries in following up on the Metaswitch info drew the email/voice mail equivalent of blank stares. Apparently, I was asking the wrong people. 
brooks7 1/6/2015 | 12:26:03 PM
Re: Options Carol,

I think I have said before that virtualization impacts performance.  Essentially what dwx is giving as a response is somewhat bypassing the virtualization process.  There will be parts that are not virtualized.  When he notes that the 1:1 mapping can cause issues with spinning up instances, he is providing a really important piece of information.

Essentially there are several problems that virtualization can solve.  They have different value and different problems:

1 - You can use NFV to eliminate custom hardware.  The notion being that you get better price/performance from COTS HW.  So instead of (in the Metaswitch case) having an SBC box, you now have a box which runs an SBC.  If you no longer need an SBC, you can turn the box into something else (note there are financial issues here deep in the tax code).

2 - You can use NFV to provide a way of creating products as they are required.  You can have spare boxes in Data Centers and instantiate what you need when you need it.  The setup and tear down is done by humans and is done on the order of days/weeks/years.  Beyond the custom hardware, you now have an inventory simplification (You buy a couple of server models and profile what you use on them).

3 - You can use NFV to rapidly respond to network changes.  Turn up and turn down instances based on loading.  This is a lot more problematic if you don't use pure virtualization.  Hard to keep track of your "inventory" of compute power and be able to deploy it flexibly.  On the other hand, this can be a really good way of keeping costs down.

4 - You can use NFV to manage your inventory very closely.  If you have a view to the load on any function, you can turn up instances on underused COTS resources.  This can provide off load of any hardware that is overloaded.  Again, hard (impossible) to do if you can't share hardware. Even tighter use of hardware.

I would argue that the NFV business case gets better as you progress down the list (I am sure there are other steps along the way).  I think it would be interesting to get a view to how SPs think they might use NFV and clearly articulating what value propositions interest them versus what vendors think.


cnwedit 1/6/2015 | 12:33:55 PM
Re: Options What service providers have consistently told me is that they are looking to NFV to both improve the cost picture (opex and capex) and, perhaps more importantly, be able to bring new services to market more rapidly. 

The latter seems to have become much more important over the past year. 

That said, the strongest business case in the early going seems to be - again according to the SPs I've interviewed - in replacing the multiple CPE boxes/devices with COTS hardware and software that is function-specific. Multiple forms of security plus load-balancers and optimization boxes are all on the list for possible replacement by an NFV device that could be upgraded remotely, eliminating the need for frequent truck rolls and maintenance issues. 
brooks7 1/6/2015 | 12:53:20 PM
Re: Options Well, there is good news there Carol.

Those boxes have been around for years!  Virtually any IT function that you want (all the ones you have listed there included) is available in virtual form, and has been for years.


Atlantis-dude 1/6/2015 | 1:01:07 PM
media functions Seems that metaswitch is highlighting user-plane processing. If they are specifically referring to transcoding, then there are certainly much more specialised DSPs than the x86. If for their use-case, bare metal provides sufficient performance then SR-IOV could be the closest to bare-metal option. Putting a vSwitch when the packets are not really being switched (but transcoded) is clearly a poor choice.
charlieashton 1/7/2015 | 7:30:07 AM
Re: Options

For NFV, one of the biggest limitations of PCI Pass-through and SR-IOV is their limited support for Carrier Grade telecom reliability.

Network security is limited since the guest VMs have direct access to the network. Critical security features such as ACL and QoS protection are not supported, so there is no protection against Denial of Service (DoS) attacks.

These approaches prevent the implementation of live VM migration, whereby VMs can be migrated from one physical core to another (which may be on a different server) with no loss of traffic or data. Only "cold migration" is possible, which typically impacts services for at least two minutes.

Hitless software patching and upgrades are impossible, so network operators are forced to use cold migration for these functions too.

It can take up to 4 seconds to detect link failures, which impacts required link protection capabilities.

For service providers who are deploying NFV in their live networks, neither PCI Pass-through nor SR-IOV enable them to provide the service uptime that's an absolute telco requirement. A better solution is an accelerated vSwitch that's part of an NFVI platform that provides all the necessary Carrier Grade features.

martin_r_taylor 1/7/2015 | 7:39:13 AM
Re: media functions We are indeed talking about user plane processing, but the point we are making about packet processing is fundamentally concerned with the problem of getting packets from the wire to the VM and back to the wire again.  You have two choices here - either to use a vSwitch, or get the VM connected directly to the NIC, which is what SR-IOV lets you do nicely.  The "default" vSwitch implementations for both VMware and OpenStack currently introduce significant bottlenecks.  No doubt they will improve over time ...

It's true that SBCs may do a range of things to user plane packets besides shift them between untrusted and trusted network domains, including encryption / decryption and transcoding.  These functions may benefit from some kind of hardware acceleration.  As you point out, x86 _can_ work as a DSP for transcoding but purpose-built silicon is a lot more efficient.  But the problem with injecting purpose-built silicon into your NFV infrastructure is that it starts to look less and less like a nice generic cloud.  What you gain in efficiency, you lose in flexibility.
martin_r_taylor 1/7/2015 | 9:52:03 AM
Re: Options Charlie, I'm afraid I have to take issue with the assertions you have made in your post.

First, network security.  Connecting VMs directly to the network (as opposed to passing via a vSwitch) does not necessarily degrade network security.  Indeed, it may actually be better.  The best performing accelerated vSwitches actually open up a serious security concern by providing VMs access to shared memory where an untrusted VM could cause havoc.  Furthermore, the SBC is itself a security appliance.  It expects to be connected directly to an untrusted network, and one of its specific functions is to provide DDoS protection.  The SBC must be able to filter legitimate SIP signaling messages out of a flood of malicious traffic, and in order to protect the network against this malicious traffic while still handling legitimate call setup requests, it's essential that the host is able to pass all of this traffic to the VM that is running the SBC.  This is an example of the packet processing challenge in the control plane as opposed to the user plane, and it can be solved successfully with SR-IOV.
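The kind of flood filtering described here is often built on per-source rate limiting. The following token-bucket sketch is a generic illustration of the idea, not Metaswitch's actual implementation:

```python
# Illustrative per-source rate limiter of the kind an SBC might apply to
# a SIP signaling flood. Generic token-bucket sketch, not a real SBC API.
import time

class SipRateLimiter:
    """Token bucket per source IP: legitimate callers stay under the
    rate, while flood sources exhaust their budget and get dropped."""

    def __init__(self, rate_per_sec: float, burst: float):
        self.rate = rate_per_sec   # tokens replenished per second
        self.burst = burst         # maximum stored tokens per source
        self.tokens = {}           # src_ip -> remaining budget
        self.last = {}             # src_ip -> last-seen timestamp

    def allow(self, src_ip, now=None) -> bool:
        now = time.monotonic() if now is None else now
        if src_ip not in self.last:          # first packet from this source
            self.tokens[src_ip] = self.burst
            self.last[src_ip] = now
        elapsed = now - self.last[src_ip]
        self.last[src_ip] = now
        # Refill this source's budget, capped at the burst size.
        self.tokens[src_ip] = min(self.burst,
                                  self.tokens[src_ip] + elapsed * self.rate)
        if self.tokens[src_ip] >= 1.0:
            self.tokens[src_ip] -= 1.0
            return True
        return False                          # drop: flood suspected
```

The key point in Taylor's argument is that this filtering only works if the host can deliver the entire flood, good and bad packets alike, to the SBC's VM in the first place.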

Second, live migration.  You are correct to point out that a VM that is connected to the network via SR-IOV cannot be live migrated, at least with current cloud software such as VMware or OpenStack.  However it's not clear that this is a real problem in practice.  Live migration of a VNF instance that is carrying voice traffic is actually highly undesirable since it results in suspension of all the media streams for a few seconds.  This is generally not acceptable.  If you need to move a virtual SBC from one host to another, then you can achieve this by leveraging the VNF's own fault tolerance mechanisms in conjunction with suitable orchestration.  For example, the orchestrator tells the VNF instance it wants to migrate to fail over to its standby, then shuts down the instance and starts a new one on the target host.  Finally it pairs this new instance up with the active half to restore full duplex fault tolerance.
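The orchestrated alternative described above can be sketched as a short sequence. The VnfInstance class and its methods here are hypothetical illustrations, not a real orchestrator or VNF API:

```python
# Sketch of migration-via-failover as described above: fail over to the
# standby, retire the old instance, then start and re-pair a new one.
# All names here are hypothetical, not a real orchestrator API.
class VnfInstance:
    def __init__(self, host: str, role: str):
        self.host = host
        self.role = role          # "active" or "standby"
        self.running = True

    def fail_over_to(self, standby: "VnfInstance") -> None:
        """Hand the active role to the standby instance."""
        self.role, standby.role = "standby", "active"

    def shut_down(self) -> None:
        self.running = False

def migrate_via_failover(active, standby, target_host):
    """Move the active VNF off its host without live migration."""
    active.fail_over_to(standby)                       # 1. standby takes traffic
    active.shut_down()                                 # 2. retire old instance
    new_standby = VnfInstance(target_host, "standby")  # 3. start replacement
    return standby, new_standby                        # 4. re-paired pair
```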

I'm not disputing the value of accelerated vSwitches - they have their place.  But it's not clear to me that their advantages over SR-IOV are quite as clear-cut as you make out.
marco.passalacqua 1/7/2015 | 7:21:06 PM
Re: Options Hi,

Just a quick note.

If you choose to implement any virtualization-layer bypass solution (like SR-IOV), you lose most of the benefits that you can get from a virtual (cloud-oriented) environment.

So, I really don't understand the reason to implement an NFV-compliant infrastructure if you cannot gain its main advantages.

Moreover, OpenStack doesn't yet support SR-IOV, so from a cost point of view (in particular the OPEX costs to which every customer is quite sensitive), if you choose a commercial solution (like VMware) there's no real decrease.

Anyway, I'm pretty excited, and I'm looking forward to the fabulous new solutions for improving network performance in a virtual environment that vendors will hopefully show us during 2015.
