
EXCLUSIVE! NFV Interop Evaluation Results

Further Interoperability Findings and Challenges
There was much to learn from the Phase 1 process.

Differences in OpenStack vendor implementations
One of the perennial challenges with open source projects is their wealth of deployment options and their general lack of attention to backwards compatibility. This is one of the reasons why Red Hat has been successful with Red Hat Enterprise Linux (RHEL): It validates and ensures consistency and backwards compatibility of the Linux packages in its distribution. Unfortunately, an equivalent is still lacking for OpenStack.

Three of the four NFVi vendors implemented the OpenStack Juno release (October 2014), while one provided Icehouse (April 2014), which shipped with an incompatible Heat version. Some of the more complex VNFs did not manage to boot up on Icehouse, and it would have been too cumbersome to configure them manually without Heat templates.
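
This kind of mismatch can be caught before a VNF is onboarded by simply inspecting the template's declared heat_template_version. The sketch below is not part of the test tooling; the version table is a small, illustrative subset and the template file name is made up.

    import yaml

    # heat_template_version values and the OpenStack release that introduced them
    # (illustrative subset, not a complete list)
    KNOWN_VERSIONS = {
        "2013-05-23": "Havana/Icehouse",
        "2014-10-16": "Juno",
        "2015-04-30": "Kilo",
    }

    def check_template(path, cloud_supports_up_to="2013-05-23"):
        """Warn if a HOT template declares a newer version than the target cloud supports."""
        with open(path) as f:
            template = yaml.safe_load(f)
        version = str(template.get("heat_template_version", "unknown"))
        release = KNOWN_VERSIONS.get(version, "unknown release")
        print(f"{path}: heat_template_version {version} ({release})")
        if version > cloud_supports_up_to:
            print("Warning: this template will likely be rejected by the target Heat engine.")

    check_template("vnf_stack.yaml")  # hypothetical template file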

We intentionally did not request specific OpenStack versions. In the near future, service providers will be faced with data centers based on many different hardware and software combinations. Backwards compatibility is an important aspect, and the industry clearly needs to focus on it more extensively.

When the Icehouse NFVi vendor wanted to upgrade its servers, we noticed the next issue: In-service OpenStack upgrades will be a major challenge. In-service NFVi upgrades will be an interesting area for future testing; one NFVi vendor was already willing to demonstrate this capability in the next round of testing.

Multivendor troubleshooting of VNF startup issues
When there are issues in bringing up VNFs, it is obviously important to identify the root cause efficiently. In our test bed, all vendors were fully motivated to identify and resolve issues swiftly, as each issue resolved prior to customer deployment saves a lot of budget and stress. Nobody was interested in finger-pointing, which may be different in a service provider environment, where a lot of business may be at stake if an NFVi/VNF combination does not work.

The troubleshooting techniques and methods for multivendor interoperability differ from those required for operational problems. While it is important to get a very quick overview of the battlefield in the case of operational issues, functional interoperability problems are best solved when full and detailed information is made available from all sides for analysis by the vendors involved.

To this end, OpenStack offers many logs. It turned out that one of the most insightful troubleshooting "tools" was simply the console log of the VNF, including its bootup messages. We had multiple cases where a VNF would crash during the boot process: Accessing its kernel panic or crash message was instrumental for the VNF vendor(s) to identify the root cause.

Simply obtaining a text file of the console log required different methods on different NFVis: Graphical user interfaces that shield all the dirty CLI details from the operator do not help in this case. For most NFVi vendors it was possible to access the console logs in one way or another, but one of the participating NFVis unfortunately did not save the console log when a VNF crashed early in the boot process. The simple live view of the console was insufficient, as text scrolled too fast -- even recording the console output with a camera did not help. In the end the VNF vendor had to give up, and the combination was declared failed due to the inability to identify the root cause. NFVi vendors should include feature-rich and verified troubleshooting facilities for VNF interoperability testing.
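
Where an NFVi exposes the standard OpenStack tooling, archiving the console log right after instantiation is straightforward and would have preserved exactly the kind of early crash output we were missing. A minimal sketch, assuming the python-openstackclient CLI is installed and authenticated via the usual OS_* environment variables; the instance name is made up.

    import datetime
    import subprocess

    def save_console_log(server_name):
        """Fetch the server's console log via the OpenStack CLI and archive it to a file."""
        result = subprocess.run(
            ["openstack", "console", "log", "show", server_name],
            capture_output=True, text=True, check=True,
        )
        stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
        path = f"{server_name}-console-{stamp}.log"
        with open(path, "w") as f:
            f.write(result.stdout)
        return path

    print(save_console_log("vnf-vrouter-vm1"))  # hypothetical instance name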

OpenStack IP addressing and VNF IP addressing
Traditionally, OpenStack assigns a single IPv4 address (using Directed DHCP) to each virtual network port of the VM during bootup. That's probably reasonable for enterprise cloud environments, but for virtual network functions, this mechanism is less useful as it may limit the VNF too much, or cause confusion to the operator if different VNFs use different methods.

The four OpenStack NFVis in our test supported a range of IP address assignment options as follows:

    1. Directed DHCP, pushing an IP address from OpenStack directly -- usable for the management port only.

    2. Directed DHCP -- for all ports.

    3. Plain DHCP: OpenStack maps complete subnets to virtual network ports, spins up a DHCP server and will subsequently always respond to any DHCP requests on that port.

    4. Boot config: Bypassing DHCP, the boot configuration may include static IP addresses, as mentioned in the startup option subsection above. While the VNF may ignore the OpenStack IP addressing, OpenStack's management tools will still report the addresses assigned during virtual port creation, which can confuse the operator.

Initially, we did not focus on IP addressing, so we went along with whatever the NFVi and VNF vendors suggested. As we moved ahead with the VNF configuration testing, we noticed that a stricter rule -- preferably mandating Directed DHCP -- would ease IP address management in a multivendor-compatible way.
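
For illustration, this is roughly what pinning a predictable management address on a Neutron port looks like with the standard OpenStack CLI, so that OpenStack's view and the VNF's actual address stay in sync; the network, subnet, image, flavor and instance names are hypothetical.

    import subprocess

    def osc(*args):
        """Run an OpenStack CLI command and return its stdout."""
        return subprocess.run(["openstack", *args], capture_output=True,
                              text=True, check=True).stdout.strip()

    # Create the management port with a fixed address, so OpenStack's view of the IP
    # matches what the VNF will actually use.
    port_id = osc("port", "create",
                  "--network", "mgmt-net",
                  "--fixed-ip", "subnet=mgmt-subnet,ip-address=192.0.2.10",
                  "-f", "value", "-c", "id",
                  "vnf1-mgmt-port")

    # Boot the VNF attached to that port; DHCP on the subnet will hand out exactly
    # this address when the VNF's management interface requests one.
    osc("server", "create",
        "--image", "vnf1-image", "--flavor", "m1.large",
        "--nic", f"port-id={port_id}",
        "vnf1")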

License management considerations
All of the VNFs under test were commercial solutions. As such, they require some sort of licensing. As it turned out during the test campaign, licensing was one of the least mature areas. By their nature, none of the open source projects addresses commercial licensing in a standardized way.

Traditional methods of software licensing do not work well, or do not seem to fit, in a virtualized environment. VNFs are instantiated, moved around the data center, terminated and re-instantiated as needed. Each time, they are given a new unique identification number, so it is impossible to tie their existence to any static license. Thus far, most of the participating VNFs lacked a licensing mechanism that would fulfill all our requirements. Many vendors expressed awareness that licensing needs to be improved and made an integral part of VNF management in the future.

In the test bed, we observed the following ways to implement licenses:

1. Local license manager
The VNF requires a local license manager on instantiation. This solution required some planning to install a license manager in our lab environment and to route requests to the VNF over the internal management network. Once that was done, the local license manager solution worked fine. From the vendor perspective, this solution may still not be secure (see discussion below this list).

2. Public license server
A license server in the cloud sounds easy: No setup is required, and it appears to be the most manageable and secure option for VNF vendors. Operationally, it may or may not be applicable to production environments: A service provider would need to open its management network to specific Internet locations and services, at least for outbound HTTPS connections. Even in our lab environment, we did not permit direct communication between VNFs and the Internet, so this option was out of the question. For one vendor that needed it (because its local license manager is offered for production deployments only), we worked around the restriction with an application-layer gateway (HTTP forwarding).

3. Pre-licensed image for a time limit
This and the following two options apply traditional licenses (i.e. encrypted strings) to the VNF. Since it is difficult to identify any unique aspect of a VNF, it is probably most realistic not to tie the license to anything but the system clock. In this case, however, there is no guarantee that the customer will not reuse a license multiple times. Typically, the license is applied at instantiation time. The extra (manual) steps required make this option useful only for small-scale deployments that do not require automatic scale-out operations.

4. Pre-licensed image tied to MAC address
Some vendors, looking for a unique identification of the VNF instance, resorted to the MAC address. This provides little protection, as the MAC address can be set freely with nova commands; it merely ensures that there won't be two VNF instances using the same license on the same Ethernet/vSwitch management network segment.

5. Pre-licensed image tied to UUID
Some vendors found a unique identification of the VNF instance -- the UUID. This is indeed a suitable unique ID assigned by OpenStack, and it cannot be manipulated by the operator. Unfortunately, a new UUID is assigned whenever the instance is re-created by an administrative action: A license tied to the UUID does not allow relocation of the VNF, or even termination and re-instantiation.
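
For reference, a VNF can read its own OpenStack-assigned UUID from inside the guest via the standard metadata service; a minimal sketch without retries or a config-drive fallback.

    import requests

    # OpenStack's metadata service is reachable from inside the guest at this
    # link-local address; the JSON document includes the instance UUID.
    META_URL = "http://169.254.169.254/openstack/latest/meta_data.json"

    resp = requests.get(META_URL, timeout=5)
    resp.raise_for_status()
    print("Instance UUID for license binding:", resp.json()["uuid"])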

None of the license management options seemed very suitable. We feel that the industry has to put more thought into standardization in this area.

The unfortunate reality is that it is not possible to come up with a secure licensing solution in an elastic environment without the vendor being able to establish a trusted, uncloneable "anchor" point for VM licensing. The most favorable-looking option from the list above -- option 1, the local license manager -- provides this "uncloneability" characteristic only if used in conjunction with a physical license token (such as a USB key) that ties the server to a fixed set of host machines, or a Web-based license soft token.

With any license server (whether local or in the Internet), resilience is a concern. What happens if the license server becomes unavailable? One vendor informed us that their clients raise an alarm if the connection to the license server is lost; the customer has a whole month to react to the issue before the system enters its locked-down unlicensed state. (We have not tested this aspect yet.)

In the lab, there was an additional solution that worked most easily out of the box: Some vendors implemented a license-less mode where the VNF would just support a tiny bit of throughput, for example 0.2 Mbit/s. Cisco's and Huawei's virtual routers, for example, supported such a mode. Since we wanted to conduct functional tests only, these modes were sufficient.

What works great in the lab may be an operational concern: If a license server becomes unavailable and the VNF falls back to its tiny-throughput mode, the issue might go undetected for a while if the VNF is idle. Orchestrators should probably include license management monitoring for each VNF instance and for the license manager itself -- especially if the license manager could become a single point of failure. (This is just some food for thought, but worth noting, we think.)
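
As a rough illustration of such monitoring, the sketch below probes reachability of a (hypothetical) license server and flags how long an outage has lasted relative to a grace period; the address and the one-month window are assumptions based on the vendor statement above, not tested behavior.

    import socket
    import time

    LICENSE_SERVER = ("licmgr.lab.example", 443)  # hypothetical address
    GRACE_PERIOD = 30 * 24 * 3600                 # e.g. the one-month window quoted above

    last_ok = time.time()

    def probe_license_server():
        """Raise a visible warning if the license server has been unreachable too long."""
        global last_ok
        try:
            with socket.create_connection(LICENSE_SERVER, timeout=3):
                last_ok = time.time()
        except OSError:
            outage = time.time() - last_ok
            state = "within" if outage < GRACE_PERIOD else "beyond"
            print(f"License server unreachable for {outage:.0f}s ({state} grace period)")

    probe_license_server()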

vSwitch Configuration Options
Some VNFs -- implementations that consisted of multiple VMs -- required jumbo frame support (Ethernet frames larger than 1,518 bytes) for internal communication. Some of the participating NFVis supported jumbo frames through proprietary OpenStack code modifications shipped as part of their software release; one implemented a temporary workaround that required manual intervention each time a VNF was instantiated. The lesson learned is that jumbo frame support cannot be taken for granted: One vendor told us that OpenStack Kilo will support jumbo frames out of the box, but another vendor disputed that this depends on the OpenStack version at all.
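
In more recent OpenStack releases, Neutron can expose the MTU per network, so a VNF-internal network could request jumbo frames directly; at the time of this test such support was vendor-specific, as described above. A sketch with hypothetical names, assuming an NFVi whose Neutron release and backend honor the setting.

    import subprocess

    def osc(*args):
        subprocess.run(["openstack", *args], check=True)

    # Create the VNF-internal network with a 9000-byte MTU (only meaningful on an NFVi
    # whose Neutron release and backend actually honour the per-network MTU setting).
    osc("network", "create", "--mtu", "9000", "vnf-internal-net")
    osc("subnet", "create", "--network", "vnf-internal-net",
        "--subnet-range", "198.51.100.0/24", "vnf-internal-subnet")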

Early Thoughts on VNF/NFVi Security
All NFVis, with the exception of one, fully trusted any management requests received via their REST APIs. Any OpenStack commands received by the NFVi were executed without requiring transport-level security (HTTPS) or validating a certificate. As a result, it would be relatively easy to manipulate an NFVi once any tenant has gained network access to the API endpoints. This is unfortunate, as OpenStack supports HTTPS and certificates. The one vendor that used and required transport-level security (HTTPS), however, used self-signed certificates, which made it difficult for some VNFs to interoperate: Certificates had to be exchanged between NFVi and VNF out of band, and some VNFs rejected the self-signed certificates. This highlighted an issue well known from the encryption industry: Certificate management is not straightforward and poses a challenge comparable to the license management described above.
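
The practical effect on a VNF (or VNF manager) calling the NFVi's APIs is easy to illustrate with Python's requests library; the endpoint and CA bundle path below are made up.

    import requests

    KEYSTONE_URL = "https://nfvi.lab.example:5000/v3"  # hypothetical endpoint

    # Preferred: verify the NFVi's certificate against a CA bundle exchanged out of band.
    requests.get(KEYSTONE_URL, verify="/etc/pki/nfvi-ca.pem", timeout=5)

    # What self-signed certificates often force in practice: disabling verification,
    # which largely defeats the purpose of enabling HTTPS in the first place.
    requests.get(KEYSTONE_URL, verify=False, timeout=5)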

In OpenStack, tenants are assigned granular operational rights to instantiate VNFs. Sometimes, simple configuration on OpenStack was not sufficient: We needed to grant some tenants administrative rights in order to spin up their VNFs successfully. The standard OpenStack configuration does not allow sufficiently fine-grained customization of tenant access rights, as would be required in a service provider environment with different administrative groups operating the NFVi and a range of VNFs.

For example, the heat_stack_owner role should allow a tenant to use Heat to instantiate a VNF. Sometimes, however, this was not sufficient, because tenants needed to create internal networks for communication between the multiple VMs that make up a VNF. This was only possible with a more privileged account.
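
In practice, the workaround looked roughly like the following; the project and user names are illustrative, and the exact roles required differed per NFVi.

    import subprocess

    def osc(*args):
        subprocess.run(["openstack", *args], check=True)

    # Sufficient for plain Heat stack creation...
    osc("role", "add", "--project", "vnf-tenant", "--user", "vnf-operator",
        "heat_stack_owner")

    # ...but VNFs that create their own internal networks sometimes only worked after
    # the tenant user was promoted to admin, which is far too coarse for production use.
    osc("role", "add", "--project", "vnf-tenant", "--user", "vnf-operator", "admin")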

CPU Flags and Instruction Sets
Intel continues to evolve its x86 instruction sets. NFV requires much more network throughput (i.e. copying to/from network interface cards) than other x86 use cases have needed in the past. Intel developed the Data Plane Development Kit (DPDK) to help developers access the network data plane more efficiently. Some of these optimizations use Intel's x86 instruction set extension SSSE3 (Supplemental Streaming SIMD Extensions 3). Although we did not test performance in this program, we still came across instruction set issues: Some VNFs expected certain CPU capability flags to be exposed by the hypervisor (KVM). SSSE3 was supported by all CPUs in the test bed, but one of the VNFs required the corresponding flag to be passed through by KVM, as it wanted to check whether the requirements of Intel DPDK 2.1 would be satisfied. The test combination got stuck because the CPU flag could not be properly conveyed. DPDK was officially out of scope in this functional interoperability evaluation, but its requirements nevertheless surfaced.

In another case, a VNF required Intel's Integrated Performance Primitives (IPP) library, which in turn relies on the SSE, SSSE3, SSE4, AVX, AVX2 and AVX-512 instruction set extensions. Not all of these are supported by every CPU type -- some are found only on newer Xeon processors, not on the Atom family. In fact, hardware independence and compute/network performance are conflicting goals; VNFs with high performance requirements may well be compatible only with a subset of NFVi hardware platforms.
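
A quick way to see which instruction set extensions actually reach the guest is to inspect /proc/cpuinfo from inside the (Linux-based) VNF; which flags appear depends on the hypervisor's CPU model and passthrough settings, not only on the physical CPU. The required flag set below is just an example.

    # Example flag set to check for; adjust to the VNF's actual requirements.
    REQUIRED_FLAGS = {"ssse3", "sse4_1", "sse4_2", "avx", "avx2"}

    present = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                present = set(line.split(":", 1)[1].split())
                break

    missing = REQUIRED_FLAGS - present
    print("CPU flags missing from the guest's view:", ", ".join(sorted(missing)) or "none")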

  • Prefer to read this report in a downloadable PDF file? An extended PDF version of this report, which includes individual report cards for each test combination that achieved a 'pass,' is available for free (to any Light Reading registered user) at this link.
