The second instalment of a two-part report into what test specialists EANTC found out when they went to Cisco's labs to perform an independent evaluation of the IP networking vendor's NFV infrastructure (NFVi).

October 23, 2015

Validating Cisco's NFV Infrastructure Pt. 2

NFV (network functions virtualization) is just about the hottest trend in communications networking right now, with network operators allocating significant resources (human, technical, financial) to figure out the best New IP strategy for their business needs.

It's crucial, then, that there are independent evaluations of the technology that may be deployed in next-generation networks. That's why Light Reading asked its respected test lab partner EANTC to visit the San Jose, Calif. labs of Cisco Systems earlier this year to conduct a series of validation and verification exercises on a number of Cisco cloud, software-defined networking (SDN) and virtualization platforms. (See Validating Cisco's Service Provider Virtualization & Cloud Portfolio.)

More recently, the EANTC team returned to San Jose to evaluate Cisco's NFVi. The first part of the resulting report, which provides an overview of the aims of the evaluation, a look at Cisco's NFVi and an in-depth, multi-page performance evaluation of Cisco's virtual switch technology, has already been published -- see Validating Cisco's NFV Infrastructure Pt. 1 and Cisco's vSwitch Makes the Grade.

Part 1 of the report can also be downloaded in PDF format: click on this link to download Part 1.


Now we have part two of the report to share, which delves into: Carrier-grade high availability and reliability; the integration, features and performance of two key applications -- Virtualized Video Processing (V2P), previously referred to as Cloud DVR, and virtual EPC (evolved packet core); and an evaluation of Cisco's "single pane of glass" management capabilities with regards to its NFVi.

Here is what is covered in Part 2 of the report on the following pages:

Page 2: Carrier-grade high availability and reliability of Cisco's NFVi
Page 3: Automated and validated OpenStack installation
Page 4: Centralized logging and Runtime Network Regression Testing
Page 5: High availability
Page 6: Putting VNFs to the Test
Page 7: Cisco's Virtual Packet Gateway VPC-DI
Page 8: Single Pane of Glass management

— The Light Reading team and Carsten Rossenhövel, managing director, and Balamuhunthan Balarajah, test engineer, European Advanced Networking Test Center AG (EANTC) (http://www.eantc.de/), an independent test lab in Berlin. EANTC offers vendor-neutral network test facilities for manufacturers, service providers, and enterprises.

Next page: Carrier-grade high availability and reliability of Cisco's NFVi

Carrier-grade high availability and reliability of Cisco's NFVi
Naturally, any network infrastructure solution in the telecommunications world is required to be highly available and operationally reliable. Major customers in the wholesale business and in key industries such as the financial sector have long required network availability values beyond 99.9% or "three nines" -- this can be achieved only by creating redundancy with hot standby components and alternative paths.

To achieve a three-nines available service, each contributing part of the service -- data center, core and aggregation network, access network -- needs to be even more reliable, as the reliability figures of each module will multiply statistically: Six components, each 99.99% available, create a service that will be 99.94% available.

This is one of the two reasons why the industry set a goal of "five nines," i.e. 99.999% availability. The other is that service providers' lawyers decided to measure availability in as relaxed a way as possible, evaluating it over the course of a full year. A customer who can tolerate at most one hour of end-to-end service downtime per year needs to demand 99.99% availability, the equivalent of about 52 minutes of downtime annually -- and that, in turn, is why a data center implementing a virtualized component of such a network service must be able to provide 99.999% availability.
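The arithmetic behind these figures is easy to verify. As a back-of-the-envelope check of our own (assuming n serially dependent components and a 365-day year, not figures supplied by Cisco):

\[
A_{\text{end-to-end}} \;=\; \prod_{i=1}^{n} a_i ,
\qquad 0.9999^{6} \approx 0.99940 \;\;(\text{i.e. } 99.94\%)
\]
\[
\text{downtime per year} \;=\; (1 - A) \times 365 \times 24 \times 60 \text{ min}
\;\approx\;
\begin{cases}
52.6 \text{ min}, & A = 99.99\% \\
5.3 \text{ min}, & A = 99.999\%
\end{cases}
\]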

At EANTC, we have tested the high availability features of many Cisco network components before -- core, aggregation and edge routers, data center switches, mobile core, etc. -- and we are confident that it is possible to build highly available network infrastructures. This was, however, our first evaluation of NFVi high availability.

High availability (HA) comprises (at least) three aspects:

  • Design: The end-to-end infrastructure should be designed holistically, eliminating any single point of failure and taking all layers into account -- hardware, virtualization infrastructure (NFVi), network services, management and end-to-end connectivity.

  • Configuration: The HA solution must be configured correctly at installation time, including its failover mechanisms. In the best case, these mechanisms will rarely be exercised in the production network, so without testing in advance there is no guarantee they will actually work when needed.

  • Security: It is critical to put security mechanisms in place to reduce the risk that the solution's configuration might be compromised.

The HA solution's correct functionality and performance should be audited frequently as a health check and regression test.

Cisco presented a number of products (modules) for evaluation that contribute to these goals. In contrast to the vSwitch performance tests reported earlier, these Cisco environments were set up to focus on functionality, not performance or actual high availability figures.

Next page: Automated and validated OpenStack installation

Automated and validated OpenStack installation
Currently, any OpenStack installation is a tedious, highly manual and thus error-prone process, particularly when it comes to more elaborate high availability configurations. Another issue is that a pure OpenStack configuration is insufficient to get an NFVi deployed: There are additional data-plane components that the administrator needs to deploy and integrate, such as the hypervisor, storage platform and switching options. Cisco provides scripts around this installation process, automating it and allowing efficient verification of the configuration.

Figure 2: OpenStack and additional NFVi components.

Cisco's deployment capabilities include validation of user input and configuration, bare-metal installation and OpenStack installation. Cisco demonstrated a command-line installer offering six modes of operation, as shown in the following screenshot:

Figure 3: Cisco installer choices.

The operations 'BAREMETAL,' 'COMMONSETUP' and 'ORCHESTRATION' had been pre-deployed by Cisco prior to our evaluation; EANTC was able to verify that these deployments had been completed.

Cisco explained to us that the 'VALIDATION' function detects improper parameters before starting with any installation, so eliminating unexpected issues during deployment.

Input parameters for the installer (both mandatory and optional ones) are specified in a YAML file. The installer verifies the presence of all mandatory parameters. It also verifies that the values provided for both mandatory and optional parameters are valid.
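To give a sense of what such validation involves, here is a minimal sketch of our own in Python, using hypothetical parameter names rather than Cisco's actual installer code or schema:

    import ipaddress
    import sys

    import yaml  # PyYAML

    # Hypothetical parameter names for illustration only; Cisco's real
    # installer uses its own YAML schema.
    MANDATORY = {"controller_ips", "external_vlan", "admin_password"}
    OPTIONAL = {"ntp_servers", "dns_servers"}

    def validate(path):
        with open(path) as f:
            cfg = yaml.safe_load(f) or {}

        errors = []

        # Every mandatory parameter must be present.
        for key in sorted(MANDATORY - cfg.keys()):
            errors.append("missing mandatory parameter: %s" % key)

        # No unknown parameters are allowed.
        for key in sorted(cfg.keys() - MANDATORY - OPTIONAL):
            errors.append("unknown parameter: %s" % key)

        # Values must be well formed, e.g. controller IPs must parse.
        for ip in cfg.get("controller_ips", []):
            try:
                ipaddress.ip_address(ip)
            except ValueError:
                errors.append("invalid controller IP: %s" % ip)

        vlan = cfg.get("external_vlan")
        if vlan is not None and not (isinstance(vlan, int) and 1 <= vlan <= 4094):
            errors.append("external_vlan must be an integer between 1 and 4094")

        return errors

    if __name__ == "__main__":
        problems = validate(sys.argv[1])
        for problem in problems:
            print("ERROR:", problem)
        sys.exit(1 if problems else 0)

Catching a bad parameter at this stage is far cheaper than discovering it halfway through a bare-metal deployment.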

The validation method ran swiftly, taking around half a minute. It showed Linux boot-style checks:

Figure 4: Cisco install validation tool in action.

To check if it would detect any failures, we created an issue intentionally and ran the installer once more:

Figure 5: Cisco's install validation tool detecting an error.

In general, Cisco's installation tool is able to conduct a bare-metal installation as an automated bootstrap: It pulls all packages from the build server and uses Cobbler for PXE booting of a Linux VM. It then deploys the OpenStack services in Docker containers, all of which are started and maintained as Linux system processes.

Next page: Centralized logging and Runtime Network Regression Testing

Centralized logging and runtime network regression testing
All components of the NFVi create their own logs -- including OpenStack, the vSwitch, storage solutions and others. It is a standard and longstanding challenge for any IT systems administrator to gain a holistic overview of what's going on by aggregating system logs. When there is an issue, one of the many system log files will probably report it; the challenge is to monitor all these logs and triage their messages into those that are critical, important or less important.

Cisco presented a solution that enables enhanced centralized logging using the ELK stack, an open source solution. All logs are collected by a component called Logstash Forwarder; a second component, the Kibana dashboard, is used to gain an overview of all logs; and a third component, Elasticsearch, allows heuristic searches across the log database.

Figure 6: ELK logging overview.

Figure 7: Kibana logging dashboard.

During the demonstration, live logs were shown to the EANTC team. Cisco provided a few basic searches across the database that did not show any major issues with the installation.
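To give a sense of what such a search looks like under the hood, here is a small sketch of our own -- assuming a hypothetical Elasticsearch address and the conventional Logstash index pattern, not anything taken from Cisco's setup -- that pulls the most recent error-level entries via Elasticsearch's REST search API:

    import requests

    # Hypothetical Elasticsearch endpoint; Logstash conventionally writes to
    # daily indices that match the pattern 'logstash-*'.
    ES_URL = "http://10.0.0.20:9200/logstash-*/_search"

    # Ask for the 20 most recent log entries whose message mentions an error.
    query = {
        "size": 20,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"query_string": {"query": "message:(error OR critical)"}},
    }

    response = requests.post(ES_URL, json=query, timeout=10)
    response.raise_for_status()

    for hit in response.json()["hits"]["hits"]:
        entry = hit["_source"]
        print(entry.get("@timestamp"), entry.get("host"), entry.get("message"))

The point of the ELK approach is precisely that such queries run across all NFVi components at once, rather than per log file.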

Runtime network regression testing
As we know, the fact that a solution worked at installation time does not mean it will run correctly forever. There are always reconfiguration activities taking place, whether administratively driven or caused by planned or unplanned hardware changes. It is therefore important to check the functionality and performance of the NFVi frequently.

Cisco's VMTP is a Python application that contributes to this regression testing activity by covering network connectivity. It automatically performs ping tests, round-trip time (latency) measurements and TCP/UDP throughput measurements on an OpenStack cloud.

VMTP can be deployed and run by a command-line installer. This tool may be used to perform automated data path validation between VMs of a single tenant, between VMs of different tenants and between VMs in different LANs.
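For readers unfamiliar with this class of tool, the sketch below illustrates the kind of checks VMTP automates. It is our own simplified illustration, not VMTP code, and it assumes the standard ping utility plus an iperf3 server already running on the target VM:

    import json
    import re
    import subprocess

    def ping_rtt_ms(host, count=5):
        """Return the average ICMP round-trip time to 'host' in milliseconds."""
        out = subprocess.run(["ping", "-c", str(count), host],
                             capture_output=True, text=True, check=True).stdout
        # Linux ping prints: rtt min/avg/max/mdev = 0.321/0.402/0.512/0.07 ms
        match = re.search(r"= [\d.]+/([\d.]+)/", out)
        return float(match.group(1)) if match else None

    def tcp_throughput_mbps(server_ip, duration=10):
        """Measure TCP throughput to a VM that is already running an iperf3 server."""
        out = subprocess.run(["iperf3", "-c", server_ip, "-t", str(duration), "-J"],
                             capture_output=True, text=True, check=True).stdout
        report = json.loads(out)
        return report["end"]["sum_received"]["bits_per_second"] / 1e6

    if __name__ == "__main__":
        target = "192.0.2.10"  # hypothetical address of the peer VM
        print("average RTT: %s ms" % ping_rtt_ms(target))
        print("TCP throughput: %.1f Mbit/s" % tcp_throughput_mbps(target))

VMTP's value lies in orchestrating these measurements automatically across the intra-tenant, inter-tenant and inter-network cases described above.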

Figure 8: VMTP overview.

During our test session at Cisco's labs, we were able to witness VMTP running a number of preconfigured test scripts as shown in the following screenshot:

Figure 9: VMTP in action.

VMTP performed well in this standard situation. The EANTC team did not have the chance to inject an error to validate VMTP's ability to actually report issues, nor did we create a configuration deviating from Cisco's preconfigured demo environment.

Next page: High availability

High availability
The standard OpenStack high availability configuration is to deploy three OpenStack control nodes as a cluster. The control nodes are configured in active/standby mode using HAProxy and a keepalived daemon.

All OpenStack services except storage are deployed in active-active mode behind HAProxy. The HAProxy statistics page provides a consolidated view of the HA services and their status, and it also gives an overview of the load balancing. During our session, Cisco showed the statistics report of a preconfigured system; we were able to see at a glance that all services were running without issues.

Figure 10: Cisco HAProxy statistics screenshot.
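The same information shown on the dashboard can be consumed programmatically: HAProxy's statistics endpoint returns CSV when ';csv' is appended to the stats URI. The following sketch is our own illustration, with a hypothetical stats URL, of how a script could flag any backend server that is not reporting UP:

    import csv
    import io
    import urllib.request

    # Hypothetical stats URL; HAProxy returns CSV when ';csv' is appended to
    # the configured stats URI (authentication omitted for brevity).
    STATS_URL = "http://10.0.0.5:1936/haproxy?stats;csv"

    def down_backends(url=STATS_URL):
        """Return (proxy, server, status) for every real server not reporting UP."""
        raw = urllib.request.urlopen(url).read().decode()
        header, *rows = raw.splitlines()
        fieldnames = header.lstrip("# ").split(",")
        reader = csv.DictReader(io.StringIO("\n".join(rows)), fieldnames=fieldnames)
        return [(row["pxname"], row["svname"], row["status"])
                for row in reader
                if row["svname"] not in ("FRONTEND", "BACKEND")
                and not row["status"].startswith("UP")]

    if __name__ == "__main__":
        for proxy, server, status in down_backends():
            print("WARNING: %s/%s is %s" % (proxy, server, status))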

The EANTC team did not have a chance to play around with the high availability configuration, and we did not validate how HAProxy behaves if there is an actual issue in one of the components it is supposed to report on.

CloudPulse health check
CloudPulse is an open source tool developed by Cisco that conducts health checks of the cloud environment. As Cisco explained, CloudPulse tests can be configured at both the operator and the application level, and they can be run periodically or on demand.

CloudPulse operator tests can:

  • Monitor the status of the cloud infrastructure

  • Monitor API endpoint and functionality

  • Monitor status of OpenStack services

In our session, Cisco demonstrated a number of preconfigured CloudPulse tests aimed at checking the health of OpenStack components (cinder, keystone, neutron, nova and so on). They were configured to run periodically. Since the OpenStack system was running correctly during our session, the tool did not report any issues, as shown in the following screenshot:

Figure 11: CloudPulse in action.
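To illustrate what an operator-level endpoint check boils down to, the following is a simplified sketch of our own -- with hypothetical controller URLs, not CloudPulse code -- that periodically probes a few OpenStack API endpoints and reports whether they answer:

    import time

    import requests

    # Hypothetical API endpoints; in a real deployment these come from the
    # Keystone service catalog.
    ENDPOINTS = {
        "keystone": "http://controller:5000/v3",
        "nova": "http://controller:8774/",
        "neutron": "http://controller:9696/",
        "cinder": "http://controller:8776/",
    }

    def check_once():
        """Probe each API endpoint and record whether it answers."""
        results = {}
        for name, url in ENDPOINTS.items():
            try:
                # An unauthenticated GET is enough to show the API is listening:
                # any response below HTTP 500 means the service answered.
                status = requests.get(url, timeout=5).status_code
                results[name] = "ok" if status < 500 else "HTTP %d" % status
            except requests.RequestException as exc:
                results[name] = "unreachable (%s)" % exc.__class__.__name__
        return results

    if __name__ == "__main__":
        while True:  # run periodically, like a scheduled health-check test
            for service, state in sorted(check_once().items()):
                print("%-10s %s" % (service, state))
            time.sleep(300)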

Cloud99 Pre-Deployment OpenStack Validation
Another tool contributing to Cisco's toolbox of configuration checks is Cloud99. It can be used to verify the behavior of the cloud once the administrator has deployed it and before it is put into production. This point in time is the best opportunity to run actual high availability and performance checks without affecting the production tenants negatively if anything fails.

In our session, Cisco demonstrated Cloud99's ability to perform high availability testing of OpenStack services and core infrastructure components.

Figure 12: Cloud99 solution overview.

Cloud99 combines OpenStack's internal test runners, driven by Rally (an OpenStack benchmarking and testing framework), with the test harness tool Ansible and the monitoring tool Nagios.

We verified its features by disrupting the OpenStack Nova API while running scale/performance tests through Rally. The artificial issue injected in the test caused the Nova API to restart frequently. Nagios and the OpenStack Health API were unaware of the disruption, but the Ansible tests caught it as expected.

Figure 13: Cloud99 Ansible dashboard.
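The disruption itself is conceptually simple. The sketch below is our own illustration of the idea -- the systemd unit name is an assumption and will differ between distributions -- repeatedly restarting the Nova API service while a Rally workload runs in parallel:

    import subprocess
    import time

    # The systemd unit name is an assumption and differs between distributions;
    # run this only on a test system, never in production.
    SERVICE = "openstack-nova-api"

    def disrupt(cycles=10, interval=30):
        """Repeatedly restart the Nova API service to create an artificial flap."""
        for cycle in range(1, cycles + 1):
            subprocess.run(["systemctl", "restart", SERVICE], check=True)
            print("cycle %d: restarted %s" % (cycle, SERVICE))
            time.sleep(interval)  # give the monitoring tools time to notice

    if __name__ == "__main__":
        disrupt()

The interesting result was not the flap itself but which monitoring layer noticed it: in our session, only the Ansible tests did.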

Summary
EANTC witnessed a number of Cisco tools complementing OpenStack that are designed to improve the consistency of deployment, simplify the high availability options and improve operational checks. These tools are an interesting and worthwhile approach to providing added value on top of OpenStack, partly with the help of other open source tools. Cisco users are traditionally CLI-savvy, as are Linux-oriented IT administrators; they will probably appreciate the versatility of Cisco's toolbox.

Next page: Putting VNFs to the test

Putting VNFs to the test

As part of the first phase of our test of Cisco's cloud and virtualization portfolio, published in March this year, we looked at a small number of virtual network functions (VNFs) implemented by Cisco. (See Validating Cisco's Service Provider Virtualization & Cloud Portfolio.)

Back then, Cisco claimed to support 60 VNFs -- now the company says its ecosystem includes more than 100 VNFs.

When the EANTC team returned to San Jose in September to conduct the NFVi tests, we took the opportunity to get some insight into two key VNFs: Virtualized Video Processing (V2P), a media service platform previously referred to as Cloud DVR; and the VPC-DI, the latest release of Cisco's virtual packet gateway for mobile networks.

Virtualized Video Processing (V2P)
As the vendor's team explained, the V2P "is Cisco's next-generation media service and applications hosting platform that provides the tools, frameworks, and containers required to host and manage standard media data plane functions. It includes software infrastructure containers that enable application decoupling and metadata storage. The applications range from Multi-Screen Live to Cloud DVR."

While this may sound relatively unexciting to the uninitiated, V2P solves a real operational problem that has long bugged content providers: The media world becomes more dynamic every day, with new content channels sprouting all the time, yet the physical headend infrastructure needed to receive and transcode video streams and to deliver the video data remains cumbersome to install, configure, scale and upgrade. As a virtualized service, it would be much easier to bring up additional channels and to scale content streaming portfolios. It is that capability that Cisco invited us to verify, focusing specifically on the V2P Controller, one of the building blocks of V2P.

Figure 14: Virtualized Video Processing (V2P) applications overview.

Cisco explained that V2P includes functions such as real-time ingestion of live channels, dynamic ingestion of video-on-demand (VoD) content, generation of common manifests and index file formats, storage of media content and media delivery.

The media functions comprise server instances, called nodes, and logical clusters, called endpoints. Each endpoint is composed of one or more nodes and defines the (compute and storage) resources available in that cluster. The figure below shows the high-level architecture and functionality of endpoints such as the Media Capture Engine (MCE), Media Playback Engine (MPE) and Application Engine (App Engine), while a centralized log server (CLS) collects logging messages from each node.

Figure 15: High-level Cloud V2P architecture.

EANTC validation of V2P
In the test session at Cisco's labs, we used V2P to create a live video channel, experiencing the flexibility of configuration. We used a graphical workflow that integrated virtual resource pools of capture engines and playout engines. The V2P platform was deployed on an OpenStack environment.

Once we had verified the V2P deployment on OpenStack, we started to register Cisco's Cloud Object Store (COS) cluster with the V2P Controller. COS is a storage middleware implementation that can interact with hardware drives and software. Once the COS cluster was configured, the Cisco team executed a script (called 'cosinit') on the storage drive to register the storage node with the V2P Controller.

Figure 16: V2P GUI after storage connection establishment.

Next, we created a channel media source by entering a streaming type (ATS or TS) and a source URL, and then created a channel lineup referencing the media source we had just created. A channel lineup is a workflow that allows a content provider to configure channels that can be ingested for certain subsets of users.

Figure 17: Stream profiles in V2P.

Then we created a 'publish template' and an HTTP header policy for the channel: Publish templates allow the addition of HLS, HSS, HDS, CIF and CIF-DASH-TS transcoder formats.

Figure 18: V2P's publish templates.

Finally, we created a live channel by configuring live asset workflows via the drag and drop GUI:

Figure 19: The V2P graphical user interface (GUI).

The GUI functioned as expected, and the detailed channel asset management view showed the full configuration we had entered. We validated that our entries had been accepted correctly.

The templates were a very efficient way of configuring video content services. Given the limited time available to investigate V2P, we were not able to verify the actual functionality or performance of the solution; Cisco had invited us to look at its application-layer configuration options, based on the OpenStack NFVi that we had evaluated in more detail in the previous sections.

We created templates for both HLS (for Apple iOS) and HSS (for Windows) and watched the channel simultaneously on both Apple and Windows devices. In addition, we verified Cloud DVR recording, which allowed us to pause and rewind the video content.

Next page: Cisco's Virtual Packet Gateway VPC-DI

Cisco's Virtual Packet Gateway VPC-DI
EANTC has evaluated Cisco's mobile core components multiple times -- starting with Light Reading’s test of the full mobile portfolio in 2010, which included the physical packet gateway ASR5000. Later, we ran a performance and functionality test of the first virtualized version in 2013. (See Testing Cisco's Mobile Network, Part I.)

Back then, Cisco had ported the ASR5000 directly, keeping the configuration interfaces and source code.

Meanwhile, Cisco has implemented the next step with the 'VPC-DI', where DI stands for Distributed Instance -- an indication that the solution can now scale out more flexibly per virtual component. It is based on StarOS software that operates as a fully distributed network of multiple VMs. VPC-DI consists of two component types:

  • Control Function (CF): Two control function VMs act as an active/standby (1:1) redundant pair.

  • Service Function (SF): Service function VMs provide the service context (user I/O ports) and handle protocol signaling and session processing tasks. A VPC-DI instance can contain up to 46 SF VMs; a minimum configuration requires four SFs -- two active, one demux and one standby.

We verified that VPC-DI could be brought up with a Day 0 configuration using Cisco's Network Services Orchestrator (NSO). For this demonstration Cisco used a preconfigured blueprint to create a VPC-DI instance; the instantiation process was completed by spinning up the SF and CF VMs. According to Cisco, Day 1 and Day 2 configuration of VPC-DI is also possible via NSO. However, this was not shown during the demonstration, and Cisco used CLI commands and scripts for the final VPC-DI configuration.

Figure 20: Platform orchestration overview of Cisco's NFVi.

We verified the system's resiliency by performing a card migration via the command-line interface. In parallel, we verified that a previously initiated call was not terminated during the migration.

Figure 21: Cisco VPC-DI virtual slots overview.

Next, we forcefully terminated one VM (an active Service Function) via NSO -- a scenario that might occur if the virtual service breaks down for whatever reason. It was automatically recovered to standby mode, as expected. The existing call continued to operate, at least on the control plane (no test equipment was connected for the data plane).

Finally, we performed a live migration of a VPC-DI SF VM from one compute node to another using a CLI command. To maintain the previously established active call, Cisco used Inter Chassis Session Recovery (ICSR) as the availability mechanism. The existing call was indeed switched over to the new chassis and continued to function. We did not measure how long (if at all) the data path was interrupted.

Summary
Having a powerful portfolio of VNFs ready is an important part of a cloud and virtualization strategy. Cisco gave quick demonstrations of two VNFs that contribute to this portfolio. The Cloud DVR/Virtualized Video Processing (V2P) platform looks like it can streamline a lot of operational functions for the content provider world, while VPC-DI demonstrated functions that are ready to scale in a more granular way and to support VM operations. Both are worth a more detailed look at their performance and high availability in the future.

Next page: Single Pane of Glass management

Single Pane of Glass management
The scalability of virtualized network services for really large network solutions is still a big unknown, hence service providers are experimenting quite a bit. Traditional enterprise cloud infrastructures may scale very well, but they usually lack the multivendor, open source approach that the telecommunications industry requires.

In addition, service provider VNFs are more complex because they need to support highly available network service chains across multiple data center locations.

So how can a network operator manage all this complexity? Only two years ago, this question would have attracted the disrespect of any seasoned Cisco operator, since the only viable answer was "using the CLI, of course!"

In the eyes of most network designers and operators, the complexity of Cisco's feature-rich products was never matched sufficiently by the company's graphical management tools.

Well, the situation has definitely changed. As our journey across Cisco's virtualized solutions in this article has shown, the multi-layered complexity of these solutions, the vast number of solution modules involved and the lack of integrated configuration and troubleshooting interfaces in an open source world make it virtually impossible to master the configuration using a command-line interface.

So, surprisingly, there is both an opportunity and an overwhelming need for Cisco to come up with an integrated network management solution for its virtualized services portfolio.

Multi-data center, multi-VIM service chain configuration
During our visit to Cisco's labs in San Jose, the EANTC team verified a quite complex scenario that Cisco had preconfigured for our evaluation:

• There were two data centers that had to be managed using a single orchestrator (NFVO) and a single interface (management terminal).

• A network service with multiple components (a 'service chain') had to be provisioned using a graphical user interface, based on a catalog of service functions. Cisco chose a VNF service chain consisting of a vASA firewall and a CSR router.

• The service chain had to be coupled with a VPN network service that supports firewall security by chaining the required components. Cisco chose a dynamic client-to-site VPN as a Service.

• The network represented multiple data centers that would have been built at different times following different purchases: The scenario had to support both OpenStack and VMware ESXi/vCenter.

• Finally, and most importantly, all functions were to be handled by a "single pane of glass" and all provisioning activities were to be integrated.

The following figure illustrates the test bed architecture, which was based on Cisco's vMS services:

Figure 22: Cisco's Single Pane of Glass management architecture.

The top-level component was Cisco's UCS Director (UCSD), its management system for NFV infrastructure. In addition to the virtual infrastructure, UCSD had to manage a physical test network:

Figure 23: Physical test bed for UCSD evaluation.

To prepare for the scenario, Cisco had preconfigured templates in the UCSD catalog. We used the first one in the diagram below -- the ASA and CSR virtual machine provisioning.

Figure 24: UCS Director catalog.

In this catalog entry, it was easily possible to choose the virtual infrastructure management (VIM) type as either OpenStack or VMware vCenter. Being able to manage two vastly different solutions from the same graphical user interface looked like a big simplification to us:

Figure 25: UCSD VIM selection in catalog entry.

We continued with the use case evaluation, configuring services across two data centers -- one supporting an OpenStack environment, the other supporting vCenter.

Based on the preconfigured catalog entries, UCSD simply did its job in a very straightforward manner. All resources and components were configured properly, as expected.

We validated the successful completion of provisioning using the UCSD interface:

Figure 26: UCSD service completion status.

In addition, we double-checked the configuration using Cisco's Network Services Orchestrator log files and OpenStack logs.

Summary
Cisco was able to demonstrate that its UCSD can function as a "single pane of glass" virtual services provisioning tool across multiple types of virtual infrastructure.
