Close the Loop to Automate Service Assurance

The separation between obtaining service assurance data and acting on it is going away, letting the network automatically address issues.

September 14, 2017

Service assurance data will play a key role in how network operators bring automation to their service engines, but how that data is used is changing significantly in the process. Increasingly, service assurance data will be part of "closed loops" that use constant assessment of network traffic and performance to optimize service quality, detect threats and do other things on an automated basis, without human intervention.

This shift is all part of the virtualization transformation, and plays a key role in network operators' ability to offer new services which change with customer needs, without requiring network reconfiguration or a lot of human touch.

Vodafone Group plc (NYSE: VOD) is testing these closed loop systems in its laboratories and has built them in at multiple layers of its target architecture for software-defined networking/network functions virtualization, says Lester Thomas, chief systems architect. The company is following the TM Forum's Zero-touch Orchestration Operations Management (ZOOM) model, moving away from building static networks as solutions to customer problems and using a more virtualized network platform to create an intent-based model and deliver services that can adapt to changing needs.

"You can orchestrate a network solution that meets that intent but then you constantly monitor that using your assurance," Thomas explains in an interview. That constant monitoring will, in a closed loop system, optimize how the service is delivered in real-time. That includes immediate recognition and remediation of any issues -- including congestion, outages or security problems -- and automatic optimizing of network usage, which benefits both the customer and the network operator's bottom line.

"We need to optimize our costs," Thomas says. "The concept is you have this closed loop at multiple levels in your architecture, to constantly optimize and remediate any issues in delivering that service."

For example, as Vodafone demonstrated earlier at Mobile World Congress, if a customer's intent is to have a secure network, instead of providing a static virtual private network (VPN) service, the carrier delivers what it calls VPN Plus, which is a software-defined equivalent that comes with added network functions. (See Vodafone Live NFV Use Cases Powered by Amdocs.)

"As part of the closed loop assurance, we were constantly assessing whether there were any threats to the network and if the source of data showed that the network was under attack, the closed loop actually installed additional firewall capabilities facing something like a distributed denial of service attack," Thomas says. "The customer's intent was a secure network, so the implementation changed dynamically based on what was happening in the network."

Those additional resources were added automatically, without human intervention, and stayed in place until they were no longer needed, at which point the network was optimized again, he says. "When the attack went away, it is optimizing for our cost, because it's a cost to us and to the customer, in terms of latency of their service, to have additional layers of firewalls when you're not under attack."
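The behavior Thomas describes can be pictured as a simple reconciliation loop. The sketch below is illustrative only and is not Vodafone's implementation: the names `ServiceState`, `reconcile` and `run_loop`, the layer limits and the boolean attack signal are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
BASELINE_LAYERS = 1   # protection that is always in place
MAX_LAYERS = 3        # cap on extra firewall capacity


@dataclass(frozen=True)
class ServiceState:
    firewall_layers: int = BASELINE_LAYERS


def reconcile(state: ServiceState, under_attack: bool) -> ServiceState:
    """One pass of the loop: compare what assurance data shows with the
    customer's intent (a secure network) and adjust the implementation."""
    if under_attack and state.firewall_layers < MAX_LAYERS:
        # Threat detected: add firewall capacity.
        return ServiceState(state.firewall_layers + 1)
    if not under_attack and state.firewall_layers > BASELINE_LAYERS:
        # Threat gone: remove extra layers to cut cost and latency.
        return ServiceState(state.firewall_layers - 1)
    return state


def run_loop(observations):
    """Feed a sequence of assurance observations through the loop and
    record the number of firewall layers after each pass."""
    state = ServiceState()
    history = []
    for under_attack in observations:
        state = reconcile(state, under_attack)
        history.append(state.firewall_layers)
    return history


history = run_loop([False, True, True, True, False, False, False])
print(history)  # [1, 2, 3, 3, 2, 1, 1]
```

The same function handles both scale-up and scale-down, which mirrors the point in the quote: the loop optimizes for cost and latency once the attack has passed, not only for protection while it is under way.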

The notion of closed-loop versus open-loop systems in the service assurance world is one of the key evolutions taking place right now to enable automation of back office operations, says Vikram Saksena, CTO and vice-president of engineering at NetScout Systems Inc. (Nasdaq: NTCT).

"That is an evolution of service assurance, more into a closed loop kind of a format as opposed to an open loop," he says in an interview. "The way things used to be done in the physical world was that the action part was separate from the analysis of network data and the predictions that were being made. Now, with virtualization, service providers want to close the loop and automate so there is hardly any human intervention."

Saksena views this transition as part of the network operators' drive to be more data-driven and sees service assurance data playing a key role in other ways as well. "We are going from pure-play service assurance focused on network operations to automating all of that, plus helping the planning folks as well as helping the customer care organization in using the data for their own needs," he comments.

The ability to become more proactive and predictive based on service assurance data "takes advantage of the fact that certain elements of the network are programmable and more intelligent than before and can pass real-time information back to you about an event," says Amol Phadke, global network strategy and consulting practice lead at Accenture. "In some cases, the environment is getting simpler, to be more proactive."

Companies such as Centina Systems Inc. are using this network transition to position their technology as a new breed of service assurance, specifically targeting the layers of a virtualized architecture, down to the NFV-Infrastructure, and up through virtual machines to the virtualized network functions themselves, says Anand Gonuguntla, Centina's CEO.

He sees a number of proofs-of-concept for this new world of service assurance, and some frenzy around assuring hot new services such as SD-WANs, but warns that not everyone is ready to make the leap.

"I don't think [network operators] are there yet, in terms of production readiness and mass deployment," he says.

The benefits of closed loops and smarter service assurance
Among the drivers for using service assurance data in new ways are the need for better customer service, more efficient use of network resources and faster delivery of new services to the market without sacrificing quality.

"There is a need to use our data for just-in-time resource management," says NetScout's Saksena. "A lot of the network planning folks who are involved in upgrading capacity and planning the overall network also want to automate the whole forecasting and management problem because now a lot of resources can be deployed as virtual machines as opposed to physical hardware. So they are looking to see if they can do just-in-time resource management to avoid long cycle times for adding resources which was a difficulty in forecasting and planning in the physical world."

Service providers used to take up to nine months to design, test and deploy a service but now are moving much faster to get to market, and want constant automated testing and service assurance to guarantee performance, says Ross Cassan, director of product marketing for mobility infrastructure at Spirent Communications plc.

"In place of the waterfall method, they are moving to a virtualized platform and a streamlined process," he says.

Virtual test agents are deployed live to validate what is happening in the network and make sure that the platform is running as it is supposed to, with the right amount of compute power, storage and memory, Cassan explains. The orchestration system determines interconnectivity of the virtual machines involved and controls the VMs, and then, once the virtual network function is deployed, it is automatically tested and validated, with whatever service -- a virtual evolved packet core, firewall or other service -- emulated across the full network stack.

"And then we move into deployment stage and implement that into the service chain, with a quick validation to the entire service chain. We then give a signal to the orchestrator and the service is ready to go," he adds. "That is a completely automated process, we have it deployed in customers' networks now. It is not just a demo."
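The sequence Cassan outlines (check the platform, test the deployed function, validate the service chain, then signal the orchestrator) can be sketched as a gated pipeline. This is a minimal illustration under stated assumptions, not Spirent's product: `has_resources`, `run_pipeline`, the stage names and the resource figures are hypothetical, and the two test stages are stubbed to pass.

```python
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def has_resources(allocated: Dict[str, int], required: Dict[str, int]) -> bool:
    """True if every required resource is allocated in sufficient quantity."""
    return all(allocated.get(name, 0) >= amount
               for name, amount in required.items())


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run the stages in order and stop at the first failure.
    Returns (ready, log)."""
    log = []
    for name, check in stages:
        ok = check()
        log.append(f"{name}: {'pass' if ok else 'fail'}")
        if not ok:
            return False, log
    return True, log


# Hypothetical figures for a single virtual network function.
allocated = {"vcpus": 8, "memory_gb": 16, "storage_gb": 100}
required = {"vcpus": 4, "memory_gb": 16, "storage_gb": 50}

stages = [
    ("platform resources", lambda: has_resources(allocated, required)),
    ("VNF functional test", lambda: True),        # stub: emulated traffic test
    ("service chain validation", lambda: True),   # stub: end-to-end check
]

ready, log = run_pipeline(stages)
if ready:
    print("signal orchestrator: service ready")
```

Stopping at the first failed stage keeps an unvalidated function out of the service chain, which is the purpose of testing before the orchestrator is told the service is ready.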

Vodafone's approach -- having a closed-loop deployment in each layer of its architecture -- is intended to support the ability to model services, catalog them and then orchestrate that service and assure it, Thomas says.

"We have that closed loop model in each technology domain," Thomas says. "I might have an SDN connectivity domain, and I might have network functions domain, and I might have edge computing domain. I have a number of technology domains, each with that closed loop happening. The component service is constantly monitoring and remediating and scaling whatever it needs to maintain the component service."

There is also closed loop assurance of the end-to-end services that are being delivered to a customer, based on their intent/needs, that takes into account failures in any of the underlying technology domains, the addition of new sources or a change in the customer's intent over time, he adds.

Centina, which rolled out its vSure product in late 2015 and has participated in multiple PoCs and TM Forum Catalyst projects along with Vodafone, is now seeing an urgency among operators to push forward with newer approaches to service assurance, says Gonuguntla. As the SD-WAN market heats up, there is more of a business driver to get moving, he says.

The challenges to doing all this
Virtualization is one of the reasons all this closed-loop service assurance can happen, but the big messy transition from legacy to virtualized is also part of the challenge around doing this.

As Accenture's Phadke points out, automating a production environment using a DevOps process is great, but it's not really useful until the functions you put into that environment are able to automate themselves. That's why automating functional interoperability is a focal point of a lot of open source work right now, in groups such as ONAP.

"When an operator wants to deploy a service to a business or a consumer, how does that overall service work across disparate functions, from many suppliers, on a common infrastructure layer that could be OpenStack or VMware, and then how do they also deal with the legacy, the non-virtualized elements, taking on the traditional IT stack that the operators have built up over the last few decades?" he asks. "That is a non-trivial problem because each stack has different layers of automation and not many of them interoperate."

A real-time view of what is happening in the network is essential as services become more dynamic, but it can be more than some traditional service assurance systems can provide, Thomas says.

"Your network function is installed one minute and it's not installed the next," he says. "It scales one minute and it scales back the next. It's a very dynamic nature to the actual network services but our traditional assurance systems weren't designed for that level of dynamism. [It's] one of the challenges we are working with new vendors on in this space, [so that] we may have better capability to integrate on this data-centric approach."

New systems must specifically address how to correlate the resource status across the overlay networks -- the virtualized piece -- and the underlying networks, which include physical resources, he notes.

"How do I correlate underlay and overlay resources so that a failure in the optical transfer part of our network can be correlated with which customer services would be impacted?" Thomas says. "Part of the problem in this data-centric or dynamic data realm is how you would actually correlate this together so you can even do root cause analysis and then make your recommendations of what needs to be remediated in this closed loop manner."
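One common way to frame the correlation problem Thomas raises is as a dependency graph in which customer services depend on overlay resources, which in turn depend on underlay resources. The sketch below assumes that framing. The resource names and the `impacted_services` function are invented for illustration and do not describe any operator's or vendor's system.

```python
from collections import defaultdict, deque


def impacted_services(failed, depends_on, services):
    """Return the customer services that depend, directly or indirectly,
    on the failed resource."""
    # Invert the map: resource -> the things that depend on it.
    dependents = defaultdict(set)
    for item, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(item)

    # Breadth-first walk upward from the failed resource.
    seen = {failed}
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for parent in dependents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sorted(seen & services)


# Hypothetical inventory: services -> overlay -> underlay.
depends_on = {
    "vpn-customer-a": ["vfirewall-1", "overlay-tunnel-1"],
    "vpn-customer-b": ["overlay-tunnel-2"],
    "overlay-tunnel-1": ["optical-link-x"],
    "overlay-tunnel-2": ["optical-link-y"],
    "vfirewall-1": ["compute-host-1"],
}
services = {"vpn-customer-a", "vpn-customer-b"}

print(impacted_services("optical-link-x", depends_on, services))
# ['vpn-customer-a']
```

In a dynamic network the hard part is keeping a map like `depends_on` current as functions are installed, scaled and removed, which is the difficulty the quote points to. The traversal itself is straightforward once the data is accurate.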

That blending of service assurance from the old and new parts of a network operation is a key challenge to automation, agrees Accenture's Phadke, and one facing every network operator to some degree.

"When you have to bridge between a hardware environment where a lot of assurance calculations were done offline and the new world where a lot of things are online and have been done real-time, that is where the challenge lies," he says.

Doing all of this in a multi-vendor environment is another significant hurdle, says Centina's CEO. Network operators don't want to get locked into one vendor's gear or a limited set of VNFs, he notes, but they do often look for one vendor to provide an over-arching view of the landscape that will allow automation of service assurance. "Most of the carriers we are seeing are allowing someone to be the prime, to pull together what is needed," using applications programming interfaces and other tools, he notes. "There isn't really any standard way to do this yet."

The TM Forum, in advancing its ZOOM work, is also working on how to build a standard platform that is a practical blueprint for managing a hybrid infrastructure, says Barry Graham, the TM Forum's Agile Program director. One key challenge that work is addressing is a consistent way of communicating with existing infrastructure, so that every separate function doesn't require special treatment to be automated. The TM Forum will have more to say on this topic shortly, he says.

Companies such as NetScout, which did traditional service assurance, are broadening the scope of what that data can do and also changing their own processes, says Saksena.

"We have primarily focused on network traffic as our main source, but now there are other sources of data -- device statistics and machine logs and flow data and session records," he says, and NetScout's customers want those taken into account. "The process is now getting to be how do I harmonize my multiple sources of data into a unified data set -- what they call metadata -- which is contextual, timely, relevant, structured, compact and actionable. What we have done is created this harmonized metadata that will allow them to drive all these use cases."

As network operators move off legacy hardware onto so-called COTS or merchant hardware, the need for all this to happen will only intensify, however, since the five-nines type of protections won't be built into the newer hardware, says Spirent's Cassan. An active test and monitoring system needs to be driven by pre-established analytics so that when certain performance metrics are breached, there is an automatic and instant response.

"You have to be able to allow the network and orchestration system and the test controller to go in and isolate what is causing the problem and do that in an automated and very quick fashion," he comments. "That is what you are going to need to deliver a five-nines service on a three-nines platform."
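To put rough numbers on the gap between three nines and five nines, the sketch below uses the standard steady-state formula, availability = MTBF / (MTBF + MTTR). Treating faster automated repair as the lever is an assumption made for this example, as is the 1,000-hour mean time between failures. The function names are hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime per year at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


def max_mttr_hours(mtbf_hours: float, availability: float) -> float:
    """Largest mean time to repair that still meets the target, from
    A = MTBF / (MTBF + MTTR)  =>  MTTR = MTBF * (1 - A) / A."""
    return mtbf_hours * (1 - availability) / availability


three_nines, five_nines = 0.999, 0.99999

print(round(downtime_minutes_per_year(three_nines), 1))  # 525.6
print(round(downtime_minutes_per_year(five_nines), 2))   # 5.26

# Assumed: the platform fails, on average, once every 1,000 hours.
mtbf = 1000.0
mttr_three = max_mttr_hours(mtbf, three_nines)
mttr_five = max_mttr_hours(mtbf, five_nines)
print(round(mttr_three / mttr_five, 1))  # 100.1
```

Three nines allows about 8.8 hours of downtime a year, while five nines allows a little over five minutes. If the failure rate of the hardware stays the same, recovery has to be about 100 times faster, which under the assumed failure rate means going from roughly an hour per incident to roughly 36 seconds. That is the kind of response time that only automated isolation and remediation can deliver.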

Ultimately, says Phadke, artificial intelligence and machine learning come into play here as well, but that's a story for another day.

— Carol Wilson, Editor-at-Large, Light Reading
