Service Management for the Telco Cloud
Service Quality Management (SQM) is seeing a resurgence in digital service operations. Two developments are driving its growing importance: the Internet of Things (IoT), which gained significant momentum in 2016, and the anticipated rollout of services based on network functions virtualization (NFV) in 2017. In both cases, the current functionality of an SQM system is expected to be extended to meet the speed and scale demands of a digital service environment.
Both IoT and NFV, although essentially massive network transformations, will also have a tremendous impact on services. While IoT technology is being introduced to make manufacturing, cars, homes, cities and devices more efficient and reliable, NFV enables customers to consume faster, on-demand and dynamically personalized/contextualized services such as IPTV, video streaming, mobile gaming and rich messaging.
As VoLTE and ViLTE (voice- and video-over-LTE) become the immediate technology levers for the launch of digital services, stringent service level agreements (SLAs) will be formed to offer OTT-like services and benefits to customers. The verticals that offer IoT services will need a far higher level of support to maintain reliable, mission-critical connections between IoT devices. SQM can help communication service providers (CSPs) address the challenges of a new NFV and IoT service environment.
The following key changes in the network are driving the redefinition of SQM to make it suitable for the digital environment:
1. Virtualization and SQM. The rising importance of SQM for NFV can be attributed to:
- Higher agility in the creation, delivery, alteration and retirement of services. This inevitably means that managing and maintaining QoS will need to be equally responsive and agile. The iterative deployment and teardown of services requires SQM systems to monitor short-lived services, lasting from a few hours to a few days, driven by events, location, customer context, etc.
- Dynamic network resources. The dynamic adjustments to network elements -- for example, capacity scale-up and scale-down, topology re-configuration and traffic route optimization -- have an immediate impact on the offered services. SQM needs to respond and align to these network changes.
- The hybrid nature of networks. In hybrid (physical and virtualized) networks, digital services will be delivered over both parts. An SQM system's capabilities need to extend across all network types to provide unbiased, vendor-independent reporting.
2. Internet of Things and SQM. With IoT introduced to communication networks, service providers have the option of becoming IoT service providers, managed IoT service providers or simply bearers of IoT traffic. In each case, the monitoring and assurance of IoT services poses a key risk to the CSP's new business, since the quality criteria for IoT services can be much more demanding than those for traditional communication services. In addition, because of the wide variety of users (energy, health, robotics, manufacturing, automotive, etc.), SQM will need to introduce new dimensions to address the specifics of each vertical. SQM will hence be redefined for IoT as follows:
- Assigning high importance to service reliability and service availability as key service KPIs.
- Ensuring proactive maintenance in a high-scale operational environment.
- Faster service impact analysis to prevent network bottlenecks.
- A mechanism (through automation) for fast reaction to potential service failures.
- Visualization and prediction (through analytics) of service usage and geographic distribution by consumers and devices, in order to support creation of new IoT services.
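As an illustration of the first item in the list above, service availability is commonly computed as the fraction of an observation window during which the service was up. The following is a minimal sketch in Python, not a description of any particular SQM product; the `Outage` record and the `availability` function are hypothetical names introduced here, and outages are assumed to be reported as start and end offsets in seconds from the beginning of the window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outage:
    """Hypothetical outage record; times are seconds from the window start."""
    start_s: float
    end_s: float


def availability(window_s: float, outages: list[Outage]) -> float:
    """Return the fraction of the window during which the service was up."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    down_s = 0.0
    covered_until = 0.0  # end of the downtime already counted
    # Walk outages in start order so overlapping downtime is counted once.
    for outage in sorted(outages, key=lambda o: o.start_s):
        start = max(outage.start_s, covered_until, 0.0)
        end = min(outage.end_s, window_s)  # clip to the observation window
        if end > start:
            down_s += end - start
            covered_until = end
    return 1.0 - down_s / window_s


# Two overlapping outages covering seconds 0-150 of a 1000-second window.
kpi = availability(1000.0, [Outage(0.0, 100.0), Outage(50.0, 150.0)])
print(f"availability: {kpi:.3f}")
```

Merging overlapping outages matters in a high-scale environment, where several alarms can report the same period of downtime for one service.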
The role of automation and analytics in managing NFV/IoT networks
In NFV and IoT environments, Service Quality Management needs to be more proactive, predictive and capable of offering rapid root cause analysis (RCA). Although traditional SQM provided RCA when service degradation occurred, and in many cases offered a service impact or "what if" analysis with it, the need to enhance these capabilities has increased significantly. Part of this requirement can be met by applying analytics to SQM information, which provides more accurate failure prediction and a deeper assessment of service impact.
Additionally, automation across the SQM outputs helps in managing configurations. By automating root cause analysis, the parent alarm can be quickly identified. Using service modeling and auto-discovery, the relationship between a service and its underlying network elements can be quickly ascertained and the fault eliminated, reducing mean time to repair (MTTR).
However, an integrated approach to analytics, automation and SQM requires some drastic changes in the way service data is visualized and acted on in the Operation Center. The introduction of NFV, with network functions and services hosted on common resources, inherently helps to achieve this integration to an extent. Use of open REST APIs also helps in connecting the OSS layers. Finally, hosting OSS functionalities (analytics, automation and SQM) in the cloud can also accelerate the integration of the functionalities required in the Operation Center.
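The idea of identifying a parent alarm from a service model can be sketched as follows. This is only an illustrative example under stated assumptions, not how any specific SQM system implements RCA: the service model is assumed to be a simple dependency map from each element to the elements it relies on, and the function name `find_parent_alarms` and the element names are hypothetical.

```python
def find_parent_alarms(
    depends_on: dict[str, set[str]], alarmed: set[str]
) -> set[str]:
    """Return alarmed elements with no alarmed dependency, direct or indirect.

    Such elements are candidate root causes; alarms on anything that relies
    on them are treated as symptoms.
    """

    def has_alarmed_dependency(node: str, seen: set[str]) -> bool:
        for dep in depends_on.get(node, set()):
            if dep in seen:  # already explored; also guards against cycles
                continue
            seen.add(dep)
            if dep in alarmed or has_alarmed_dependency(dep, seen):
                return True
        return False

    return {n for n in alarmed if not has_alarmed_dependency(n, {n})}


# Hypothetical service model: an IPTV service runs on a virtual network
# function, which runs on a physical host.
model = {
    "iptv-service": {"vnf-a"},
    "vnf-a": {"host-1"},
    "host-1": set(),
}
print(find_parent_alarms(model, {"iptv-service", "vnf-a", "host-1"}))
```

In this sketch the alarm on the host is reported as the parent, since both the virtual function and the service depend on it. A production system would also need to weigh alarm timing and severity, which this example ignores.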
As discussed, for a next-generation digital service provider or telco cloud service provider, virtualization of network functions enables the dynamic creation and deployment of new services, with the time to market reduced from a few months to a few days. New telco cloud service assurance systems are evolving, and SQM forms a key component of them. The architectures of these next-generation systems are based on REST APIs, big data clusters and OpenStack capabilities.
Beyond introducing new technologies to the underlying platform, it is important to develop a microservices architecture that uses DevOps-enabled iterative processes to respond quickly to customer needs by developing services faster. This is how the customer expectation of new personalized/contextualized services every week or every few days will be met. It also helps in conducting root cause analysis accurately and resolving customer issues quickly.
The SQM system should also integrate well with the Lifecycle Service Orchestration ecosystem to offer closed-loop assurance; this involves integrated dynamic inventory, service catalog-driven modeling and policy-driven service orchestration.
In summary, for the successful launch and long-term assurance of services in a hybrid network, to which a layer of IoT services will be added, only a redefined Service Quality Management system (dynamic, predictive and capable of offering rapid root cause analysis) will assure the expected digital service revenues.
— Sandeep Raina, Product Marketing Director, MYCOM OSI