BT Leverages Open Source for Fifth Generation of Network Monitoring
James Crawshaw, Senior Analyst – Service Provider IT and Automation, Heavy Reading
In a world where telcos increasingly compete with public cloud providers rather than each other, they need to revamp their market propositions and adopt new technologies and processes to remain relevant. To that end, network operators need to modernize the systems that monitor their services and networks in order to deliver a more cloud-like experience to their customers.
But commercial solutions are still coming up short, which is why BT is developing its own monitoring system, based largely on open source, according to José Domingos, OSS assurance architect at the UK incumbent.
Domingos, who took part in a panel on Telemetry and Analytics at last year's Software-Driven Operations conference in London, says better network monitoring is key to delivering this cloud-like experience, and is a critical component of BT's next-generation, agile OSS.
"In terms of monitoring and reporting, telemetry and closed-loop control are key to supporting this cloud-like experience," Domingos observes. "Telemetry gives us a real-time view of the network. But you also need to have the services and their performance exposed via APIs in order to give the customer transparency. At the same time, the network needs to scale and ideally never fail. The three things need to exist together: elasticity, resiliency and transparency."
This applies not only to enterprise and wholesale customers but also to residential broadband and mobile customers, who might want to customize content filtering and other aspects of their service via a smartphone app. "Customers not only want self-serve portals where they can access a catalog of services, they also expect APIs they can use to consume those same services and run their own playbooks. They expect those services to be programmable and to be easy to integrate with their own infrastructure," he says.
BT has more than 100 different systems for network monitoring and reporting today, some for regulatory reasons and some for operational requirements. Like many telcos, it has a heritage of legacy services, such as PSTN and private circuits, that are still widely used by enterprises and residential customers. At the same time, BT is constantly introducing new services such as SD-WAN and 5G. All these services need assurance, monitoring and reporting. "We have over 1 million devices monitored across the world every few minutes for hundreds of health metrics," notes Domingos.
"The first generation of monitoring and reporting systems, some of which are still in use, were built completely in-house, including the electronics. The second generation of systems was built, mainly in-house, around relational databases. The third generation was based on event processing and was mainly commercial software. The fourth generation, also largely based on off-the-shelf software, uses expert systems that analyze large volumes of data, look for correlations and provide intelligent alarming. The latest, fifth generation of systems is real-time and based primarily on open source," notes the BT man.
BT's target is to shut down 10% to 20% of the older monitoring and reporting systems every year. Domingos explains that during the past five years, BT has "decommissioned several hundred monitoring and reporting systems. We shut them down if a product or service is no longer in use, or if we migrate the functionality to another, newer, system."
Domingos notes that the decision to build or buy systems is cyclical. "When there are no standards, we use open source components and build it ourselves. Once standards are in place it makes more sense to move to commercial solutions where the development costs can be spread over a large number of operators, not just one."
According to Domingos, the capabilities he's looking for in fifth-generation monitoring systems cannot generally be met by currently available off-the-shelf solutions. BT does buy some systems from vendors, but develops others using open source components to get greater agility, control and flexibility. "Suppliers say the standards for telemetry are not there yet," notes Domingos. "And there are no use cases for closed loop. But I ask: how do you do elasticity, transparency and resiliency without that closed-loop capability?"