TalkTalk's OSS Rationalization
James Crawshaw, Senior Analyst – Service Provider IT and Automation, Heavy Reading
Network operators the world over are engaged in the mammoth task of rationalizing and updating their OSS and BSS systems -- and UK telco TalkTalk is no different.
Like many businesses, TalkTalk's IT systems have been expanded and extended over the years and resemble a somewhat overgrown garden. To find out how the operator is dealing with that challenge in its support systems software, we talked with Jonathan Best, principal software engineer, and Alan Elliott, senior software engineer, to get an update on the operator's OSS/BSS "Tech Debt" transformation. (See TalkTalk's 'Tech Debt' Transformation.)
Best and Elliott have been pursuing a simplification of TalkTalk's OSS estate, doing some much-needed pruning and trimming. There are two key aspects to this IT husbandry -- applications and databases.
Application dependency map
TalkTalk has more than 200 applications in its OSS architecture, each of which depends on various components (.NET Windows services, web applications, APIs, web services). Best's team set out to create an application catalog that would identify all the dependencies between applications and components: for example, which components are talking to which databases and APIs, and which versions of the OS and .NET are running on each server.
During the past few months the TalkTalk team has automated the collection of data, such as Microsoft Internet Information Services (IIS) web logs, from OSS production systems and mapped it into a bespoke application catalog hosted in Azure. The catalog uses Microsoft's Power BI and Splunk for data analytics and reporting. Using these tools, the team has been able to discover the inter-relationships between the different OSS applications, creating a dependency map that shows which components of which applications are talking to each other.
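To make the idea of a dependency map concrete, here is a minimal sketch in Python. It is not TalkTalk's implementation: it assumes the collected log data has already been reduced to (caller, callee) pairs, and every component name below is invented for illustration.

```python
from collections import defaultdict

# Hypothetical observations: (calling component, called component or database).
# In practice these pairs would be extracted from collected logs.
OBSERVED_CALLS = [
    ("billing-web", "orders-api"),
    ("orders-api", "orders-db"),
    ("billing-web", "customer-api"),
    ("customer-api", "customer-db"),
    ("billing-web", "orders-api"),  # repeated calls collapse into one edge
]


def build_dependency_map(calls):
    """Return {caller: sorted list of the distinct components it calls}."""
    graph = defaultdict(set)
    for caller, callee in calls:
        graph[caller].add(callee)
    return {caller: sorted(callees) for caller, callees in graph.items()}


if __name__ == "__main__":
    dependency_map = build_dependency_map(OBSERVED_CALLS)
    for caller, callees in sorted(dependency_map.items()):
        print(f"{caller} -> {', '.join(callees)}")
```

Storing the edges as sets means a component that calls the same API thousands of times still produces a single dependency, which is what a catalog needs.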
The application catalog has also enabled them to identify orphaned applications and components, which were no longer in use and could be retired. Thus far they have been able to remove more than half of all the web components that existed within the OSS domain because they were no longer in use. Similarly, the overall number of OSS applications used has been reduced by 25%. This simplified OSS has enabled TalkTalk to reduce cost and complexity: For example, when an operations team member has to upgrade a server, they now might only have to reinstall 100 APIs on the server, not 200 as before.
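Finding orphans amounts to comparing what is installed with what is actually seen communicating. The Python sketch below illustrates that comparison under simplifying assumptions: the catalog is a plain list of component names, traffic is a list of (caller, callee) pairs, and all names are hypothetical. A real exercise would also need an observation window long enough to catch infrequent jobs.

```python
def find_orphans(catalog, observed_calls):
    """Return catalogued components that never appear in observed traffic,
    either as a caller or as a callee."""
    active = set()
    for caller, callee in observed_calls:
        active.add(caller)
        active.add(callee)
    return sorted(set(catalog) - active)


if __name__ == "__main__":
    # Hypothetical catalog and traffic, for illustration only.
    catalog = ["billing-web", "orders-api", "legacy-report-svc", "old-soap-gateway"]
    observed_calls = [("billing-web", "orders-api")]
    for component in find_orphans(catalog, observed_calls):
        print(f"retirement candidate: {component}")
```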
The application catalog is also being used by developer teams, operations teams and service management to understand application dependencies. For example, change management can see what impact changing a particular component might have on other systems. In essence, the application catalog has "given us visibility into the sprawling estate of OSS," according to Best.
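Impact analysis of this kind is essentially a walk backwards through the dependency graph: start at the component being changed and collect everything that relies on it, directly or indirectly. The following Python sketch shows one way to do that, assuming the dependencies are available as a simple mapping; the structure and the names are illustrative, not taken from TalkTalk's catalog.

```python
from collections import defaultdict, deque


def impacted_by(component, dependencies):
    """Return every component that depends on `component`, directly or
    transitively. `dependencies` maps each caller to the things it calls."""
    # Invert the map so we can look up "who calls X?".
    callers_of = defaultdict(set)
    for caller, callees in dependencies.items():
        for callee in callees:
            callers_of[callee].add(caller)

    impacted = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependant in callers_of[current]:
            if dependant != component and dependant not in impacted:
                impacted.add(dependant)
                queue.append(dependant)
    return sorted(impacted)


if __name__ == "__main__":
    # Hypothetical dependency map, for illustration only.
    dependencies = {
        "billing-web": ["orders-api"],
        "orders-api": ["orders-db"],
        "reports": ["orders-db"],
    }
    print(impacted_by("orders-db", dependencies))
```

The `impacted` set doubles as a visited list, so circular dependencies do not cause an endless loop.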
As well as considering the IT risk posed by applications, TalkTalk ran a separate project specifically looking at databases. The team undertook a deep dive on the entire OSS database estate and were surprised to find that this added up to about 700 databases, not including operational databases that are there to keep things running day to day. Many of the databases were replicated to multiple servers for load balancing. Data was written to a primary database, while multiple secondary databases would be used for reading data. Furthermore, the team used bespoke schemas on the secondary tiers, which often got out of sync with the primary database, causing headaches for developers.
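The synchronization problem can be illustrated with a small comparison of two schemas. This Python sketch assumes each schema has been exported as a mapping from table name to a set of column names; that representation, and the table and column names, are assumptions made for the example rather than details from TalkTalk.

```python
def schema_drift(primary, secondary):
    """Compare two schemas given as {table: set of column names}.
    Return only the tables whose columns differ."""
    drift = {}
    for table in sorted(set(primary) | set(secondary)):
        primary_cols = primary.get(table, set())
        secondary_cols = secondary.get(table, set())
        missing = sorted(primary_cols - secondary_cols)
        extra = sorted(secondary_cols - primary_cols)
        if missing or extra:
            drift[table] = {
                "missing_on_secondary": missing,
                "extra_on_secondary": extra,
            }
    return drift


if __name__ == "__main__":
    # Hypothetical schemas, for illustration only.
    primary = {"customer": {"id", "name", "email"}}
    secondary = {"customer": {"id", "name"}, "report_cache": {"key"}}
    for table, diff in schema_drift(primary, secondary).items():
        print(table, diff)
```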
Database replication had become a complex problem for TalkTalk. Whenever a new system had needed some data, another (partial) copy of the database had been created to support that new system. According to Elliott, "It created a complex web of interconnections between our databases." Additionally, some databases were running old software on physical servers with no clustering or resiliency. Simply upgrading all of these databases would have been a wasteful expense, as the TalkTalk team knew that many of these databases were rarely or no longer actually in use. For example, some databases consisted of just one table and one stored procedure that was read just once per month by one application.
Elliott and his colleagues evaluated whether certain databases could be eliminated, either by merging schemas together or by simply decommissioning those that were no longer used. They formed a team comprising a scrum master, two analysts, two developers and a tester to evaluate all 700-plus databases. They ran traces on the servers for 33 days (to capture the month start and end periods) and analyzed all the connections coming in and out. The plan was to migrate the primary databases onto more powerful and resilient infrastructure, upgrade them to the latest version of SQL Server and then use the new availability group technology for read-only replicas. (Availability group replicas are complete copies of the original and hence avoid synchronization issues.)
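The trace analysis described above can be pictured as a simple tally of connections per database over the observation window. The Python sketch below is an illustration only: it assumes the trace output has been reduced to (database, application) pairs, the "rarely used" threshold is an arbitrary choice, and the names are invented.

```python
from collections import defaultdict


def summarize_usage(inventory, trace_events, rare_threshold=1):
    """Classify each database in `inventory` by how often it was connected
    to during the trace window.

    Returns {database: (status, connection_count, sorted application names)}.
    """
    connections = defaultdict(int)
    applications = defaultdict(set)
    for database, application in trace_events:
        connections[database] += 1
        applications[database].add(application)

    report = {}
    for database in inventory:
        count = connections[database]
        if count == 0:
            status = "unused"        # candidate for decommissioning
        elif count <= rare_threshold:
            status = "rarely used"   # candidate for merging into another schema
        else:
            status = "active"
        report[database] = (status, count, sorted(applications[database]))
    return report


if __name__ == "__main__":
    # Hypothetical inventory and trace data, for illustration only.
    inventory = ["orders", "legacy_archive", "monthly_stats"]
    events = [("orders", "billing"), ("orders", "crm"), ("monthly_stats", "finance")]
    for database, summary in summarize_usage(inventory, events).items():
        print(database, summary)
```

Starting from the inventory rather than from the trace data matters: a database with no connections at all never shows up in a trace, so it can only be found by checking the full list.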
Through this project, Elliott's team has reduced the number of databases from around 700 to about 400 today. The long-term plan is to reduce this to around 200. This leads to cost savings in both licenses and IT support (which is priced, among other things, on the number of database instances).
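Taking those approximate figures at face value, the reduction so far works out to roughly 43%, and the long-term target to roughly 71%. The short Python calculation below shows the arithmetic; the inputs are the rounded numbers quoted above, so the results are indicative only.

```python
def percent_reduction(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


# Approximate figures quoted in the article.
START, CURRENT, TARGET = 700, 400, 200

if __name__ == "__main__":
    print(f"So far: {percent_reduction(START, CURRENT):.0f}% fewer databases")
    print(f"At target: {percent_reduction(START, TARGET):.0f}% fewer databases")
```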
Are you undertaking a similar strategy? If so, please get in touch at [email protected].