TalkTalk's 'Tech Debt' Transformation
James Crawshaw, Senior Analyst – Service Provider IT and Automation, Heavy Reading
In the wake of a damaging cyber attack in 2015, TalkTalk embarked on a cyber breach remediation plan for more than 600 applications and a broader plan to deal with the UK operator's 'tech debt' of legacy IT systems.
TalkTalk is a UK-based CSP providing consumer broadband, telephony, pay-TV and mobile services to around 4 million customers. In the UK consumer broadband market TalkTalk has the fourth-largest customer base, behind incumbent telco BT, satellite TV operator Sky (part owned by 21st Century Fox) and cable operator Virgin Media (the UK arm of Liberty Global).
Following a cyber attack in 2015, TalkTalk embarked on a significant transformation program to fundamentally restructure its software applications, network and IT infrastructure estate. We recently met with Philip Clayson, TalkTalk's technology director, who described the transformational journey he and his team have been on for the last two years. (See TalkTalk Plummets on Security Woes.)
Clayson, who reports to the group CTO, leads an engineering team of 650, around 150 of whom are based in the UK. In the immediate aftermath of the cyber attack, Clayson was tasked with creating a cyber breach remediation plan for more than 600 applications across TalkTalk's consumer and enterprise divisions. He also defined a plan to address more than 15 years of legacy IT that the company had accumulated through numerous acquisitions since its formation in 2002. Clayson refers to this legacy IT estate, comprising 84 million lines of code and 400 applications, as TalkTalk's "tech debt."
Trading Risk methodology
After the cyber attack TalkTalk developed a new "Trading Risk" approach to the evaluation of the tech debt in its software applications. This is similar to the Risk Register template in the PRINCE2 project management methodology, but more oriented towards the commercial capability of the business, e.g. systems needed to collect revenue and issue bills.
The Trading Risk system assigns a score to each IT system based on six fundamental variables, each of which is a function of around 20 second-order variables. The six main measures are:
- Security -- mapped to the NIST framework for measuring security
TalkTalk's board set the weighting factor for each of the six measures. Additional weightings are applied depending on whether an application is customer facing or internal and whether the data it carries is highly sensitive (using a similar approach to the imminent GDPR). Once the final risk score has been calculated, applications are categorized into four bands: low, medium, high, and critical risk.
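The scoring and banding described above can be sketched roughly as follows. This is only an illustration of a weighted score with multipliers and bands, not TalkTalk's actual model: the article names only the security measure, so the other five measure names, all weights, both multipliers, the 0-100 scale and the band thresholds are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical board-level weights (percent, summing to 100). Only
# "security" is named in the article; the other keys are placeholders.
MEASURE_WEIGHTS = {
    "security": 30,
    "measure_2": 20,
    "measure_3": 15,
    "measure_4": 15,
    "measure_5": 10,
    "measure_6": 10,
}

# Hypothetical additional weightings for exposure and data sensitivity.
CUSTOMER_FACING_FACTOR = 1.2
SENSITIVE_DATA_FACTOR = 1.3

# Hypothetical upper bounds for the first three bands; anything at or
# above the last bound is treated as critical.
BANDS = [(25.0, "low"), (50.0, "medium"), (75.0, "high")]


@dataclass
class Application:
    name: str
    scores: dict  # measure name -> score on an assumed 0-100 scale
    customer_facing: bool = False
    sensitive_data: bool = False


def trading_risk(app: Application) -> float:
    """Weighted average of the six measures, scaled by the extra weightings."""
    base = sum(w * app.scores[m] for m, w in MEASURE_WEIGHTS.items()) / 100
    if app.customer_facing:
        base *= CUSTOMER_FACING_FACTOR
    if app.sensitive_data:
        base *= SENSITIVE_DATA_FACTOR
    return base


def risk_band(score: float) -> str:
    """Map a final score onto one of the four bands."""
    for upper_bound, label in BANDS:
        if score < upper_bound:
            return label
    return "critical"


# Example: the same underlying scores land in different bands depending
# on exposure and data sensitivity.
even_scores = {m: 50 for m in MEASURE_WEIGHTS}
internal_tool = Application("internal tool", even_scores)
customer_portal = Application("customer portal", even_scores,
                              customer_facing=True, sensitive_data=True)
```

In the real system each of the six scores would itself be derived from around 20 second-order variables; here they are simply supplied as inputs.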
The Trading Risk is visualized with the aid of a diagram that shows the dependencies between different IT systems across the entire IT estate comprising online (customer-facing portals), BSS and OSS, and enterprise data warehousing. With 400 different applications (down from 650 in 2015) the diagram is so large it has to be plotted on A0 paper (841 x 1189 mm) for use in decision making.
Note that the diagram is a live dashboard, not a static image. It takes advantage of the live data capabilities of Visio to give an up-to-date view of the IT estate with periodic refreshes. As software gets re-platformed and made more secure, or retired, the data model gets refreshed and the diagram is updated.
Trading Risk as an input to the investment process
With this new approach TalkTalk is able to get a measure of the "trading" risk of its software footprint. Additionally, it can see what the likely improvement in trading risk will be for different IT investment options. For example, moving a particular database from a standalone implementation to an active-active arrangement might reduce trading risk by [n] points. A similarly sized investment in middleware might only reduce trading risk by [n/2] points.
Clearly, TalkTalk also looks at the ROI potential of different capex options. However, often with IT modernization the financial return is ambiguous. As such, the trading risk reduction metric provides an alternative way to objectively evaluate investment options.
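As a rough illustration of how a risk-reduction metric can rank investment options when financial return is unclear, the sketch below orders options by trading-risk points removed per unit of capex. The option names follow the database and middleware example above, but the class, the function, the capex units and the value chosen for n are all hypothetical.

```python
from typing import NamedTuple


class InvestmentOption(NamedTuple):
    name: str
    capex: float           # cost in arbitrary units (hypothetical)
    risk_reduction: float  # trading-risk points removed (hypothetical)


def rank_by_risk_reduction(options):
    """Order options by risk points removed per unit of capex, best first."""
    return sorted(options,
                  key=lambda o: o.risk_reduction / o.capex,
                  reverse=True)


N = 10.0  # stand-in for the article's unspecified [n]

options = [
    InvestmentOption("middleware upgrade", capex=1.0, risk_reduction=N / 2),
    InvestmentOption("database active-active", capex=1.0, risk_reduction=N),
]

ranked = rank_by_risk_reduction(options)
for option in ranked:
    print(option.name, option.risk_reduction / option.capex)
```

With equal spend, the option that removes more risk points ranks first; in practice such a ranking would sit alongside the ROI view rather than replace it.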
The new approach appears to have impressed TalkTalk's partners, some of which have invited Clayson to spend time with their other customers in other industry verticals, such as banking, to outline the new methodology.
Progress so far
There has been significant change at TalkTalk over the last two years. The first few months following the cyber attack were taken up with fixing urgent vulnerabilities. Since then, TalkTalk has taken a more structured approach to remediation, aided in large part by the "Trading Risk" approach. In the 18 months that the new methodology has been running, TalkTalk has reduced the trading risk of the business significantly.
By March 2018 Clayson aims to have taken out half of the tech debt that TalkTalk had at the end of 2015. While this technical debt wasn't insecure, it was "cluttering up" the software estate, according to Clayson. It consisted of many applications and around 4,000 servers that are no longer required as TalkTalk migrates more of its software applications into the public cloud. Clayson sees running applications on public cloud as more operationally effective than running them on on-premises infrastructure, where the operator must take responsibility for continual updating and patching. Outsourcing those tasks to a Tier 1 provider such as Microsoft with its Azure solution is an easy way to reduce trading risk, in his view.
OSS vs BSS
TalkTalk's BSS estate is largely commercial off-the-shelf (COTS) product that has been customized for their particular needs. BSS comprises around 30 discrete pieces of software. Examples include CRM, business process management, and billing as well as several components from CA, HPE and Micro Focus (which recently acquired a large part of HPE’s software business). These different applications have been integrated using Java and other middleware, according to Clayson.
Conversely, TalkTalk's OSS is mainly developed in-house based on Microsoft technologies. It comprises more than 200 distinct software systems, far more than the number of BSS systems. Nonetheless, the number of OSS systems has been reduced significantly since the cyber attack, and this reduction is the main contributor to the overall fall in the number of software systems that TalkTalk runs, from 650 pre-attack to around 400 today. The OSS applications are maintained by a team of around 150 developers in Manchester, UK, a team that continues to grow rapidly under Clayson.
Even though Clayson is keen to move some of the OSS estate to COTS he believes there will always be a place for "home grown" software as it allows them to make changes more quickly than if they were reliant upon a supplier. Feature requests from suppliers can often take a year or more in delivery and involve considerable cost.
One big difference between the COTS offerings for BSS and OSS that Clayson has noted is that while BSS solutions are widely available as-a-service, OSS providers are only just now beginning to think about making their solutions available in the cloud. Clayson sees the SaaS model as more attractive for BSS and OSS as it allows him to spend less on infrastructure.
In terms of new technology introduction, Clayson believes that if he can get TalkTalk's tech debt down to 10%-20% then the company would be ready to start migrating the code base into more recent technology approaches, for example a microservices architecture.
Clayson and his engineers are also keen to use more open source solutions within their IT estate. The tooling they buy today is from best-of-breed suppliers. But Clayson regards open source as cheaper and more fun for the engineers as they get to build systems themselves rather than just implement and maintain someone else's. When the tech debt is down to an acceptable level, TalkTalk might start using more open source to build its own tools, but for now Clayson believes the team will be able to move more quickly using suppliers' COTS solutions.
Clayson notes TalkTalk is still in "catch-up" mode and the task of consolidating and modernizing the software application estate is mammoth. But he is very pleased with the progress so far and very proud of his team whose commitment has been "phenomenal" with everyone pushing for the same result. In Clayson's words: "It's all the people in my teams that have delivered this monumental reduction in Tech Debt."