
How to Prevent Your Data Lake From Turning Into a Swamp

James Crawshaw

In their book Building the Network of the Future, Mazin Gilbert and Mark Austin of AT&T describe the big data framework that the operator has adopted to process the 118 petabytes of data that pass through its networks each day (as of 2016).

The operator not only tracks the payload of data that traverses its networks but also captures and stores, for later analysis, myriad data from user devices, radio access infrastructure, core network elements (such as XDRs), Internet cloud infrastructure (for example, CDN logs) and the application data itself (such as website logs). Much of this data is stored in a Hadoop-based system running on commodity, off-the-shelf hardware. Feeding the Hadoop Distributed File System (HDFS) is a data ingestion engine based on open source tools such as Kafka, Flume and Sqoop. Sitting on top of Hadoop (figuratively) are modules for analytics (Spark), batch processing (MapReduce), search (Solr) and NoSQL (e.g., MongoDB, Cassandra).

While all this open source technology looks fantastically fun, operators should step back for a second and ask themselves whether filling huge data lakes with streaming telemetry about network paths, traffic flows and performance will provide a valuable resource for analytics or simply rack up a rather large bill for storage infrastructure (albeit commodity hardware). After all, the point of the exercise is to unearth valuable insights from the data that enable them to improve the business, such as faster root cause analysis, reduced mean time to repair or earlier detection of security threats. Might they be better off applying a coarser filter to the data they collect, focusing on the metrics that are likely to have a material impact on performance? Judgement calls about which data is worth keeping require networking expertise, which may be lacking in the IT development team tasked with building the data analytics platforms.
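To make the "coarser filter" idea concrete, here is a minimal Python sketch of an ingestion-time filter. It assumes a hypothetical record format (a dict with `device`, `metric` and `value` keys), and the allow-list and per-metric deadbands are illustrative placeholders for the judgement calls networking experts would make; none of these names come from AT&T's platform or any particular tool. Records for metrics not on the allow-list are dropped, and a sample is kept only when it differs materially from the last kept value for the same device and metric.

```python
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical allow-list: metrics judged (by networking experts) to bear on
# root cause analysis, mean time to repair or security detection.
RELEVANT_METRICS = {"packet_loss_pct", "latency_ms", "link_utilization_pct"}

# Hypothetical per-metric deadbands: a new sample is kept only if it differs
# from the last kept sample for the same (device, metric) by at least this much.
DEADBANDS = {"packet_loss_pct": 0.1, "latency_ms": 5.0, "link_utilization_pct": 5.0}


def coarse_filter(records: Iterable[Dict]) -> Iterator[Dict]:
    """Yield only the telemetry records judged worth storing in the data lake."""
    last_kept: Dict[Tuple[str, str], float] = {}
    for rec in records:
        metric = rec["metric"]
        if metric not in RELEVANT_METRICS:
            continue  # drop metrics nobody has tied to a business problem
        key = (rec["device"], metric)
        prev = last_kept.get(key)
        if prev is not None and abs(rec["value"] - prev) < DEADBANDS[metric]:
            continue  # drop samples that add no material information
        last_kept[key] = rec["value"]
        yield rec


samples = [
    {"device": "r1", "metric": "latency_ms", "value": 20.0},
    {"device": "r1", "metric": "latency_ms", "value": 21.0},      # within 5 ms deadband: dropped
    {"device": "r1", "metric": "fan_speed_rpm", "value": 3000},   # not on allow-list: dropped
    {"device": "r1", "metric": "latency_ms", "value": 40.0},      # material change: kept
]
print([r["value"] for r in coarse_filter(samples)])  # [20.0, 40.0]
```

A filter of this kind would most naturally sit in the ingestion layer, before data reaches storage: the design choice is to discard low-value telemetry as early as possible rather than pay to store it and sift through it later.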


As this article notes: "The best strategy for data lakes is to only collect data that is useful now. Data loses its value over time and if you can’t find what you’re looking for in the mess that is the data swamp, it's pointless to keep adding to it. Projects should only go after sources that can provide useful solutions to clearly defined business problems."
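The same principle can be applied to data already sitting in the lake. The sketch below, again in Python and purely illustrative, assumes each dataset is tagged with the business problem it serves and a retention window (`ttl`); both fields are hypothetical metadata, not features of Hadoop or any specific data catalog. Datasets with no declared use, or older than their retention window, are flagged for deletion.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class Dataset:
    name: str
    ingested_at: datetime
    business_problem: Optional[str]  # None means nobody has claimed a use for it
    ttl: timedelta  # hypothetical retention window agreed with the data's consumers


def datasets_to_purge(datasets: List[Dataset], now: datetime) -> List[str]:
    """Return the names of datasets that should be deleted from the lake."""
    purge = []
    for ds in datasets:
        if ds.business_problem is None:
            purge.append(ds.name)  # no clearly defined business problem: don't keep it
        elif now - ds.ingested_at > ds.ttl:
            purge.append(ds.name)  # older than the window in which it is still useful
    return purge


now = datetime(2018, 11, 1, tzinfo=timezone.utc)
lake = [
    Dataset("cdn_logs_2018_10", datetime(2018, 10, 1, tzinfo=timezone.utc),
            "CDN fault triage", timedelta(days=90)),
    Dataset("flow_records_2018_06", datetime(2018, 6, 1, tzinfo=timezone.utc),
            "capacity planning", timedelta(days=90)),
    Dataset("fan_speed_dump", datetime(2018, 10, 30, tzinfo=timezone.utc),
            None, timedelta(days=365)),
]
print(datasets_to_purge(lake, now))  # ['flow_records_2018_06', 'fan_speed_dump']
```

Tying every dataset to a named business problem makes the "clearly defined" test in the quote explicit: anything that cannot be tied to one is a candidate for the swamp.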

To find out more about data collection best practices and what to do with the data once you have decided to store it (standard correlations, sophisticated machine learning algorithms, etc.), join us at the Software-Defined Operations & the Autonomous Network event in London, November 7-8, for the panel "Zero Touch Analytics – Delivering Insights In Real Time."

Operators want analytics tools to provide them with tangible insights: findings that are actionable, concrete and palpable. At the same time, they want these systems to be highly automated, employ artificial intelligence and be zero-touch. Being palpable and zero-touch at the same time is quite a challenge. I'll be discussing this, and more, with speakers from Atrinet, Netcracker and Telefonica.

— James Crawshaw, Senior Analyst, Heavy Reading
