
IBM Software Helps Speed Up Deep Learning

Scott Ferguson

IBM Research is looking to reduce the time it takes deep learning systems to understand and recognize images and sounds with a new software library that allows the technology to work across multiple servers and GPUs.

On Tuesday, IBM researchers are releasing two blog posts that detail their research into what Big Blue calls its "Distributed Deep Learning" (DDL) software library. The library consists of multiple APIs that work with different open source machine learning frameworks, allowing deep learning to scale across several servers running hundreds of GPUs.

The software is part of PowerAI, IBM's distribution of machine learning and artificial intelligence software for its Power server systems. (See IBM Brings Open Databases to Private Cloud.)

One of the current problems with deep learning is that it takes a long time for the technology to recognize and "learn" different images and sounds. Part of that problem is getting the technology to scale up and take advantage of larger server clusters that hold GPUs, Sumit Gupta, IBM's vice president of HPC, AI and Analytics, wrote in one of the August 8 blog posts.

(Source: Geralt via Pixabay)

"At the crux of this problem has been the technical limitation that the popular open-source deep learning software frameworks do not run well across multiple servers," Gupta wrote. "So, while most data scientists are using servers with four or eight GPUs, they can't scale beyond that single node."

In the researchers' example, a neural network took 16 days to train to recognize images on an IBM Power "Minsky" server with four Nvidia GPU accelerators. With the newly developed DDL software, the researchers were able to scale the job across tens of servers with hundreds of GPUs and cut the training time to seven hours.
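As a quick check on these figures, converting 16 days to hours gives the overall speedup. The second expression is only a rough, hypothetical scaling-efficiency estimate: it assumes the seven-hour run used the 256 GPUs mentioned later in the article, which the text does not state outright.

```latex
\text{speedup} = \frac{16 \times 24\ \text{h}}{7\ \text{h}} = \frac{384}{7} \approx 54.9

\text{efficiency} \approx \frac{54.9}{256 / 4} = \frac{54.9}{64} \approx 0.86 \quad \text{(assumes 256 GPUs vs. 4)}
```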

The IBM researchers built the software and algorithms using parallelization techniques that allow the task to be spread across many GPUs at the same time. The idea is to take advantage of the GPU architecture, which has many parallel cores.
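Spreading a training job across GPUs is commonly done with synchronous data parallelism: each GPU computes gradients on its own slice of the training data, the gradients are averaged, and every GPU applies the same update. The sketch below simulates that in one Python process on a toy linear model. It is an illustration under that assumption only; the names `local_gradient`, `allreduce_mean` and `train` are hypothetical and not part of IBM's DDL library, whose internals the article does not describe.

```python
# Hypothetical sketch: synchronous data-parallel gradient descent, simulated in one
# process. Each "worker" stands in for one GPU; none of these names come from IBM DDL.
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair for a 1-D linear model y = w * x + b


def local_gradient(w: float, b: float, shard: List[Sample]) -> Tuple[float, float]:
    """Mean-squared-error gradient of y = w * x + b over one worker's shard."""
    n = len(shard)
    gw = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    gb = sum(2 * (w * x + b - y) for x, y in shard) / n
    return gw, gb


def allreduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average every worker's gradient (the result an allreduce step provides)."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train(data: List[Sample], workers: int, steps: int = 500, lr: float = 0.2) -> Tuple[float, float]:
    # Split the dataset into equal shards, one per worker (assumes len(data) % workers == 0).
    shard_size = len(data) // workers
    shards = [data[i * shard_size:(i + 1) * shard_size] for i in range(workers)]
    w, b = 0.0, 0.0
    for _ in range(steps):
        # On a real cluster these gradients are computed concurrently, one per GPU.
        grads = [local_gradient(w, b, s) for s in shards]
        gw, gb = allreduce_mean(grads)
        # Every replica applies the same averaged update, so all copies stay in sync.
        w -= lr * gw
        b -= lr * gb
    return w, b


if __name__ == "__main__":
    data = [(i / 10, 3.0 * (i / 10) + 1.0) for i in range(16)]
    print(train(data, workers=1))
    print(train(data, workers=4))
```

With equal-sized shards, the average of per-shard mean gradients equals the full-batch gradient, so four simulated workers reach the same weights (close to w = 3, b = 1) as a single worker. Any real speedup comes from running the shards concurrently, which is exactly where the communication cost of the averaging step starts to matter.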


"But as GPUs get much faster, they learn much faster, and they have to share their learning with all of the other GPUs at a rate that isn't possible with conventional software," wrote Hillery Hunter, a research memory strategist and Director of the Systems Acceleration and Memory Department for IBM, in the other blog.

"This puts stress on the system network and is a tough technical problem," Hunter added. "Basically, smarter and faster learners (the GPUs) need a better means of communicating, or they get out of sync and spend the majority of time waiting for each other's results."
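The problem Hunter describes, GPUs sharing what they learn without waiting on each other, is usually handled by an allreduce collective. One widely used variant is ring allreduce, simulated here in a single process. Treat it as an illustrative assumption rather than a description of DDL's communication layer, which the article does not detail; `ring_allreduce` is a hypothetical helper name. Each of `k` workers sends `2 * (k - 1)` chunks of size `n / k`, so per-worker traffic stays below `2n` no matter how many GPUs join the ring.

```python
# Illustrative ring allreduce, simulated in one process. This is a standard collective
# (reduce-scatter followed by all-gather), used here as an assumption about the kind of
# exchange Hunter describes; it is not taken from IBM's DDL code.
from typing import List, Tuple


def ring_allreduce(buffers: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Sum equal-length buffers across k workers arranged in a ring.

    Returns every worker's final buffer plus how many elements each worker sent.
    """
    k = len(buffers)
    n = len(buffers[0])
    if n % k != 0:
        raise ValueError("buffer length must be divisible by the number of workers")
    size = n // k
    data = [list(b) for b in buffers]  # each worker's local copy
    sent = [0] * k

    def span(c: int) -> range:
        return range(c * size, (c + 1) * size)

    def exchange(chunk_for, combine) -> None:
        # Snapshot every outgoing chunk first so all sends in a step happen "at once".
        outgoing = [(r, chunk_for(r), [data[r][j] for j in span(chunk_for(r))]) for r in range(k)]
        for r, c, vals in outgoing:
            dst = (r + 1) % k  # each worker talks only to its ring neighbor
            for j, v in zip(span(c), vals):
                data[dst][j] = combine(data[dst][j], v)
            sent[r] += size

    # Reduce-scatter: after k - 1 steps, worker r holds the full sum of chunk (r + 1) % k.
    for s in range(k - 1):
        exchange(lambda r, s=s: (r - s) % k, lambda old, new: old + new)
    # All-gather: pass the finished chunks around the ring so every worker gets all of them.
    for s in range(k - 1):
        exchange(lambda r, s=s: (r + 1 - s) % k, lambda old, new: new)
    return data, sent


if __name__ == "__main__":
    bufs = [[float(r * 10 + j) for j in range(8)] for r in range(4)]
    result, traffic = ring_allreduce(bufs)
    print(result[0])
    print(traffic)
```

In the four-worker example, every worker ends up with the element-wise sum of all four buffers, and each one sends 12 elements (2 * (4 - 1) * 8 / 4). Keeping each GPU's share of the traffic bounded like this is one reason ring-style collectives are popular when many GPUs must stay in sync.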

In the end, IBM researchers were able to run various machine learning workloads across 256 GPUs.

(Source: IBM Research)

For now, the new DDL software library is available as a technical preview, as part of Version 4 of the PowerAI deep learning software distribution.

IBM is releasing the first set of APIs for TensorFlow, the open source machine learning framework developed by Google, and for Caffe, another open source framework. (See Google's TPU Chips Beef Up Machine Learning.)

Later, IBM plans to add support for two other machine learning frameworks: Torch and Chainer.


— Scott Ferguson, Editor, Enterprise Cloud News. Follow him on Twitter @sferguson_LR.
