IBM: Wait Is Over for Deep Learning
Or at least significantly shorter, thanks to IBM's recent breakthrough in deep learning performance, as IBM Fellow Hillery Hunter explains to Light Reading.
Deep learning is one of the most exciting elements of artificial intelligence, but it's also one of the slowest moving. IBM Fellow Hillery Hunter calls deep learning, which enables computers to extract meaning from images and sounds with no human intervention, a "rarified thing, off in an ivory tower," but a recent breakthrough by the company promises to make it more accessible -- and a lot faster too.
Last week IBM Corp. (NYSE: IBM) announced that its software was able to cut the time needed to train deep neural networks from weeks to hours, or from hours to minutes depending on the use case, while also improving accuracy. It accomplished this by scaling its training applications more efficiently across 256 Nvidia Corp. (Nasdaq: NVDA) GPUs in 64 IBM Power systems.
IBM was looking specifically at image recognition and was able to train its model in 50 minutes. The previous record was held by Facebook at one hour. IBM says it achieved accuracy of 33.8% for a neural network trained on 7.5 million images. The previous record here was held by Microsoft Corp. (Nasdaq: MSFT) at 29.8% -- that's a gain of only 4 percentage points, but Hunter says typical past improvements have been less than 1%.
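To put those numbers in perspective, the short Python sketch below works through the arithmetic using only the figures reported above, taken at face value. The variable names are illustrative, and the calculation assumes the two accuracy figures and the two training times are directly comparable, which the article does not spell out.

```python
# Figures as reported in this article; variable names are illustrative.
ibm_accuracy = 33.8        # percent, IBM's reported result
previous_accuracy = 29.8   # percent, previous record (Microsoft)
ibm_minutes = 50           # IBM's reported training time
previous_minutes = 60      # previous record (Facebook, one hour)
total_gpus = 256
power_systems = 64

# Absolute gain, in percentage points.
point_gain = ibm_accuracy - previous_accuracy

# Relative gain, as a percentage of the previous record.
relative_gain = point_gain / previous_accuracy * 100

# Reduction in training time relative to the previous record.
time_reduction = (previous_minutes - ibm_minutes) / previous_minutes * 100

# GPUs per server, assuming they are spread evenly across the systems.
gpus_per_system = total_gpus // power_systems

print(f"Accuracy gain: {point_gain:.1f} percentage points")
print(f"Relative accuracy gain: {relative_gain:.1f}%")
print(f"Training time reduction: {time_reduction:.1f}%")
print(f"GPUs per Power system: {gpus_per_system}")
```

The distinction matters because a 4-point absolute gain on a 29.8% baseline is a considerably larger relative improvement than the raw "4" suggests.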
"We need this kind of breakthrough where we've integrated this capability for speed into a number of deep learning frameworks and packages and provided it out to customers," she says. "It's the kind of thing that changes the rate and pace of an artificial intelligence capability like this."
Hunter walked Light Reading through IBM's recent breakthrough, why it matters, timelines for deployment and what it means for the telecom industry, specifically, in a recent interview. (We even threw in one Women in Comms-centric question.) Read on for a lightly edited transcript of our discussion.
Figure 1: IBM Fellow Hillery Hunter and her team have developed software with image recognition accuracy of 33.8% for a neural network trained on 7.5 million images.
(Source: IBM)
Light Reading: IBM's big breakthrough here centered on the speed to analyze and interpret data. How much have you improved it, and what's the ultimate goal?
Hillery Hunter: It is very interesting because deep learning works by feeding the neural networks many pieces of data where that data has been labeled. Pictures, for example, are labeled with the type of object, animal or place that they are, and the neural network ingests millions of those pre-labeled pictures. In some sense it's like reading a book -- it's given information that's known to be safe and correct, so it learns, and then you test it against a set of pictures or data it's never seen before. That is how you come up with validation that it learned correctly from what it was given. It's one of the few areas in modern computing where people wait weeks for full results. There are not a whole lot of others we would tolerate that in.
Deep learning has shown itself to be really effective, especially at speech. When you're talking to a phone, it drives speech recognition today. When you are using social media and images are auto labeled with the person or place it was taken -- all those things are deep learning, and it works well. It's now being applied to other things in the enterprise like credit card fraud and risk and things like that because it works well and you can get to even better than human accuracy with the tasks you are doing with deep learning. Because of that, people have tolerated really long learning and model training times. We are aware of many, many cases where deep learning researchers are waiting weeks to get to the results they need. We see that as unacceptable. We want to get it into hours and then for jobs that are smaller, take it from hours to seconds. That could be transformative, to get to seconds. It depends on the use case and how much data you are feeding the model. It takes so long because you want the model to work well so you feed it lots of data to learn the task.
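The train-then-test procedure Hunter describes, in which a model learns from labeled examples and is then scored on data it has never seen, can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the data is synthetic, and the "model" is a simple nearest-centroid classifier standing in for a neural network. It is not IBM's software or method, and all function names are hypothetical.

```python
import random


def make_labeled_points(n_per_class, rng):
    """Synthetic labeled data: class 0 near (0, 0), class 1 near (10, 10)."""
    data = []
    for label, center in ((0, 0.0), (1, 10.0)):
        for _ in range(n_per_class):
            point = (center + rng.uniform(-1, 1), center + rng.uniform(-1, 1))
            data.append((point, label))
    return data


def train_centroids(examples):
    """'Training' stand-in: average the points of each label into a centroid."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    x, y = point

    def squared_distance(label):
        cx, cy = centroids[label]
        return (x - cx) ** 2 + (y - cy) ** 2

    return min(centroids, key=squared_distance)


def holdout_accuracy(seed=0, n_per_class=100, test_fraction=0.2):
    """Train on one part of the labeled data, score on the held-out part."""
    rng = random.Random(seed)
    data = make_labeled_points(n_per_class, rng)
    rng.shuffle(data)
    n_test = int(len(data) * test_fraction)
    test, train = data[:n_test], data[n_test:]  # the model never sees `test`
    centroids = train_centroids(train)
    correct = sum(predict(centroids, point) == label for point, label in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"Held-out accuracy: {holdout_accuracy():.2f}")
```

The two synthetic clusters are deliberately far apart, so this toy problem is easy. Real image data is far harder, which is why deep learning needs millions of labeled examples and, until recently, weeks of training.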
LR: How much human involvement is required in deep learning processes?
HH: With deep learning, it doesn't require feature engineering. In many other types of machine learning that are used for artificial intelligence, the human has to specify the features that help the computer identify the task it's trying to do. This is not mathematically how it works, but conceptually, in order for a computer in non-deep learning communities to learn what houses look like you'd have to pick out that a house has two sloping lines for a roof and two sets of walls; it has "n" windows and a door. You would have to pick those things out and say, "These make up a house." Every time the computer would see all those features in one picture, it'd say, "I think this is a house." In deep learning, you just throw data at the computer with the right answer, and it sees enough house pictures that it figures out the features itself. You've shown it many houses, so it figures the features out. It does require you show it many houses, because it has to find those common features, but it saves users a lot of time because they don't have to identify all the features that comprise a specific object.
LR: What are the use cases for deep learning that are most exciting in the world of telecom and network operations?
HH: Deep learning is definitely looking like it could play a role in cybersecurity. In general, it has been shown to be used for data center efficiency in terms of managing power and cooling in a data center. I would go with security, data center management and also being used to understand customer characteristics and demand forecasting. There are lots of potential use cases.
LR: Is deep learning being used successfully today in enterprises?
HH: It is being used commercially today. It's the basis for a good number of applications you have today that have speech recognition. People are starting to try to use a little bit of deep learning in chatbots for customer service. It's not too broadly deployed in that context. The social media providers are using it when you upload pictures to your social media and they are automatically labeled. We are starting to see a lot of enterprises look at it as a core detection technique. The computer never gets tired looking for anomalies or fraud or other things like that so in a lot of other cases, it's definitely showing promise.
LR: What kind of timeline are we looking at for deep learning to be mainstream in the enterprise?
HH: I think that the type of advancement we're showing here we hope will put us on a different inflection rate on our deep learning capabilities in general across a variety of use cases. The wait time has meant you really have to be using it in a space where it's worth waiting weeks for the answer; where it has to be far more effective than any existing technique to be worthwhile. With the productivity of deep learning science dropping to hours and minutes, we hope this will change the rate of adoption. In general, it's taken several years every time people try to apply deep learning to a new type of data and problem for the model and capabilities to be mature and understood enough to be comfortable deploying them. In a number of use cases we talked about, we're a couple years from deployment, but we hope this type of innovation will help pull those learning times in and help people get to models that are understood and well validated and have high accuracy such that they can be deployed faster.
LR: What type of investment does deep learning require for the average enterprise?
HH: The process of doing deep learning usually starts with data collection. Some enterprises are in excellent shape in terms of that. They have stats and characteristics of what they're running all ready to go and they understand anomalies and what they are looking for. Data is labeled. For others, it's a matter of beginning that process of data collection. That's true of any artificial intelligence, not just deep learning. You have to have your data in order. Once you've done that, the process of applying a deep learning model actually goes relatively quickly. There are a good number of models available in open source, so a lot of teams are refining models to get the end outcomes they are looking for. It's a highly iterative process of applying data to model to see if you got the outcome you wanted and rinse and repeat, effectively. Then actually deploying a model, doing the inference of the model, just means putting it in where the new events are happening. You have a system running already doing some type of monitoring or measurement or prediction, and you just enhance that system with this model. There is a data collection stage, a data discovery and exploration stage to get the model right and the deployment stage where you have some system -- you have something going today already and you are going to essentially increase the accuracy of that, make it better, using the model. For deep learning specifically, that's the big question.
LR: What is the business model for IBM? How will you commercialize deep learning?
HH: We are particularly excited about the commercialization aspect of this project. Deep learning is in its early days with really long wait times, but people are starting to explore new data types using it. It's also a very hot topic in the research community. My team and I sit in the research division and work closely with our cognitive systems organization and IBM server group. We can provide this breakthrough now in a technology preview through IBM's server group. It's available online now for IBM customers to download and try. We deliver it as a binary distribution so they don't have to worry about how to install it or what all the dependencies are. Open source bases can be complicated and tough to get going with. We're taking an approach of providing it in a "download and go" way to people who have IBM Systems. You can get going using it in the cloud with IBM Systems or at a customer site if you have data privacy concerns or other restrictions. They can purchase IBM Systems and download software and get going with it.
LR: What is your biggest piece of personal advice for women building their careers in the comms industry?
HH: One of the greatest pieces of advice I have for women is to get a strong set of mentors. I have been privileged to have a number of mentors in my career. They each help me with different types of things I try to deal with, be it dynamics in my team with coworkers or where my career should be in a couple of years -- different types of questions. For me, that's been one of the absolute key aspects to being a woman in tech -- to have a set of people, male and female, who have been advocates and have sanity checked what I'm thinking. They have encouraged me when I didn't enjoy what I was working on and wasn't sure of next steps and have always been there to talk me through whatever it was I needed to understand better about my career. I see a lot of women hesitate to ask people to mentor them. I encourage you to just do it -- everyone is almost always flattered to be asked to serve in a mentoring role. It really can only help.
-- Sarah Thomas, Director, Women in Comms