IBM, Hortonworks Team on Machine Learning, Data Science

Hortonworks and IBM are expanding their partnership to include machine learning and data science for the enterprise.

Scott Ferguson, Managing Editor, Light Reading

June 13, 2017

IBM and Hortonworks are now looking to make it easier for enterprises to use Hadoop to crunch huge data sets and run other big data projects with the help of data science and machine learning.

At the DataWorks Summit/Hadoop Summit in San Jose, Calif., on June 13, the two companies are announcing that they will combine the Hortonworks Data Platform (HDP) with IBM's Data Science Experience platform and Big SQL, IBM's SQL database engine for Hadoop.

IBM recently announced a private cloud version of its Data Science Experience, an integrated developer platform that provides real-time analytics to help crunch large data sets. The first version ran on the public cloud. (See IBM Brings Data Science Experience to Private Cloud.)

Specifically, Hortonworks plans to resell IBM's Data Science Experience platform with HDP, its Hadoop distribution. Hortonworks also plans to adopt IBM's platform as one of its strategic data science platforms.

Figure 1: The elephant in the room. (Source: Hadoop)

At the same time, IBM will adopt HDP as its Hadoop distribution. When the offer goes live in July, IBM plans to help customers transition to HDP from IBM Open Platform (IOP), the company's current Hadoop distribution and part of IBM BigInsights.

By combining and optimizing these platforms, Hortonworks and IBM are building on several years of agreements, including a February announcement in which IBM certified HDP for its storage offerings.

Now, the two companies are bringing Hadoop and its number-crunching capabilities close to where customers store their data, whether that is in a public or private cloud, said Daniel Hernandez, vice president of IBM Analytics. This means customers can tackle projects ranging from self-service analytics to data lakes and data warehouse modernization, as well as general data science projects.

"They are going to be able to turbo-charge some of the current use cases that they use Hadoop for, such as data warehousing, and through Big SQL, take advantage of all the tool sets they are using today," Hernandez told Enterprise Cloud News.

With more enterprises tackling big data projects, Hernandez said partnerships like the one between Hortonworks and IBM also help get these types of tools into the hands of other workers within an enterprise, especially developers who are using big data to create applications or incorporating big data components into apps.

Brian Hopkins, an analyst with Forrester, wrote about the deal in a blog post and told Enterprise Cloud News that it makes a lot of sense, especially for IBM.

"Partnering with Hortonworks is a really smart thing for IBM," Hopkins wrote in an email. "The growth is waning in Hadoop as it gains enterprise penetration, so the partnership lets IBM focus up the stack on things like data science and machine learning with cognitive. This may be the smartest thing they have done in a while and I applaud their courage."

In addition to the optimization of the different platforms, Hortonworks and IBM plan to provide resources and contributions to the Apache Atlas platform, which provides a layer of security and governance for Hadoop. As big data projects open up more data to analyze, Hernandez said it's important to know who has access to the data.

"It's especially important when it moves to production," said Hernandez.

The two companies plan to help move Atlas out of the Apache Incubator to Top-Level Project status. In addition, both companies plan to partner on Apache Spark, the open source framework for analyzing data sets across clustered environments.

(Editor's Note: This article was updated to include comments from an analyst.)

— Scott Ferguson, Editor, Enterprise Cloud News. Follow him on Twitter @sferguson_LR.

About the Author(s)

Scott Ferguson

Managing Editor, Light Reading

Prior to joining Enterprise Cloud News, he was director of audience development for InformationWeek, where he oversaw the publication's newsletters, editorial content, email and content marketing initiatives. Before that, he served as editor-in-chief of eWEEK, overseeing both the website and the print edition of the magazine. For more than a decade, Scott has covered the IT enterprise industry with a focus on cloud computing, datacenter technologies, virtualization, IoT and microprocessors, as well as PCs and mobile. Before covering tech, he was a staff writer at the Asbury Park Press and the Herald News, both located in New Jersey. Scott has degrees in journalism and history from William Paterson University, and is based in Greater New York.