IBM, Hortonworks Team on Machine Learning, Data Science

IBM and Hortonworks are now looking to make it easier for enterprises to use Hadoop to crunch huge data sets and run other big data projects with the help of data science and machine learning.

At the Data Works and Hadoop summits in San Jose, Calif., on June 13, the two companies are announcing that they will combine Hortonworks' Data Platform (HDP) with IBM's Data Science Experience platform and Big SQL, the company's SQL database engine for Hadoop.

IBM recently announced a version of its Data Science Experience, an integrated developer platform that provides real-time analytics to help crunch large data sets, for private cloud. The first version ran on public cloud. (See IBM Brings Data Science Experience to Private Cloud.)

Specifically, Hortonworks plans to resell IBM's Data Science Experience platform with HDP, which is the company's Hadoop distribution. Hortonworks also plans to adopt IBM's platform as one of its strategic data science platforms.

The elephant in the room.  
(Source: Hadoop)

At the same time, IBM will adopt HDP for its Hadoop distribution. When the offer goes live in July, IBM plans to help customers transition to HDP from its IBM Open Platform (IOP), which is the company's Hadoop distribution and part of IBM BigInsights.

By combining and optimizing these platforms, Hortonworks and IBM are building on several years of agreements, including an announcement from February where IBM certified HDP for its storage offerings.

Now, the two companies are bringing Hadoop and its ability to crunch numbers close to where customers are storing their data, whether that is within a public or private cloud, said Daniel Hernandez, vice president of IBM Analytics. This means that customers can tackle projects ranging from self-service analytics, to data lakes, to data warehouse modernization, as well as general data science projects.

"They are going to be able to turbo-charge some of the current use cases that they use Hadoop for, such as data warehousing, and through Big SQL, take advantage of all the tools sets they are using today," Hernandez told Enterprise Cloud News.

With more enterprises tackling big data projects, Hernandez said partnerships like the one between Hortonworks and IBM also help get these types of tools into the hands of other workers within an enterprise, especially developers who are using big data to create applications or incorporating big data components into apps.

Brian Hopkins, an analyst with Forrester, wrote about the deal in a blog post and told Enterprise Cloud News that it makes a lot of sense, especially for IBM.

"Partnering with Hortonworks is a really smart thing for IBM," Hopkins wrote in an email. "The growth is waning in Hadoop as it gains enterprise penetration, so the partnership lets IBM focus up the stack on things like data science and machine learning with cognitive. This may be the smartest thing they have done in a while and I applaud their courage."

In addition to the optimization of the different platforms, Hortonworks and IBM plan to provide resources and contributions to the Apache Atlas platform, which provides a layer of security and governance for Hadoop. As big data projects open up more data to analyze, Hernandez said it's important to know who has access to the data.

"It's especially important when it moves to production," said Hernandez.

The two companies plan to move Atlas from Incubator status to a Top Level Project. In addition, both companies are planning to partner on Apache Spark, the open source framework for analyzing data sets across clustered environments.

(Editor's Note: This article was updated to include comments from an analyst.)

— Scott Ferguson, Editor, Enterprise Cloud News. Follow him on Twitter @sferguson_LR.

Scott_Ferguson 6/13/2017 | 1:13:57 PM
Re: access @Ariella: I think what also makes that interesting is that they are doing a lot of that through Apache, which means open source solutions that others can build on. It makes it better than a locked-in commercial solution.
Ariella 6/13/2017 | 1:11:41 PM
access <As big data projects open up more data to analyze, Hernandez said it's important to know who has access to the data.>

Very true. It's important to track access and keep it under control in any situation that involves sensitive data.