Has Hadoop gone the way of 8-track tapes and Betamax? The technology that engendered so much excitement and optimism about the potential of big data has, at the very least, hit a speed bump as the two remaining independent providers — Cloudera and MapR — are each facing their own crisis.
MapR’s problems are existential. The company won a reprieve from imminent shutdown last week, saying that it has signed a letter of intent with a potential buyer. That buyer is now performing due diligence, and MapR faces a July 3 deadline to shut down if the deal doesn’t go through.
Cloudera suffered a couple of disappointing quarters and announced that its CEO is stepping down — news that was not well received by investors. The company blamed the slow quarters on big deals that had been delayed as it prepared to roll out its post-merger, next-generation data platform, which incorporates multiple technologies beyond Hadoop.
What happened to Hadoop?
“Hadoop’s biggest problem is that it was built to be a giant single source of data,” Hyoun Park, founder and CEO of research firm Amalgam Insights, told InformationWeek in an interview. But it’s challenging to use Hadoop across multiple data centers or multiple clouds. “The assumption with Hadoop is that you have it, and it holds everything you own. That’s a problem in today’s world where you have hundreds of apps.”
Today’s modern setup has data coming in from hundreds of sources, according to Park, who noted that Looker and Tableau are both adept at handling that kind of data. Both companies were acquired in the last few weeks by Google and Salesforce, respectively.
Ali Ghodsi, co-founder and CEO of platform-as-a-service company Databricks, said Hadoop is not meant for the cloud because it is not elastic in the way the cloud is. Databricks was founded as a PaaS distribution of the big data processing engine Spark but has since evolved to include many other big data technologies. Ghodsi said that going forward, Hadoop will be more of a niche solution.
“Hadoop is dead in the cloud for sure,” he told InformationWeek. Like mainframes, Hadoop will remain in place where it makes sense. “There are still IBM mainframes around 50 years later. But they are not something you buy and invest in these days.”
Ghodsi said that the cloud offers cheaper and more reliable storage options than the Hadoop Distributed File System (HDFS). He also believes that the old Red Hat open source business model of offering on-premises software and selling support for it is headed toward extinction.
“The modern open source model is managed open source software in the cloud,” he said — the kind of service that is offered by Databricks. Ghodsi said the provider will operate it, make sure it is secure, and manage the complexity of it for a subscription-based price.
The model has been a successful one for Databricks so far. While the company is still venture-backed and privately held, the CEO said that Databricks just saw its biggest quarter ever, beating its internal number by 50%. Growth rates are “close to tripling year over year,” according to Ghodsi. “This quarter is insane. Demand is unprecedented.”
But what about Hadoop? Could it operate in the cloud like that? Gartner analyst Adam Ronthal said that while there are some native Hadoop options available in public clouds like AWS, they may not be the best solution for many applications.
“There’s a fair bit of complexity that goes into managing a Hadoop cluster,” he told InformationWeek. Non-Hadoop-based cloud solutions may look simpler and easier to organizations that are evaluating data and analytics solutions. But that doesn’t mean there’s not a place for Hadoop in the future.
Ronthal said that Hadoop is experiencing a “market correction” rather than an existential crisis. There are use cases that Hadoop is really good at, he said. But a few years back, Hadoop was the rock-star technology pitched as the solution to every problem.
“The promises out there 3, 4, or 5 years ago were that Hadoop was going to change the world and redefine how we did data management,” he said. “That statement overpromised and underdelivered. What we are really seeing now is recognition of workloads that Hadoop is really good at, like the data science exploration workloads.”
Jessica Davis has spent a career covering the intersection of business and technology at titles including IDG’s InfoWorld, Ziff Davis Enterprise’s eWeek and Channel Insider, and Penton Technology’s MSPmentor.