Return Path analyzes emails for its clients, mostly email marketers and corporate security professionals. It’s a data-intensive task because its clients send millions of emails a day.
The company was founded on the back end of the dot-com boom and originally helped people keep their connections when they changed their email address. In 2005, the startup shifted to its present focus, and soon its customer data was growing so fast that Return Path’s databases couldn’t keep up. The company needed a consistent and predictable way to scale out the email data it captured for its clients, and a traditional relational database alone was not working.
According to Return Path CTO Andy Sautins, every time the company signed a new client, its storage needs jumped drastically and database performance dropped off.
“It was definitely a challenge to keep up with getting new data,” he said. “We could run great and then, all of a sudden, we’d triple our data and then performance would go bad for a while.”
The drop-off in performance wasn’t proportional to the size of the new data either, he said, so it was hard to tell how new data would affect the system. He needed something more predictable.
In 2008, the company began experimenting with Hadoop, making it a very early adopter. That history gives Sautins perspective on how far Hadoop has come as an enterprise-class technology.
Sautins said that in the last eight months he has made as many service calls for his MapR Hadoop production deployment as he has for his Oracle database, even though Oracle’s technology is much better established.
“I’d put [MapR] on par with what we’re seeing from our Oracle database from a stability standpoint,” he said. “I feel like we’re starting to climb out of the Wild West days of Hadoop.”
Return Path had to find its own way to MapR, Sautins said.
In 2008, he wanted to upgrade Return Path’s Oracle relational database, but the price was prohibitive. “It just got to be a very expensive solution,” Sautins said. Return Path started looking for an alternative.
Return Path’s engineers tried writing their own custom software. “We still had some of the same challenges: if we got a lot of data, we’d have to scramble to figure out how to make it work,” Sautins said. “But it worked a little better.” The downside was that Return Path became a software development shop, which drew its focus away from its clients.
Once Sautins and his team decided to try Hadoop, they first started using a version straight from Apache and then later Cloudera’s distribution.
“What we found when we moved to Hadoop and why we really like it is that it scales much more predictably,” he said. “I can go to anybody in the business and say, ‘Yes, I can sign that deal,’ because I know roughly what it’s going to take to support.”
Sautins said Return Path had an unusually large number of files for the early version of Hadoop it was running, and because of the way the company processed data, he started to run into problems with the open source software.
When he filed service requests, he was told that Return Path wasn’t the common use case for the Cloudera system.
“I get we weren’t the common use case,” said Sautins, “but I still had to run a business off of this.”
Return Path started working with MapR in 2011 and moved to production that December.
“MapR, for us, was great because they took a little bit more the view of the enterprise client,” Sautins said.
Dave Jesperson, vice president of professional services at MapR Technologies, said that MapR’s early focus was solidifying the storage services layer in Hadoop, minimizing NameNode failures and other glitches, so that its clients could be confident their data would not be corrupted or lost. Sautins said that emphasis was attractive to Return Path.
Return Path’s early-adopter culture gave it a head start in developing its Hadoop platform to the point where that platform is now as stable as its Oracle database.
“To be fair,” Sautins said, “Oracle hasn’t been a bump-free ride either. I’m not saying they’re both perfect, but I get the same expected number of bumps [from MapR] as I get with Oracle now. If once a year I have something to deal with with Oracle, that’s not unexpected. If once a year I have something to deal with with MapR, that’s not unexpected.”
Mark Smith, CEO of Ventana Research, said Hadoop distributions have had to focus on things like security and data reliability to attract larger enterprise customers. “Things have significantly changed in the past 12 months because [distributions like MapR, Cloudera and Hortonworks] have been coming out with new releases on a monthly basis to address the concerns of companies. At the same time, the skill sets of companies that work with Hadoop have been getting better.”
Companies like Return Path are building their competencies alongside the rapid development of MapR and its competitors. MapR, Cloudera, and Hortonworks have begun work on the second generation of Hadoop capabilities. Return Path is particularly interested in interactive queries, using technologies like Apache Drill or Cloudera’s Impala. “I don’t know if [Drill] is a game changer, but it’s a nice natural evolution on Hadoop,” Sautins said.
Sautins said Return Path still approaches new technology like a start-up and is willing to take risks. “We’re a technology shop,” Sautins said. “We’re not one of those Fortune 1000s that have IT departments that are doing this. We’re a product group. It isn’t unnatural for us to pick [something new] up and go with it.”
Jesperson said that many of MapR’s customers, especially the larger ones, are much more conservative about upgrading and trying new things. Typically, IT departments at larger companies upgrade only once or twice a year and have a formal process for testing and evaluating a new release before it goes into production, he said.
Jesperson said the experimentation of early adopters like Return Path helps work out the kinks so that more conservative companies can adopt Hadoop. Meanwhile, Hadoop distributions are focusing on making the technology more reliable and easier for business analysts to use.