Is your business an “as-it-happens” business or an “after-it-happens” business? If you are only using big data to build models for optimizing processes every so often, then yours is an “after-it-happens” organization. You are responding to data after the fact.
Data processing is often a bottleneck in organizations. You might already be using Hadoop, but likely in an environment completely separate from your operational databases and customer-facing applications. These days, it's no longer enough to run a few queries to gain insight for the next business decision. To get real value out of today's big and fast data, you must merge business analytics with your production and operational environments. By taking a data-centric approach to your infrastructure, you can provide flexible, real-time data access and automate data-to-action for immediate business benefit.
A Peek Behind the Data-to-Action Curtain
Here are three successful companies that have made the leap to an “as-it-happens” business. They not only cut IT and storage costs, but also created long-term platforms for innovation.
The Rubicon Project is a digital advertising infrastructure company that has developed software for automating the selling and purchasing of online ads. The company created pioneering technology that established a new model for the advertising industry – similar to what NASDAQ did for stock trading. Each day around the world, the company performs more than 125 billion real-time ad auctions. Its automated advertising platform has surpassed Google in U.S. audience reach, touching 96 percent of Internet users in the United States. By choosing a fault-tolerant, mission-critical Hadoop platform, the company is able to run Hadoop alongside the rest of its enterprise infrastructure in a “lights-out” data center.
Urban Airship enables brands to build relationships with their constantly connected customers through its mobile push messaging service, sending more than 180 billion push messages every month. The company relies on an enterprise-grade Hadoop platform to deliver the actionable information brands need to strengthen those relationships.
Machine Zone, one of the world’s most innovative mobile gaming companies and the creator of Game of War, initially had its operations isolated from its analytics environment, which limited its ability to deliver actionable data. The company’s game servers shipped data to separate analytics clusters, causing synchronization problems, data corruption, and even data loss. In response, the company re-engineered its platform to run operations and analytics together. With analytics synchronized with the game platform, data appears in real time, and Machine Zone now supports more than 40 million users and more than 300,000 events per second.
By unifying their operational and analytics environments, these companies get the best of both worlds – they reduce costs and latency while providing a world-class, as-it-happens customer experience.
The real-time, data-centric enterprise is interested in more than data-to-insights. It is driven by the data-to-action cycle that touches customer interactions and business operations. While insights are nice to have, it’s the ability to act on information that makes the real difference in an as-it-happens business.
To pull this off, you need to make sure that your big data and fast data can work together harmoniously. Is there a new architectural approach that can bring these two together to ensure success? The answer is yes.
With this new architectural approach, you need to consider all of your hardware requirements: not just what you have set aside for Hadoop, but everything. You need to change your way of thinking by moving away from the old approach of static partitioning of hardware and toward a much more dynamic approach that enables you to expand and contract your resources on demand. For an example of such an approach, consider how Apache Myriad brings together Apache Mesos and Apache YARN for cross-data-center resource optimization.
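The difference between static partitioning and on-demand elasticity can be sketched in a few lines of Python. This is a toy model for illustration only – Apache Myriad, Mesos, and YARN each have their own real APIs – but it captures the idea: workloads borrow from one shared pool instead of owning fixed slices of hardware.

```python
class ElasticPool:
    """Toy model of a shared resource pool that expands and contracts
    per workload on demand (illustrative only; not Myriad's actual API)."""

    def __init__(self, total_cores):
        self.total = total_cores
        self.allocated = {}  # workload name -> cores currently held

    def request(self, workload, cores):
        """Grant up to `cores` from whatever is currently free."""
        free = self.total - sum(self.allocated.values())
        granted = min(cores, free)
        self.allocated[workload] = self.allocated.get(workload, 0) + granted
        return granted

    def release(self, workload):
        """Return all of a workload's cores to the shared pool."""
        return self.allocated.pop(workload, 0)


pool = ElasticPool(total_cores=100)
pool.request("hadoop-batch", 70)    # nightly analytics takes most of the pool
pool.request("operational-db", 40)  # only 30 cores are free, so it gets 30
pool.release("hadoop-batch")        # batch job ends; capacity is freed
pool.request("operational-db", 40)  # now the operational side can grow
```

Under static partitioning, the 70 cores reserved for batch analytics would sit idle between jobs; here they flow back to the operational workload the moment the batch job releases them.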
In addition, you’ll need to use a distributed file system that can deliver all of your real-time storage and real-time processing needs for business continuity. Finally, you need to ensure that all of the distributed applications that drive your businesses will work seamlessly with these technologies.
Data Agility Key to Success
To have both your operations and analytics running smoothly on the same platform, you need to think about how you can move data in and how to manage and process it at scale. It’s imperative that you remove steps in legacy processes to shorten the data-to-action cycle. You don’t want to waste time creating and maintaining schemas where they are not required, or duplicating data. These laborious processes slow down the data-to-action cycle. You’ll need low latency, scalability, and integration with ubiquitous technologies like SQL and the BI tools that you are already using.
New technologies such as Apache Drill can help. This schema-free SQL query engine supports self-service data exploration without requiring you to pre-define schemas. Drill is ANSI SQL:2003 compliant and plugs into your BI tools in an industry-standard way. With Drill, you simply query your data in place; there is no need to perform ETL or move your data.
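As a sketch of what querying in place looks like, the snippet below submits SQL to Drill's REST API (the `query.json` endpoint on Drill's default port 8047) from Python. The file path and column names are hypothetical placeholders; the point is that the query targets a raw JSON file directly, with no schema defined up front.

```python
import json
import urllib.request

# Drill's default REST endpoint on a local drillbit (assumed location).
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_payload(sql):
    """Wrap a SQL string in the JSON body Drill's REST API expects."""
    return {"queryType": "SQL", "query": sql}


def run_drill_query(sql, url=DRILL_URL):
    """POST the query to a running Drill instance and return the rows."""
    body = json.dumps(build_drill_payload(sql)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["rows"]


# Query a raw JSON file in place -- no ETL, no schema definition.
# (Path and fields are illustrative, not from the article.)
sql = "SELECT t.user_id, t.event FROM dfs.`/data/clicks.json` t LIMIT 10"
# rows = run_drill_query(sql)  # requires a Drill instance at DRILL_URL
```

The same SQL could equally be issued from any JDBC/ODBC-connected BI tool; the REST call is just the most compact way to show the query-in-place pattern.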
Keep in mind that an as-it-happens business is as much about streamlining your business processes as it is about the technology that runs the business. That doesn’t mean you need to replace all the tools you are currently using; it means you need to rethink how you use them and where you can augment them to enable your as-it-happens business.
Steve Wooledge is vice president of product marketing for MapR Technologies. Steve brings more than 12 years of experience in product marketing and business development to MapR. He was previously Vice President of Marketing for Teradata Unified Data Architecture, where he drove big data strategy and market awareness across the product line, including Apache Hadoop. He also held various roles in product and corporate marketing at Aster Data – an innovator in big data analytics – prior to its acquisition by Teradata. Earlier in his career, Steve held product marketing positions at Interwoven and Business Objects, and held sales and engineering roles at Business Objects, Dow Chemical, and Occidental Petroleum.
Subscribe to Data Informed for the latest information and news on big data and analytics for the enterprise.