It’s well known that most big data projects fail. There are many reasons for this, and it turns out that very few stem from the particulars of unproven technology. In fact, new projects built on well-developed technologies often fail spectacularly (the Obamacare website, for example). Though most post-mortem surveys are quick to blame scope creep, changing business requirements, or even a lack of project management, in our experience projects typically fail because teams:
– don’t pay attention to changes required in operating process,
– don’t recognize the lack of operating skills in support staff,
– don’t plan for integration with existing operations, and
– don’t plan for sufficient operational oversight.
In short, it’s missing operational planning that most often leads to big data project failure. To understand how to save your project, you need to understand how each of these operational hurdles impacts your organization and how to overcome it.
In my experience leading projects for hundreds of global enterprises, the single most common reason that big data projects fail is that no one is accountable for identifying and implementing the necessary process changes. Consider as an example what seems to be a fairly basic task: preparing a forecast on a traditional data warehouse like Teradata, an analytic database like Vertica, and a big data system like Hadoop. From an underlying technology perspective, the data warehouse and the analytic database are most similar, since both follow traditional relational database structures. Yet from a process perspective, moving a forecasting routine from a data warehouse to a big data system is more likely to succeed.
The reason is that the processes for data integration, managing system resources, and running analytics are very similar for centralized data management systems like a warehouse or a big data system. IT teams have processes and policies for managing data lakes, allocating resources based on user requests, charging back shared resources, and monitoring for signs of possible system issues. In contrast, the processes for provisioning, sizing, and monitoring an analytic database are drastically different. The isolation of data, the control over resources, and the chargeback mechanisms don’t map to existing warehouse infrastructure. Additionally, the system performance and failure modes are completely different. While the distinction between a data warehouse and an analytic database may seem like a minor technical one, from an IT operating process perspective it means the difference between a successful project and likely failure.
Whether you hire experienced staff to develop new processes or outsource the work, your existing staff likely doesn’t have the skills to run those processes. Projects that introduce many new technologies are particularly challenging because of the breadth of skills required to run these systems day to day. Consider again moving from a data warehouse to a big data system versus an analytic database. Big data systems require few process changes as long as the workflow is similar, and IT staff trained in operating a centralized warehouse can readily learn any process changes for a big data system during pre-production.
Meanwhile, even a cursory examination of analytic databases uncovers completely different systems, and your team will need to learn new skills to operate them. Not only are the operating consoles different, but the underlying APIs, the provisioning units, the perceived performance, and the problem remediation tools all work completely differently from the tools for operating a centralized warehouse. While the simple solution is to hire new staff who have the requisite skills, this is often challenging due to a lack of talent, especially for new technologies. Alternatively, retraining existing staff is more cost effective and reduces the risk that the IT staff running the new project won’t fit in with the rest of the team. The best time to start training staff is when kicking off a project, not when it is about to go live, and staff should have time to practice their skills in pre-production.
Most successful projects are developed and deployed completely standalone from the rest of the company infrastructure. These skunkworks projects succeed largely because they start from a blank slate: the team has to build new processes and learn the skills to operate all of the new systems. This is similar to how a startup functions, where the team draws on its experience and brings in outside expertise to solve problems as they come up. The other important aspect of skunkworks projects that makes them more likely to succeed is that they don’t need to be integrated with the existing operations infrastructure. Yet even successful skunkworks projects will likely be orphaned if they are never integrated with the rest of the infrastructure, and they are the first to get cut when it comes time to tighten budgets.
You can think of an existing big data operation as a complex machine, built incrementally over years or decades, with tightly coupled gears that run only because they were built to operate in an interconnected fashion. This machine evolved organically, with no clear blueprint, and is impossible to recreate in a lab. Now imagine inserting a new widget while the machine is running, and it becomes clear why most projects are crushed or simply rejected when it comes time to integrate. The solution, much like how a successful startup works with much larger customers, is to develop a clutch. New systems need to develop a cadence of their own, as skunkworks projects do, and then gradually spin up (or down) to match the cadence of the rest of the IT machinery. The best recipe for long-term big data project success is to integrate new systems gradually.
The few big data projects that develop new processes, retrain their staff, and successfully integrate with existing big data operations are still rarely seen as true successes because of inconsistent oversight. Much like a tree falling in the woods, a new project that no one uses, or whose impact goes unrecognized, may run successfully but is never accepted and acknowledged as critical to the business. Most projects have poor operational oversight because big data projects are considered high risk.
Most project leaders don’t want to set expectations too high, or allow too much visibility into their projects, because they believe the projects are likely to fail. As a result, new projects don’t get the performance measures required to show company leadership the positive impact of their success. Even if leaders are told that a project is in production, they don’t know why that project is necessary to the business. As with lack of integration, when it comes time for budget cuts, new projects are the first to go, not because they lack impact, but because it’s not clear to decision makers what impact they have. Just as with operational processes, skills, and integrations, the oversight and reporting for a new project must be considered and socialized up front, even if doing so is the biggest personal risk a project leader takes.
Saving Your Project
Before embarking on any new big data project, or if you have a project already underway, stop and immediately consider its operational implications. Map out the systems required for your new project and ask all of these questions from the perspective of an IT operator:
– How will these new systems function in three to five years?
– What new information does the team need to know in order to run these systems?
– Where do these new systems depend on existing infrastructure?
– How will you know that these new systems are meeting business needs?
By answering these questions up front, you will have a map for getting your project into production and will save it from joining the majority of projects that end in failure. The key to success is to start planning for operations before you start any new big data project, and to make sure you have the supporting processes, skills, integrations, and oversight to get your project into production.
Omer Trajman is CEO and co-founder of Rocana, which provides enterprises with the ability to maintain control of their modern, global-scale infrastructure by managing huge amounts of operational data and providing analysis that shows a complete picture of IT operations.
Leading teams responsible for some of today’s largest modern database management system deployments, Omer has worked with customers and partners to identify where Big Data technology solutions address business needs. Omer’s experience in Big Data includes responsibilities as Vice President of Technology Solutions at Cloudera, focusing on Cloudera’s technology strategy and communication. Prior to this role, Omer served as Vice President of Customer Solutions at Cloudera, which included responsibility for Cloudera University, Cloudera’s Architectural Services and Cloudera’s Partner Engineering team. Before joining Cloudera, Omer was responsible for the Cloud Computing, Hadoop and Virtualization initiatives at Vertica.