The time has arrived for a Data First approach to creating information for business. Done right, this approach is all about getting more value, faster and at comparable or lower cost to implement. It veers away from some traditional thinking, but the early evidence is clear: It’s time for a change.
Instead of modeling the data first, then loading it, you can gain power and insight by loading the data first and then modeling it based on the content and meaning of the data. This process provides more time to better understand what we have and make faster, smarter decisions.
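This load-first, model-second idea is often called schema-on-read. A minimal sketch in Python (the records and field names here are hypothetical, for illustration only): raw data is parsed without a pre-defined schema, and the shape of the data is discovered afterward from its actual content.

```python
import json

# Hypothetical raw records landed as-is, before any schema is defined
# (in practice these would arrive as files in a landing zone).
raw_records = [
    '{"customer": "acme", "orders": 3}',
    '{"customer": "globex", "orders": 1, "region": "EMEA"}',
]

# Load first: parse everything; nothing is rejected for having
# unexpected or missing fields.
records = [json.loads(line) for line in raw_records]

# Model second: inspect the content to discover the schema that is
# actually present across the records.
fields = sorted({key for rec in records for key in rec})
print(fields)  # → ['customer', 'orders', 'region']
```

Note that the second record carries a field the first lacks; a model-first pipeline would have had to anticipate it, while here it simply shows up in the discovered schema.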
Data First ensures maximum value in the shortest possible delivery time by targeting data content – how to get at it, break it down, analyze it, mine it, scrape it, and link it with existing relational data to create new insights, new analytic views, and new business opportunities. It also assesses potential cost reductions by considering the use of big data technologies, resulting in less dependence on higher-cost software and platforms over time and allowing for a “brute-force” approach to crunching data at efficiencies of scale never possible before.
Why Data First Works
Data storage is cheaper and easier to manage on commodity hardware and open-source software, freeing capital for analytics and ideation geared toward new problem solving.
Data modeling in big data environments should focus on subject area breadth – collect all the relevant data in one place for many uses versus a select data set for a pre-defined, known use.
Make data discovery easier by allowing data to be more accessible, enabling faster turnaround, and deploying tools that are highly usable. Data should be versioned, curated, and tagged so that users can readily see the state of the data before using it. Governance should define “fit-for-purpose” acceptable-use policies to ensure that proper controls are in place. Users then can mine this data for potential analysis and new insights with total control while contributing to the overall ecosystem.
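The versioning, tagging, and fit-for-purpose checks described above can be sketched as a small dataset catalog. This is a hypothetical illustration, not any specific product's API; the class, function, and dataset names are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class DatasetEntry:
    """Catalog record letting users see the state of the data before use."""
    name: str
    version: int
    tags: list          # curation labels, e.g. "raw", "pii-scrubbed"
    fit_for: list       # acceptable-use purposes approved by governance

catalog = {}

def register(entry: DatasetEntry) -> None:
    """Publish a dataset's metadata into the shared catalog."""
    catalog[entry.name] = entry

def usable_for(name: str, purpose: str) -> bool:
    """Governance check: is this dataset approved for the given purpose?"""
    entry = catalog.get(name)
    return entry is not None and purpose in entry.fit_for

register(DatasetEntry("clickstream", version=2,
                      tags=["raw", "pii-scrubbed"],
                      fit_for=["exploration", "reporting"]))

print(usable_for("clickstream", "exploration"))  # → True
print(usable_for("clickstream", "marketing"))    # → False
```

The point of the sketch is the shape of the control: users mine freely within the purposes governance has approved, and anything outside that list is refused by default.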
Business-driven analytics move at the speed of the data, not at the speed of data modeling. Extract, transform, and load (ETL) code releases, database changes, and long-running batch jobs present day-old data at best. On-demand and real-time analytics are within reach with new processing paradigms.
Data First Versus the Data Warehouse
Data warehouses are important pillars of an organization’s business operations, but they are falling short as an agile, exploratory ideation facility for delivering new business capabilities.
A lot of time and money are spent to create a “slice and dice” environment that should give the business what it needs. The problem is that, in today’s environment, accounting for every question in one model is impossible. New data sources are emerging too quickly. New questions are sprouting up even faster. A highly engineered environment that takes only the data it needs up front is going to have difficulty adapting to rapidly changing requirements.
Faced with rapidly growing data volumes and complexity, current data loading and transformation processes lag behind. Typical data warehouse development lifecycles take too long and cost too much to justify an incubation and ideation mindset. We can’t build a complex system to do only one thing and expect it to be agile.
It’s All About the Data
The “known” or traditional approaches to sourcing, loading, processing, and enriching data are shifting under the sands of big data. The key is to understand the “new” approach using big data technologies, which value speed-to-market and flexible loading paradigms – images, video, semi-structured XML files, text, and other formats all can be loaded in original form into an explorable platform such as Hadoop or the equivalent.
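A minimal sketch of that flexible loading paradigm, with a local directory standing in for Hadoop or an equivalent store (the landing-zone layout and function names are assumptions for illustration): files are stored byte-for-byte in their original form, with no parsing or transformation on the way in.

```python
import tempfile
from pathlib import Path

# Hypothetical landing zone standing in for HDFS or an equivalent
# platform; the subject_area/filename layout is illustrative.
LANDING = Path(tempfile.mkdtemp())

def land(filename: str, payload: bytes, subject_area: str) -> Path:
    """Store a file in its original form -- no parsing, no transformation.
    Images, video, semi-structured XML, and text all land unchanged."""
    dest = LANDING / subject_area / filename
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(payload)
    return dest

stored = land("orders.xml", b"<orders><order id='1'/></orders>", "sales")
```

Because nothing is transformed at load time, the decision about how to break the data down is deferred to exploration time, which is exactly the flexibility the Data First approach depends on.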
At the same time, it’s equally important to use and capitalize on the known or existing data warehouses and be able to integrate with the “new” approach to get the best-of-breed from both platforms.
For today’s businesses trying to extract the most value from their data, the answer is to start with the source data itself. Because data is such an important asset, it should be available to those who need it. And that source data should be lightly integrated and standardized into easily understood subject areas for greater usability. Data First – it’s the piece that has been missing, now made possible by advances in big data technologies and approaches.
Avi Kalderon heads up NewVantage Partners’ Big Data Fast Track program, the result of three years of big data strategy and execution with more than a dozen Fortune 1000 and industry-leading clients. He has extensive experience as a business leader and technology executive, most recently as senior VP of the architecture and advanced technology group for FINRA (Financial Industry Regulatory Authority), the largest independent securities regulator in the United States.
Subscribe to Data Informed for the latest information and news on big data and analytics for the enterprise.