Rather than just amassing data, organizations are finding ways to process, leverage, analyze and mine this vast resource and turn it into actionable intelligence. In examining this trend, attention has focused on big data, the process of collecting and analyzing a large portfolio of diverse structured and unstructured datasets.
Yet, in doing this, businesses must not forget small data. Small data comprises small bits or units of data that are not part of centrally managed data stores and that typically come into an organization via such diverse means as email attachments, social media, or spreadsheets and reports. For many businesses, managing this small data for business advantage is a greater challenge than leveraging huge volumes of big data.
Because it is not stored or managed centrally, small data often becomes difficult to update and can represent a drain on efficiency. Yet, it is important that it is not ignored. Small data is inherently important, with contents that are often key to corporate decision-making. Data from spreadsheets, the occasional flat file, price lists attached to emails and catalog data from vendors are all examples of small data. All have value to business processes – and yet because of the way they are kept or managed within organizations, all could easily be overlooked by the overall data management strategy.
Business users often exacerbate the problem. Unable to access the data they need when they need it, they tend to create or maintain parallel systems – in the most extreme cases becoming a shadow IT department. The multiplication of these systems is one of the sources of small data – but not the only one.
The proliferation of small data poses a risk for companies because undefined, disconnected data drifts from spreadsheet to spreadsheet. Users are likely to have an incomplete picture of the data and to view it out of context, which can lead them to draw false inferences. The incorrect decisions that follow cost time and money to resolve. Companies can waste significant sums trying to pull together the information contained within small data, and the process typically won’t stand up to an audit or a regulatory compliance review. Pursuing a unified data governance approach called total data management (TDM) can help companies achieve better insights and spur positive results, from improving revenues to complying with industry regulations and cutting costs.
A Unified Approach to Data Governance
So how can organizations start to address these problems? How can they integrate small data into their overall data management process, achieve a holistic picture of all of their data and drive competitive advantage as a direct result?
The answer is through a total data management strategy. TDM is about providing a unified approach for all of an organization’s data no matter if that data resides in text files, enterprise databases, SaaS applications, email attachments, spreadsheets or big data clusters like Hadoop. The approach also addresses an organization’s data management needs, including big data integration, data quality and data governance, master data management (MDM), business process management and application integration.
TDM is distinct from MDM in that it provides a unified platform and architecture for all types of organizations across the enterprise. MDM focuses on providing the technology to create a single unified view of information across the organization and to manage that master view over time. Master data management thus becomes a subset of TDM.
Take Advantage of New Data Management Tools
So how can a TDM strategy work in practice to address the issue of small data? In rolling out such a strategy, organizations should first ensure they capture semi-structured and unstructured “small data” from dispersed locations across the enterprise. They also need to process the data in a way that yields business intelligence.
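As a rough illustration of the capture step, the sketch below walks a set of folders and lists the files that look like small data. It is a minimal example under stated assumptions, not a description of any particular product: the function name `inventory_small_data`, the list of file extensions and the idea of scanning shared folders are all hypothetical choices made for this example.

```python
from pathlib import Path

# Hypothetical set of file types treated as "small data" in this sketch.
SMALL_DATA_SUFFIXES = {".csv", ".tsv", ".txt", ".xls", ".xlsx"}


def inventory_small_data(roots, suffixes=SMALL_DATA_SUFFIXES):
    """Walk each root folder and list files whose extension marks them as small data."""
    found = []
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() in suffixes:
                found.append({
                    "path": str(path),
                    "type": path.suffix.lower(),
                    "bytes": path.stat().st_size,
                })
    # Sort by path so the listing is stable and easy to review.
    return sorted(found, key=lambda entry: entry["path"])
```

An inventory like this only shows where small data lives. Data arriving through email attachments or social media would need its own capture route.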
In the past, systems have been unable to cope with the rapid accumulation of small data within enterprises. Now that systems have evolved to the point where they can manage this data and analyze combined datasets, businesses need to incorporate these datasets into their overarching data strategies.
In particular, open source data management technologies such as Hadoop and NoSQL databases are now coming on stream. They enable organizations to integrate large volumes of small data, much of it lying outside the ambit of the enterprise IT infrastructure, and incorporate it within an overall TDM strategy. These kinds of technologies can also be used to drive data quality by improving the completeness, accuracy and integrity of data and by removing duplicates.
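To make the data quality point concrete, here is a minimal sketch of two such checks, duplicate removal and a completeness score, written in plain Python rather than on Hadoop or a NoSQL database. The function names, the sample vendor records and the rule that values match when they are equal after trimming and lowercasing are assumptions made for illustration. Field values are assumed to be strings, as they would be when read from a CSV file.

```python
def normalize(record, key_fields):
    """Build a comparison key from trimmed, lowercased values of the key fields."""
    return tuple((record.get(field) or "").strip().lower() for field in key_fields)


def remove_duplicates(records, key_fields):
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record, key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


def completeness(records, required_fields):
    """Return the share of records in which every required field is non-empty."""
    if not records:
        return 0.0
    complete = sum(
        all((record.get(field) or "").strip() for field in required_fields)
        for record in records
    )
    return complete / len(records)


# Hypothetical vendor list, as it might arrive in a spreadsheet export.
vendors = [
    {"vendor": "Acme Corp", "email": "sales@acme.example"},
    {"vendor": " acme corp ", "email": "sales@acme.example"},
    {"vendor": "Globex", "email": ""},
]
cleaned = remove_duplicates(vendors, ["vendor"])
print(len(cleaned))                                # 2
print(completeness(cleaned, ["vendor", "email"]))  # 0.5
```

Real matching rules are usually fuzzier than this exact comparison, so the key function is the part most likely to need tailoring.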
Tips for Getting Started
Practical tips for making the move to a TDM approach include:
Create a sandbox. In evolving to this new approach, IT departments may well wish to first test the water by setting up sandboxes where users can experiment with data without potentially disrupting operations. The department should offer tools for technically savvy business users to use as part of the process.
Launch pilot projects. To ensure it is comfortable with the new approach, the organization may then wish to adopt it on one specific, live project and build out from there, tracking results throughout the process. Once it is happy with the results, the organization can put a comprehensive TDM approach in place to achieve its complete data governance goals.
Focus on all the data, not just some of it. When implemented well, TDM can incorporate all data within an organization, including small data. Businesses can profile a large volume of highly diverse and widely distributed unstructured small datasets alongside corporate big data and then integrate them into single enterprise file stores. They can choose to aggregate data from these sources or extract only the vital parts; apply data quality standards when necessary; use the data as part of a master data management (MDM) initiative; or analyze it to derive business intelligence.
Don’t forget training. Education and training also need to be part of the small data reconciliation process. Organizations have to know the dangers of managing data outside the corporate data management system and encourage the use of data within the proper data management environment.
Ensure best practices are followed. Business users should be encouraged to stop storing spreadsheets and other business documents outside of the main enterprise data management process. With the TDM approach in place, there is no need for them to do so. Tools can be applied that enable businesses, including small and medium-sized businesses, to efficiently store and retrieve this small data, process it as part of a consolidated store of all organizational data, and then analyze it.
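The idea of extracting only the vital parts of a small dataset and keeping them in a consolidated store can be sketched as follows. This is an illustrative example only: it uses Python's built-in `csv` and `sqlite3` modules as a stand-in for an enterprise data store, and the table name `small_data`, its columns and the function `load_csv_into_store` are hypothetical.

```python
import csv
import sqlite3


def load_csv_into_store(conn, csv_path, wanted_columns):
    """Copy selected columns from one CSV file into a shared table, tagged with its source."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS small_data ("
        "source TEXT, row_num INTEGER, field TEXT, value TEXT)"
    )
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = [
            (str(csv_path), number, column, record.get(column))
            for number, record in enumerate(reader, start=1)
            for column in wanted_columns
            if column in record
        ]
    conn.executemany("INSERT INTO small_data VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    # Number of field values stored, useful for a quick sanity check.
    return len(rows)
```

Recording the source file next to each value keeps the context that is lost when data drifts from spreadsheet to spreadsheet, and it gives an auditor a trail back to the original document.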
With the right data management approach, companies have both the potential and the opportunity to use a single working environment for managing all data, from big to small, and reap the rewards in improved efficiency and an enhanced competitive edge.
Yves de Montcheuil is vice president of marketing at Talend, a company that provides data, application and business process integration solutions. He has 20 years of experience in software product management, product marketing and corporate marketing. Follow him on Twitter: @ydemontcheuil.