Simplify Data Cleansing for More Accurate, Timely Analytics

July 27, 2015

Rob Carlson, co-founder and CEO, UNIFI Software

Data analysis has become an essential part of every business. And, like many office automation tools, data analytics tools have evolved from chalk marks on a blackboard to rich, dynamic, on-screen displays that present a complete picture of the business through pie charts, scatter graphs, and three-dimensional bar graphs.

Not only has the form of data visualization changed dramatically over the past decade, but the number of sources feeding those displays has grown just as quickly. Almost every day, it seems, a new social media site or consumer behavior becomes a target for analysis, and shifting demographics make the data analyst’s job even more complex. For example, teens and tweens have left Facebook. Now that their parents and, worse, their grandparents are on it, they regard it as an awkward family Sunday dinner. They have moved on to more instantaneous and short-lived social media services, like Yik Yak and Snapchat. The nature of these services allows for more controlled, more private interactions with a small inner circle. No moms or grandpas invited.

More Analysis, More Accurate Results

The availability of first- and third-party data allows the analyst to understand the habits of customers and prospects at an intimate level. At Disney/ABC, for example, the data team collects more than 1 billion pieces of data every day. These data points represent the viewing habits of its consumers based on dozens of sources, from device type and content stream to geolocation and time of day. With this data, Disney/ABC is able to profile the viewing habits of individual users of its service and push relevant and timely content to a consumer, dramatically increasing the likelihood of engagement with sponsors’ messages.

Some Data Do Not Play Nice

Because of the nature of the services that generate it, some of the most potentially valuable data, such as data from social media networks or from news feeds like Twitter, arrives at the analyst’s desktop with no consistent structure. This poses a problem for the analyst, because the data cannot be viewed or combined with existing data sources for analysis until its structure has been normalized.

Cleansing and normalizing unstructured data is a highly technical and time-consuming task. Before business analysts can dedicate themselves to discovering insights, the data must be presented to their visualization tools in a form those tools can represent. This requires the data to be normalized to a tabular form – that is, defined in terms of rows and columns.
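As a rough illustration of what normalizing to rows and columns involves, the Python sketch below flattens a few loosely structured social media posts into records that share a fixed set of columns. The post layout, the field names (user, ts, geo), and the sample data are assumptions made up for this example; they do not reflect any particular service's feed.

```python
import re

# Hypothetical raw posts with nested and inconsistent fields (illustrative data only).
RAW_POSTS = [
    {"user": {"name": "ana"}, "text": " Loving the new #Sneakers from @shopco! ",
     "ts": "2015-07-01T10:15:00"},
    {"user": {"name": "ben"}, "text": "no tags here",
     "ts": "2015-07-01T11:02:00", "geo": {"city": "Austin"}},
]

# The fixed column set every output row will share (an assumption for this sketch).
COLUMNS = ["user", "timestamp", "city", "hashtags", "text"]


def normalize_post(post):
    """Flatten one nested post into a row with the fixed columns above."""
    text = post.get("text", "").strip()
    hashtags = re.findall(r"#(\w+)", text)
    return {
        "user": post.get("user", {}).get("name"),
        "timestamp": post.get("ts"),
        "city": post.get("geo", {}).get("city"),  # missing fields become None
        "hashtags": ";".join(tag.lower() for tag in hashtags),
        "text": text,
    }


def to_table(posts):
    """Return a list of rows; every row has exactly the same columns."""
    return [normalize_post(post) for post in posts]


table = to_table(RAW_POSTS)
```

Each row now carries the same columns regardless of which fields the original post happened to include, which is the property a visualization tool typically needs before it can chart the data.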

The problem is compounded when tables of data generated from different sources need to be combined in order to derive valuable insight. For example, to understand the influence that social media trends are having on online sales, the analyst must combine CRM data with website clickstream data and social media trend data. This sounds easy, but is actually quite complex.
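To give a sense of where the complexity comes from, here is a minimal Python sketch of such a combination. The three tables, their column names, and the choice of email address and date as join keys are all hypothetical, chosen only for illustration. Even in this toy case, the keys need cleansing before they match, and a decision has to be made about visitors who do not appear in the CRM.

```python
# Hypothetical, already-tabular sources (illustrative data only).
crm = [
    {"customer_id": 1, "email": "Ana@Example.com", "segment": "loyal"},
    {"customer_id": 2, "email": "ben@example.com", "segment": "new"},
]
clickstream = [
    {"email": "ana@example.com ", "date": "2015-07-01", "product": "sneakers"},
    {"email": "ben@example.com", "date": "2015-07-02", "product": "hats"},
    {"email": "cat@example.com", "date": "2015-07-02", "product": "hats"},
]
social_trends = [
    {"date": "2015-07-01", "topic": "sneakers", "mentions": 1200},
    {"date": "2015-07-02", "topic": "hats", "mentions": 300},
]


def clean_key(value):
    """Cleanse a join key so that 'Ana@Example.com' matches 'ana@example.com '."""
    return value.strip().lower()


def combine(crm_rows, click_rows, trend_rows):
    """Join clicks to CRM customers by email, then attach trend counts by date and topic."""
    customers = {clean_key(c["email"]): c for c in crm_rows}
    trends = {(t["date"], t["topic"]): t["mentions"] for t in trend_rows}
    rows = []
    for click in click_rows:
        customer = customers.get(clean_key(click["email"]))
        if customer is None:
            continue  # visitor not in the CRM; this sketch drops the click (inner join)
        rows.append({
            "customer_id": customer["customer_id"],
            "segment": customer["segment"],
            "date": click["date"],
            "product": click["product"],
            # Assumes trend topics use the same labels as product names.
            "mentions": trends.get((click["date"], click["product"]), 0),
        })
    return rows


combined = combine(crm, clickstream, social_trends)
```

Without the clean_key step, the first click would silently fail to match its CRM record. Small mismatches of this kind, multiplied across real sources, account for much of the effort in joining data.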

Typically, data analysts work with a developer in the IT department to identify the data sources that are available for the specific research they require. The developer then collects the data sources and writes software to cleanse and normalize the data. The challenge in this process is that the developer does not fully understand the business objectives or hypotheses of the analyst. In turn, the analyst may not appreciate the technical limitations of cleansing and joining data sources. This operational disconnect can lead to a time-consuming and frustrating process, as each side tries to refine its requests and deliverables. The time required to actually deliver the insight can have a negative impact on the bottom line of the business, and opportunities to respond quickly to consumer buying habits or to deliver other customer value may be lost entirely.

Improved Access to Data

For data analysis tools to achieve the same widespread adoption in the workplace that word processing and spreadsheet applications currently enjoy, the task of acquiring, cleansing, and normalizing data so that it can be viewed by analysis tools needs to improve dramatically.

Tools are emerging that remove the technical programming phase of data integration and free business analysts to explore available data sources and combine them immediately so they can be visualized quickly. These tools deliver a user-friendly interface designed for the business user and programmatically cleanse and combine data sources “under the covers,” so the analyst is shielded from the complexity of the work required to present normalized data to the data visualization tool.

The simplification of this mundane but essential element of any data analysis frees the analyst to pursue “what if” scenarios with the data, hypothesize about their business, and employ predictive analytics in a simplified way to gain business insights about their customers at the speed of thought.

Rob Carlson is co-founder and CEO of UNIFI Software. Rob has served the enterprise technology and Big Data market for over 20 years. In that time he has held senior leadership roles at Business Objects/SAP, EMC, and PeopleSoft. Prior to co-founding UNIFI Software, Rob helped launch and grow early venture-backed Big Data companies Platfora and Alpine Data Labs.
