How to Turn Dark Data from a Problem into an Advantage

August 23, 2016 5:30 am   |   1 Comment

Kevin Petrie, Senior Director and Technology Evangelist, Attunity

When it comes to analytics, enterprises today have a surplus of data but a shortage of insights.

Why the data surplus? Data warehouses, data lakes, and other repositories are brimming as volume, variety, and velocity continue to grow. Even as the tide continues to rise, enterprises are tapping into new data sources, such as social media and Internet of Things sensors, in order to gain new analytics opportunities.

This explosion of information has made it difficult for some IT teams to generate the necessary insights. The bottlenecks include cumbersome manual coding for extract, transform, and load (ETL) processes and a lack of understanding about how data is being used and, more importantly, how it could be used. The result: dark data.

Dark data is collected and stored as part of typical business activities, but it’s not used for anything other than compliance and retention purposes. Forrester estimates that the average enterprise analyzes just 37 percent of its structured data and 22 percent of its semi-structured and unstructured data.

And that dark data matters for two reasons. First, this data costs money to capture and manage, and it often necessitates capacity upgrades for premium data warehouses. Second, dark data can hold latent analytics insights that enterprises are failing to realize.

Today’s enterprises seek both to analyze more of their data and to reduce the costs related to their unanalyzed dark data. Here’s how enterprises can begin to achieve both goals.

  • Analyze more data. The value of a data point often boils down to its correlation with other data points. Decision makers can better understand their financial standing, for example, by reviewing not just revenue by country, but revenue by customer and by sales rep within those countries, along with product mix and overall averages for each. This type of structured data typically resides in data warehouses. The data points are easily correlated for insights. Other data, such as unstructured and semi-structured data, might sit dark because it is not as easy to correlate and analyze. For example, records about customer-service interactions might be dark. But if the BI or data science team applies new semantic analysis to those records and correlates the records with external social media trends in a flexible platform such as Hadoop, they can extract new insights. This means, for example, they can make better decisions about customer-service policies and upselling opportunities. Efforts like this can bring dark data into the light.


  • Reconsider data storage architectures with an eye toward cost savings. Not all data holds immediate value. Old customer records or operational reports often grow dusty but still consume space in premium data warehouses in order to satisfy regulatory retention requirements. These files can reside far more cost-effectively in Hadoop or the cloud.
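
The revenue breakdown described in the first point above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field names, and the `revenue_by` helper are hypothetical, and a real warehouse would express the same roll-ups as SQL or BI queries.

```python
from collections import defaultdict

# Hypothetical sales records; every name and figure here is illustrative only.
SALES = [
    {"country": "US", "customer": "Acme", "rep": "Lee", "revenue": 1200.0},
    {"country": "US", "customer": "Acme", "rep": "Kim", "revenue": 800.0},
    {"country": "US", "customer": "Globex", "rep": "Lee", "revenue": 500.0},
    {"country": "DE", "customer": "Initech", "rep": "Braun", "revenue": 700.0},
]


def revenue_by(records, *dimensions):
    """Sum revenue for each distinct combination of the given dimensions."""
    totals = defaultdict(float)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        totals[key] += record["revenue"]
    return dict(totals)


# The same data viewed at three levels of detail.
by_country = revenue_by(SALES, "country")
by_country_customer = revenue_by(SALES, "country", "customer")
by_country_rep = revenue_by(SALES, "country", "rep")
```

Each extra dimension passed to `revenue_by` splits the country totals further, which is the sense in which a data point gains value from its correlation with other data points.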


We find that enterprises can best reduce the amount and cost of dark data by adopting three basic best practices.

  • Automate. IT organizations can lose valuable time and energy to manual, error-prone ETL processes. Replacing this drudgery with intuitive, automated software enables IT to deliver more analytics-ready data to the business faster. Zurich Insurance, for example, has used data warehouse automation solutions to reduce ETL coding time from 45 days to two, and to accelerate EDW updates from twice annually to a monthly pace. As a result, the company has freed up resources for analytics and has lit up more of its dark data.


  • Try new technologies and platforms. Apache Spark and Apache Kafka are just two emerging technologies for analyzing and acting upon data streams in real time. Kafka, for example, can stream real-time transaction updates from customer databases to big data platforms such as Hadoop, where those transactions can be correlated with individual smartphones and physical store sensors to make location-based retail offers to repeat customers. Without the Kafka real-time feed, that transaction update might have become dark data. Instead, it creates a cross-selling opportunity.


  • Track data usage. Enterprises across industries can realize significant savings by identifying unused tables and databases in their data warehouses and rebalancing them to economical platforms such as Hadoop or the cloud. This frees up premium data warehouse resources, improves query performance, and postpones costly hardware upgrades.
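
As a rough illustration of the usage-tracking practice above, the sketch below flags tables that have not been queried recently as candidates for moving to a cheaper platform. The table names, sizes, dates, and the one-year idle threshold are all assumptions made for the example. In practice the last-query dates would come from the warehouse's own query logs or a usage-monitoring tool.

```python
from datetime import date, timedelta

# Hypothetical usage catalog: table name -> (size in GB, date of last query).
TABLE_USAGE = {
    "orders_2016": (120, date(2016, 8, 20)),
    "orders_2009": (340, date(2013, 1, 5)),
    "audit_archive": (910, date(2011, 6, 30)),
    "customers": (45, date(2016, 8, 22)),
}


def offload_candidates(usage, today, max_idle_days=365):
    """Return (table, size_gb) pairs not queried within max_idle_days, largest first."""
    cutoff = today - timedelta(days=max_idle_days)
    idle = [
        (name, size)
        for name, (size, last_query) in usage.items()
        if last_query < cutoff
    ]
    return sorted(idle, key=lambda item: item[1], reverse=True)


candidates = offload_candidates(TABLE_USAGE, today=date(2016, 8, 23))
reclaimable_gb = sum(size for _, size in candidates)
```

Sorting by size puts the tables that would free the most premium capacity first, and `reclaimable_gb` gives a quick estimate of the space a rebalancing effort could recover.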


IT organizations understand that dark data is unused data, and that unused data assets are liabilities. By extracting new value from once-dark data and reducing management costs associated with dark data, they can improve the economics of their analytics initiatives.

Kevin Petrie is a technology marketing leader at Attunity, with 20 years of experience in high tech, including marketing, big data services, strategy, and journalism. He is a frequent speaker and blogger, and has published recent articles in various technology publications. He is a bookworm, outdoor fitness nut, husband, and father of three boys.

One Comment

  1. Oladipo
    Posted August 24, 2016 at 2:37 am | Permalink

    I believe strongly that the unravelling of the dark data kept by most enterprises today is highly necessary, as BI tools and big data analytics have not been put to good use. I believe that, to a very large extent, the laborious manual coding used in the Extract, Transform, and Load of data warehouses has been eased by some application of new technology to keep track of the online transactional processes in order to ensure the EDW stays effectively refreshed and updated with new data inputs. I don’t know much about Apache Spark and Apache Kafka. Wouldn’t mind having more insight into their usage. Thanks for this great article @kevin petrie.
