How to Use Big Data Without Creating Bad Data

May 4, 2017

Deryk Anderson, Director of Technical Product Management for GE Digital


Across many industries today, Big Data has become a core component of business decisions. It’s no longer just a buzzword—almost 70 percent of companies report they have been using Big Data for more than a year. But data is nothing without analytics. Data analytics can provide organizations with competitive advantages, including the performance insights required to lower costs and increase accountability.

Despite the proven value of Big Data, many industrial organizations are faced with messy, siloed data sets as they transition to fully digital operations. Large manufacturing and energy organizations collect data from hundreds of thousands of instruments each day, providing ample opportunity for data analytics, but their limited data maturity results in mismanaged data and inaccurate insights.

When data is derived from a sizable number of physical assets and human operators in a plant environment, it is often stored and organized in different systems, under different categorization schemes. This is particularly critical when reviewing machine performance and system health. When characterizing an asset failure, for example, the first step is recording the failure mechanism and cause, followed by the maintenance performed to address it. In Computerized Maintenance Management Systems (CMMS) such as SAP, a field called "breakdown indicator" is used to record whether the asset failed, and a drop-down list is then available from which to choose the failure mechanism. If an operator recording the issue manually does not know how to define the asset failure, or even the asset category, the field may be completed incorrectly or left blank.
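A first defense against this kind of gap is a simple data-quality audit that flags work orders whose failure fields cannot be trusted. The sketch below is illustrative, assuming dictionary-style records; the field names are hypothetical, not actual SAP column names:

```python
# Hypothetical sketch of a CMMS data-quality check: flag work orders
# whose breakdown indicator or failure-mechanism field was left blank
# or filled in inconsistently. Field names are invented for illustration.

def audit_work_orders(work_orders):
    """Return the orders that cannot be trusted for failure analysis."""
    suspect = []
    for order in work_orders:
        breakdown = order.get("breakdown_indicator")  # should be True/False
        mechanism = order.get("failure_mechanism")    # from a fixed drop-down
        # Untrustworthy: indicator never set, or a failure with no mechanism.
        if breakdown is None or (breakdown and not mechanism):
            suspect.append(order)
    return suspect

orders = [
    {"id": 1, "breakdown_indicator": True, "failure_mechanism": "bearing wear"},
    {"id": 2, "breakdown_indicator": None, "failure_mechanism": None},  # blank
    {"id": 3, "breakdown_indicator": True, "failure_mechanism": None},  # incomplete
]

flagged = audit_work_orders(orders)
print([o["id"] for o in flagged])  # → [2, 3]
```

Running such a check before any reliability analysis gives an early estimate of how much of the record set is usable at all.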

In one account, a reliability team at an energy facility used raw data to try to determine how often their assets failed and to prioritize the fleet. Based on the raw data, the team calculated a turbine’s average time between failures to be 16,000 months. After looking at the data more closely, the team realized that the breakdown indicator field in their CMMS was rarely populated, so it was difficult to determine whether the asset actually failed. The team investigated the data further, and after reclassifying the data to accurately reflect failure, the average time between failures turned out to be just 14 months.
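The arithmetic behind that discrepancy is straightforward: mean time between failures is operating time divided by failure count, so an under-populated breakdown indicator shrinks the denominator and inflates the result. The figures below are fitted to the article's 16,000-month and 14-month numbers purely for illustration:

```python
# Illustrative MTBF calculation showing how a mostly-blank breakdown
# indicator inflates mean time between failures. The operating-month
# and failure-count figures are invented to match the article's example.

def mtbf_months(total_operating_months, failure_count):
    """Mean time between failures, in months."""
    if failure_count == 0:
        return float("inf")
    return total_operating_months / failure_count

operating_months = 16000   # fleet-wide turbine operating time (assumed)

# Raw data: the indicator was almost never set, so only one "failure".
print(mtbf_months(operating_months, 1))            # → 16000.0

# After reclassifying records that actually described failures.
print(round(mtbf_months(operating_months, 1143)))  # → 14
```

The formula never changed; only the failure count did, which is why validating the inputs matters far more than refining the calculation.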

In the age of Big Data, it is critical that data be characterized correctly and consistently across all management systems to ensure organizations have an accurate, holistic view of machine and system performance. When one hour of unplanned downtime can cost a company more than $100,000 in lost production, optimizing asset performance must be a priority. Asset performance management (APM) systems and machine learning technology enable organizations to automate this classification and organization process, using advanced software algorithms to translate free text and other descriptors into distinct categories. Even if the data describes the same failure in different ways, these technologies ensure it is reclassified with the same descriptor and coding, producing more accurate results.
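To make the idea concrete, here is a deliberately minimal, rule-based stand-in for the kind of text classification an APM system would do with machine learning: mapping maintenance notes that describe the same failure in different words onto one canonical failure code. The keywords and codes are invented for illustration; a real system would use a trained model rather than a keyword table:

```python
# Minimal rule-based sketch (not GE's actual algorithm): map free-text
# maintenance notes onto canonical failure codes so that differently
# worded records of the same failure are counted together.
# Keywords and codes are hypothetical.

FAILURE_KEYWORDS = {
    "BEARING_FAILURE": ["bearing", "brg", "spalling"],
    "SEAL_LEAK": ["seal", "leak", "leaking"],
    "OVERHEATING": ["overheat", "high temp"],
}

def classify(note):
    """Return the first failure code whose keywords appear in the note."""
    text = note.lower()
    for code, keywords in FAILURE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return code
    return "UNCLASSIFIED"

notes = [
    "Replaced brg on drive end",
    "bearing spalling observed during inspection",
    "pump seal leaking badly",
]
print([classify(n) for n in notes])
# → ['BEARING_FAILURE', 'BEARING_FAILURE', 'SEAL_LEAK']
```

Even this crude version shows the payoff: the first two notes, worded entirely differently, land in the same category and are therefore counted as the same failure mode.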

Beyond reclassification, data can also be useless when it is siloed. For example, a coal-mining company that operates more than 50 trucks across approximately 200 mines experienced huge production losses due to unanticipated in-field failures. The company had thorough asset utilization and work management data in which the root causes of the failures were embedded, but it lacked the tools to decipher those causes from the raw data, so in-field failures continued to occur. On reexamination, management discovered the data had been siloed in different operating systems within the organization, limiting its ability to draw accurate reports and conclusions, and therefore to predict and prevent failures. By implementing APM and adjusting inspection intervals for truck fleets across different mines, the company was able to obtain a more holistic view of the data and take actionable steps to minimize unanticipated failures and optimize maintenance operations.
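De-siloing, at its simplest, means joining records from separate systems on a shared asset identifier so that failure history can be read against operating data. The sketch below assumes two hypothetical sources, a fleet-management system and a work-management system; all data and field names are invented:

```python
# Sketch of de-siloing: join utilization records and work orders from
# two separate systems on a shared truck ID, giving one consolidated
# view per asset. All data and field names are hypothetical.

utilization = {  # from the fleet-management system
    "TRUCK-07": {"engine_hours": 5200, "mine": "Mine A"},
    "TRUCK-11": {"engine_hours": 4800, "mine": "Mine B"},
}

work_orders = [  # from the separate work-management system
    {"asset": "TRUCK-07", "failure": "hydraulic hose burst"},
    {"asset": "TRUCK-07", "failure": "brake overheating"},
    {"asset": "TRUCK-11", "failure": "hydraulic hose burst"},
]

def merge_views(utilization, work_orders):
    """One consolidated record per truck: utilization plus failure history."""
    merged = {aid: {**info, "failures": []} for aid, info in utilization.items()}
    for wo in work_orders:
        if wo["asset"] in merged:
            merged[wo["asset"]]["failures"].append(wo["failure"])
    return merged

fleet = merge_views(utilization, work_orders)
print(fleet["TRUCK-07"]["failures"])
# → ['hydraulic hose burst', 'brake overheating']
```

With the two sources joined, questions like "which failure modes recur on high-hour trucks?" become answerable; kept apart, neither system can answer them alone.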

As software and computing capabilities continue to advance with APM and machine learning, systems will be able to more intelligently reorganize and restructure data from various sources to make it useful to decision makers. These technologies help ensure there is no bad data infecting smooth operations with incorrect assumptions or misleading insights. The power of Big Data analytics can be significant in driving down costs and increasing productivity, but as they say, “with great power comes great responsibility.” Companies need to be able to trust their data to succeed in the era of the industrial internet. With the appropriate tools and infrastructure in place, organizations can maximize the value of their existing data without creating bad data.


Deryk Anderson is a Director of Technical Product Management for GE Digital. He is responsible for product delivery of the Machine and Equipment Health components of GE Digital's Asset Performance Management solution. Deryk has over thirty years' experience in the management of industrial assets as a manager and consultant across a variety of industries, including oil and gas, mining, petrochemical, manufacturing, food processing, and utilities.





