Finding Hidden Insights in Big Data

December 21, 2017 4:55 am

Dave Oswill, Product Marketing Manager, MathWorks

Data is one of the most valuable assets a company has at its disposal. Valuable insights can be gained from data to drive better business decisions, and the technology that enables the collection and measurement of this detailed data is making it easier than ever to leverage these insights for the development of more intelligent products, services and manufacturing processes.

The prospect of integrating big data-fueled insights into products and workflows is enticing, and it can be straightforward if domain experts – scientists and engineers – are provided with the appropriate tools. Software analysis and modeling tools, such as MATLAB, enable domain experts to accomplish tasks previously exclusive to data scientists, including accessing and combining multiple datasets, creating predictive models and ultimately bringing previously hidden insights to their organizations’ decision makers.

Take Big Data Down to Size

Engineers and scientists need scalable tools that provide access to a wide variety of systems and formats to efficiently capture and incorporate the benefits of big data (Figure 1). This is especially important because companies often use more than one type of system or format to store and manage data. For example, sensor or image data stored in files on a shared drive may need to be combined with metadata stored in a database, or in certain instances, data of many different formats must be aggregated to develop a predictive model.

Figure 1: Access a wide range of big data. Copyright: © 1984–2017 The MathWorks, Inc.
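
As a minimal sketch of the combination step described above, the following Python example joins sensor readings parsed from CSV text with metadata held in a database table. Python stands in here for whichever analysis tool is in use. The column names, the `sensors` table and the in-memory SQLite database are hypothetical, chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sensor log, standing in for a file on a shared drive.
SENSOR_CSV = """sensor_id,timestamp,value
s1,2017-01-01T00:00,20.5
s2,2017-01-01T00:00,101.2
s1,2017-01-01T00:10,20.9
"""


def load_readings(text):
    """Parse CSV text into a list of dicts with numeric values."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["value"] = float(row["value"])
        rows.append(row)
    return rows


def load_metadata(conn):
    """Read the (hypothetical) sensors table into a dict keyed by sensor_id."""
    cursor = conn.execute("SELECT sensor_id, location, unit FROM sensors")
    return {sid: {"location": loc, "unit": unit} for sid, loc, unit in cursor}


def combine(readings, metadata):
    """Attach metadata to each reading; readings without metadata are kept."""
    combined = []
    for reading in readings:
        extra = metadata.get(reading["sensor_id"], {"location": None, "unit": None})
        combined.append({**reading, **extra})
    return combined


# In-memory database standing in for a production metadata store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT, location TEXT, unit TEXT)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [("s1", "pump house", "degC"), ("s2", "wellhead", "kPa")],
)

combined = combine(load_readings(SENSOR_CSV), load_metadata(conn))
for row in combined:
    print(row)
```

Keeping readings that have no matching metadata, rather than dropping them, is a deliberate choice in this sketch: gaps in the metadata then show up during exploration instead of silently shrinking the dataset.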

Understand What Is in the Data

Before developing predictive models or employing machine learning techniques, domain experts need scalable tools to access and explore big data so that they can understand the behavior of the system.

Software analysis and modeling tools can simplify the process of observing, cleaning and effectively working with big data. These tools can also help domain experts decide which algorithms to use across large datasets when creating a model using machine learning techniques. Before domain experts create a model or theory, it’s important to first understand what is in the dataset, as that may have a major impact on the final result.

Often software can help decipher the data and identify:

  • Slow-moving trends or infrequent events spread across the data
  • Bad or missing data that needs to be cleaned before a valid model or theory can be established
  • Data that is most relevant for a theory or model

Big data tools can also help identify additional information that may be derived from the data for use in later analysis and model creation.
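
The kinds of checks listed above can be sketched in a few lines of Python. This example counts missing values and estimates a slow-moving linear trend with an ordinary least-squares slope. The function names and the sample readings are hypothetical, and a real dataset would usually call for more than a straight-line fit.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def profile(values):
    """Summarize a series: missing count and least-squares slope per sample."""
    points = [(i, v) for i, v in enumerate(values) if not is_missing(v)]
    n = len(points)
    missing = len(values) - n
    if n < 2:
        # Not enough valid points to estimate a trend.
        return {"missing": missing, "slope": None}
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return {"missing": missing, "slope": sxy / sxx}


# Hypothetical readings with two gaps and a gentle upward drift.
readings = [10.0, 10.5, None, 11.5, float("nan"), 12.5]
summary = profile(readings)
print(summary)
```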

Explore and Assess Large Datasets

Data analysis software offers numerous capabilities that allow users to easily organize large quantities of data into digestible forms. Summary visualizations, such as binScatterPlot (Figure 2), provide a way to easily view patterns and quickly gain insights. Data cleansing removes outliers and replaces bad or missing data; doing this programmatically enables new data to be cleaned automatically as it’s collected (Figure 3). Data reduction techniques, such as principal component analysis (PCA), help to find the most influential data inputs, thus reducing the number of inputs and enabling the creation of a more compact model. Data processing at scale enables engineers and scientists not only to work with large sets of data on a desktop workstation but also to run their analysis pipelines or algorithms on an enterprise-class system such as Hadoop.

Figure 2: binScatterPlot in MATLAB. Copyright: © 1984–2017 The MathWorks, Inc.
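
The general idea behind a binned scatter plot can be illustrated with a short Python sketch: rather than drawing every point, count how many points fall into each rectangular bin and display the counts. This is an illustration of the binning concept only, not a description of how binScatterPlot is implemented; the function name, bin widths and sample points are assumptions.

```python
from collections import Counter


def bin_counts(xs, ys, x_width, y_width):
    """Count points per rectangular bin; keys are (x_bin, y_bin) indices."""
    counts = Counter()
    for x, y in zip(xs, ys):
        counts[(int(x // x_width), int(y // y_width))] += 1
    return counts


# Hypothetical sample points.
xs = [0.1, 0.4, 0.6, 1.2, 1.3, 1.9]
ys = [0.2, 0.3, 0.9, 1.1, 1.4, 0.1]
counts = bin_counts(xs, ys, x_width=1.0, y_width=1.0)
for bin_index, n in sorted(counts.items()):
    print(bin_index, n)
```

Because only the per-bin counts are kept, the summary stays small no matter how many points are processed, which is what makes this kind of view practical for very large datasets.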


Figure 3: An example of filtering big data with MATLAB. Copyright: © 1984–2017 The MathWorks, Inc.
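
A programmatic cleaning step of the kind described above might look like the following Python sketch. It replaces missing values and outliers with the median of the valid data, flagging as an outlier any value more than three scaled median absolute deviations (MAD) from the median, which is one common robust rule. The function name, threshold and sample batches are hypothetical.

```python
import math
import statistics


def clean(values, threshold=3.0):
    """Replace missing values and outliers with the median of the valid data.

    A value is an outlier if it lies more than `threshold` scaled median
    absolute deviations (MAD) from the median.
    """
    valid = [v for v in values if v is not None and not math.isnan(v)]
    if not valid:
        return list(values)
    median = statistics.median(valid)
    mad = statistics.median(abs(v - median) for v in valid)
    scaled_mad = 1.4826 * mad  # makes the MAD comparable to a standard deviation

    cleaned = []
    for v in values:
        missing = v is None or math.isnan(v)
        outlier = (
            not missing
            and scaled_mad > 0
            and abs(v - median) > threshold * scaled_mad
        )
        cleaned.append(median if missing or outlier else v)
    return cleaned


# The same function can be applied to each new batch as it arrives.
first_batch = [10.0, 10.2, None, 9.9, 55.0, 10.1]
new_batch = [10.3, float("nan"), 10.0]
cleaned_first = clean(first_batch)
cleaned_new = clean(new_batch)
print(cleaned_first)
print(cleaned_new)
```

Writing the rule as a function, rather than fixing values by hand, is what allows each newly collected batch to be cleaned in the same way.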

Create Models and Solve Real-World Problems

For companies to take advantage of the value of big data, their domain experts must be supported by software analysis and scalable tools throughout the entire process – from accessing data to developing analytical models to deploying these models in production (Figure 4). When incorporating models into products or services, organizations typically bring in enterprise application developers and system architects. This can create a challenge because developing models in traditional programming languages is difficult for many engineers and scientists.

Figure 4: Integrating models with MATLAB. Copyright: © 1984–2017 The MathWorks, Inc.

To avoid this issue, enterprise application developers should look for data analysis and modeling tools that are familiar to engineers and scientists. With such tools, scientists and engineers can explore and design models with big data using familiar functions and syntax, and then apply their models and insights when developing products, systems and operations. At the same time, the organization can rapidly integrate these models into its products and services by using production-ready application servers and tools with code-generation capabilities.

Engineers at Baker Hughes, a provider of services to oil and gas operators, were faced with the challenge of improving equipment maintenance to reduce costs and maximize the productivity of their oil and gas extraction trucks. If a truck at an active site experienced a pump failure, the company had to immediately replace the truck to ensure continuous operation. Sending spare trucks to each site cost the company tens of millions of dollars in lost revenue that could have been earned if the vehicles had been active at another site. The inability to accurately predict when valves and pumps require maintenance underpinned other cost issues as well. Too-frequent maintenance is wasteful and results in parts being replaced when they are still usable, while too-infrequent maintenance risks damaging pumps beyond repair. To solve this problem, Baker Hughes engineers developed a predictive maintenance system that alerts employees when equipment performance begins to degrade. They used MATLAB to collect terabytes of data from the oil and gas extraction trucks and then developed an application that strikes a balance and predicts the right time to service or replace equipment.
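
To make the idea of alerting on degrading performance concrete, here is a deliberately simple Python sketch that raises an alert whenever the rolling mean of a signal falls below a baseline by more than a set tolerance. It is not the Baker Hughes system, whose details are not given here; a fixed threshold stands in for a trained predictive model, and the function name, readings, baseline and tolerance are all hypothetical.

```python
from collections import deque


def degradation_alerts(readings, baseline, tolerance, window=3):
    """Return indices where the rolling mean drops below baseline - tolerance."""
    recent = deque(maxlen=window)  # holds only the most recent `window` readings
    alerts = []
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and sum(recent) / window < baseline - tolerance:
            alerts.append(i)
    return alerts


# Hypothetical pump pressure readings that slowly decline.
pressure = [100, 101, 99, 100, 97, 95, 93, 91]
alerts = degradation_alerts(pressure, baseline=100, tolerance=4, window=3)
print(alerts)
```

Averaging over a window keeps a single noisy reading from triggering an alert, at the price of reacting a few samples later.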

Leveraging Insights for Better Outcomes

Domain experts with access to scalable, efficient tools give their companies an advantage in the global marketplace by performing tasks previously exclusive to data scientists. The combination of knowledgeable domain experts capable of accessing the trove of insights hidden in data and an IT team capable of leveraging those insights into products, services and business operations creates better, more intelligent outcomes for customers.

As product marketing manager at MathWorks, Dave Oswill works with customers in developing and deploying analytics along with the wide variety of data management and business application technologies in use today.


