Digital technologies have opened the door for huge opportunities in heavy industry. Advancements in connectivity, sensors, big data technologies, data science and automation are making theoretical paradigms such as Industry 4.0 and the Industrial Internet of Things a reality. Despite the promise of these technologies, however, implementation is still inconsistent.
Companies are having trouble just getting started, let alone deploying solutions that leverage the best of modern digital and analytical technologies. Deploying data science and machine learning capabilities, in particular, can seem daunting, especially for industrial companies that often work with siloed, legacy and mission-critical systems. That said, focused investments will reward early adopters: data science and machine learning are poised to reshape the industrial sector as we know it.
This three-part series will cover basic terminology, considerations for adopting data science within your organization and pointers on how to tackle common deployment challenges. The goal is that you’ll walk away empowered to take better advantage of these technologies.
In this first piece, I break down what data science is, define other relevant terms and talk about some best practices for finding the right talent.
Keep It Simple
Data science can become very complicated, very fast. Fortunately, if you peel back the layers to really pinpoint what data science is trying to accomplish, it’s quite simple. In leveraging data science we are trying to:
- Collect and use data
- Learn about our world using that data
- Leverage these learnings to drive action in business settings
- Repeat to get better over time
This is the key benefit of implementing data science solutions. They get better with more data, more feedback and additional iterations. Now for the hard part: getting started.
When we think of giants of the 17th-century scientific revolution, Johannes Kepler comes to mind. Less widely known, however, is the scientist whose meticulously detailed observations made Kepler’s breakthroughs possible. Tycho Brahe, Kepler’s contemporary and colleague, collected the most detailed observational data of the stars and planets up to that time. It was Brahe’s data that Kepler used to develop his now famous model of planetary motion, a discovery that would change the world. Insights get the headlines, but data collection has to happen first.
The Terms You Need to Know
Let’s start with data because it’s foundational. Raw data is the untapped resource that will fuel all your data science solutions. You may be familiar with sayings such as “data is the new oil” or “data is the new currency.” Whatever analogy you’ve heard, it’s probably understating the importance of data in today’s world. Fortunately, recent trends have made it easier and more affordable to gather, store and serve data.
The proliferation of big data technologies makes it possible to work with datasets that are too large, varied and complex for traditional data systems to handle. A complex form of data that has become increasingly important is unstructured data. This data type is not organized in a predefined manner and makes up the largest share of all new data created. Unstructured information can come in the form of repair comments, audio recordings, worksite satellite images or machine vibration readings. Being able to work with this type of data is important today, but will soon be critical for many businesses.
A good place to start is by simply creating a data catalog of all the relevant data sources that (1) you use to make decisions now and (2) may be helpful in the future. With this catalog in place, data science professionals can get acclimated to your workflow and data sources quickly.
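For illustration, a data catalog can start as something very simple. Here is a minimal sketch in Python; the field names and example entries are invented for this example, not a standard schema:

```python
# A minimal data-catalog sketch. Each entry records where a dataset lives,
# who owns it, and whether it is used for decisions today or is only
# potentially useful later. All names here are illustrative assumptions.
catalog = [
    {"name": "maintenance_logs", "source": "CMMS export",
     "owner": "plant operations", "format": "csv", "used_today": True},
    {"name": "vibration_readings", "source": "edge sensors",
     "owner": "equipment dealer", "format": "parquet", "used_today": False},
    {"name": "inventory_records", "source": "ERP system",
     "owner": "finance", "format": "sql", "used_today": True},
]

def sources_used_today(entries):
    """Return the names of datasets currently used for decision-making."""
    return [e["name"] for e in entries if e["used_today"]]

# Datasets not used today stay in the catalog so a future data science
# team knows they exist and who to ask for access.
print(sources_used_today(catalog))
```

Even a spreadsheet with these columns works; the point is that ownership and current usage are written down before any modeling begins.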
You may have data coming from multiple sources that are interesting on their own, but data that is combined and operationalized will fuel new business discoveries. Data harmonization involves merging all the disparate, siloed data sources your business cares about (ERP systems, device control systems, etc.) into one usable dataset. Additionally, the rise of third-party data sources such as weather, satellite imagery and social media can add further value to data harmonization projects.
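In code, harmonization often comes down to joining records from different systems on a shared identifier. Below is a small sketch in plain Python with invented example data; real pipelines would typically use tools such as pandas or Spark, but the idea is the same:

```python
# A minimal data-harmonization sketch: two siloed sources (ERP work orders
# and device control readings) are joined on a shared machine ID into one
# usable dataset. The records and field names are invented for illustration.
erp_orders = [
    {"machine_id": "M-101", "last_service": "2023-04-02"},
    {"machine_id": "M-205", "last_service": "2023-05-17"},
]
device_readings = [
    {"machine_id": "M-101", "avg_vibration": 0.8},
    {"machine_id": "M-205", "avg_vibration": 2.3},
]

def harmonize(left, right, key):
    """Inner-join two lists of records on a shared key field."""
    right_by_key = {r[key]: r for r in right}
    merged = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})
    return merged

combined = harmonize(erp_orders, device_readings, "machine_id")
print(combined)
```

Once service history and sensor readings sit in one dataset, questions like "do machines serviced long ago vibrate more?" become straightforward to ask.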
As you develop your data catalog, make note of data ownership. This will vary by system, asset, device and industry. In some instances, the asset owner will own the data coming off the machine. In other instances, it may be the manufacturer or equipment dealer. Whatever the case may be, be sure to note who owns the data stream to avoid future complications.
As a general rule, you should capture and store whatever data you can get your hands on that pertains to your vital business operations – even in its crudest forms such as logs, system records, maintenance records and inventory records. Having this data available to future data science teams or advisors can come in handy. Even in messy, clunky forms a future data scientist may be able to extract crucial data from your records or at least get a sample of what that data source looks like.
Data wrangling (sometimes referred to as data munging) is a large part of data science and involves techniques for processing, transforming and mapping data from a “raw” form to a format that makes it more useful and valuable for analytical uses. These techniques will continue to advance, so take care when deleting business datasets. You don’t know what insight can be gleaned from them in the future.
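As a concrete (and invented) example of wrangling, here is a short Python sketch that maps raw, messy log lines into structured records; the pipe-delimited layout is assumed purely for illustration:

```python
# A small data-wrangling sketch: raw pipe-delimited log lines are cleaned
# and mapped into structured records that are easier to analyze.
raw_logs = [
    "2023-04-02|M-101|  Pump seal replaced ",
    "2023-05-17|M-205|Bearing NOISE reported",
    "bad line",  # malformed rows are common in raw data
]

def wrangle(lines):
    """Parse raw log lines into records, skipping malformed rows."""
    records = []
    for line in lines:
        parts = line.split("|")
        if len(parts) != 3:
            continue  # drop rows that don't match the expected layout
        date, machine_id, comment = (p.strip() for p in parts)
        records.append({"date": date,
                        "machine_id": machine_id,
                        "comment": comment.lower()})
    return records

print(wrangle(raw_logs))
```

Note that the malformed row is skipped rather than deleted from the source file, in keeping with the advice above to preserve raw datasets.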
The focus at this point should be to catalog and capture datasets that are relevant to your business. As your company’s analytical capabilities mature, you’ll develop a data engineering practice that will help automate and streamline the capturing, storing, processing and serving of the data. Data engineering helps build and manage data infrastructure (also known as data architecture), and it will be key to operationalizing your data to create what gets the “headlines” – insights and action.
Finding the Right Talent
You may have noticed that I keep referring to some future data science functionality that you will either build or buy access to. That’s because most industrial companies do not have the existing in-house personnel to manage data or deploy data science solutions at a large scale. Data science skills are in high demand, and for good reason: because it combines expertise in computer science, mathematics and statistics, data science talent is hard to find. As a first step, consider taking inventory of the skills and interests of your existing staff. Willing individuals with backgrounds in engineering or operations research might quickly (and cheaply) supplement their skills with courses from popular massive open online course (MOOC) platforms such as Udacity, Coursera, DataCamp and Udemy.
If you look to hire data scientists, keep a couple of points in mind. The data science profession at large has not yet had time to build pipelines that produce a steady stream of data scientists. Some will graduate from dedicated data science programs, but that type of professional still remains the minority. In most cases, data scientists are experts in another field who have augmented that profile with additional skills. For example, you might find someone who has a degree in statistics but learned computer programming at a bootcamp. Finding the right data science talent is difficult, but a strong core team that can build out your data infrastructure and data science processes makes the effort worth it.
In part two of this series, I focus on learning from data and partnering with the right internal stakeholders, as well as considerations to bear in mind when choosing to work with outside vendors.
Manny Bernabe leads and develops strategic relationships for Uptake’s data science team. He works with industry partners, university advisers and business leaders to understand the opportunities for aspiring data-driven organizations.
Manny brings background from the financial services sector and, specifically, asset management. He has deep expertise in research and deployment of quantitative strategies in exchange traded funds (ETFs).
To learn more about Uptake’s data science capabilities, visit https://www.uptake.com/expertise/data-science.