**Bayes Theorem**

One of the challenges in analyzing Big Data is its volume—there is just so much of it. Mix in high velocity (fast data), and the standard analytical methodologies used to make sense of it break down, becoming even more cumbersome and ineffective.

Machine-learning techniques that self-adjust and improve over time are a more cost-effective approach than a traditional rules-based approach to analysis. Bayes, a machine-learning methodology, is an effective tool for classifying or categorizing data as it streams in. It is not dependent on modeling or on managing complex rule sets.

However, people often get confused when Bayes is described to them. My purpose in this article is to clear up that confusion and boil Bayes down to simple concepts that will help you understand it.

In Wikipedia, Bayes’ theorem is defined as a way of determining “the probability of an event, based on prior knowledge of conditions that might be related to the event.” But Bayes has also been termed a “statistical method for classification.” For instance, Wikipedia defines Naïve Bayes as a “family of simple probabilistic classifiers based on applying Bayes’ theorem.” So, is Bayes a way of predicting the future, or is it a way to classify data? The answer is that it is both: you can classify the most likely outcome of a future event based on historical data.

So how can we use Bayes in Computer Science to predict what will happen in the future? Or, worded another way, how can we use Bayes in Computer Science to classify what something will “most likely” be in the future?

To answer this question, you must first understand that, for a Bayes Classifier to work, it must first be “trained” with data. As we are in a Big Data world, we have lots of historical data with which to train it. And it must be trained with data for which we already know the answer. Data that is used to train a Bayes Classifier, or to query it for predictions, is referred to as features. The classifications that Bayes Classifiers come up with are referred to as categories.

**Predicting the Weather**

Let’s create an example in which we want to predict the weather. Our categories are the types of weather we want to predict: rain, snow, and sunshine. Our features are atmospheric conditions: temperature, air pressure, wind speed, and wind direction. To train with this data, we must know what happened in the past. So, we would gather the features from past days along with the actual weather (the categories) for those days (**Figure 1**). This data will be used to train our Bayes Classifier.

**Figure 1** shows the data required to train our Bayes Classifier: 12 months of data, from January to December of 2016. The trained Classifier can then be used in 2017 to predict the type of days we will have that year.

So, let’s assume it is now 6:00 a.m. on January 1, 2017. We know the temperature, air pressure, wind speed, wind direction, and so on, and we stream this data into our Bayes Classifier. This is just like what we did during training, except that this time we are not telling the Classifier the category; we are asking for it. It’s as if we’re asking, “Hey Bayes, based on how these features determined the category of day in the past, what do you think it will be like today? Will it be rainy, snowy, or will the sun shine?” To reiterate: when we trained our Classifier, we told it what past days were like; now we are asking it to predict today’s weather based on that past data.

The more data that is used to train the Bayes Classifier, the more accurate it will become over time. So, if we continue to train it with actual results in 2017, then what it predicts in 2018 will be more accurate. Also, when Bayes gives a prediction, it attaches a probability. So, it may answer the above question as follows, “Based on past data, I predict with 60% confidence that it will rain today.”
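The two modes described above can be sketched in a few lines of pure Python. This is a minimal illustration, not a production classifier: the training rows are made-up stand-ins for the Figure 1 data, and the discretized feature values (“cold”, “low”, and so on) are hypothetical.

```python
# Minimal Naive Bayes sketch: train on labeled (features, category) rows,
# then predict a category with a confidence. Training rows are invented.
from collections import Counter, defaultdict

training_data = [
    ({"temp": "cold", "pressure": "low",  "wind": "strong"}, "snow"),
    ({"temp": "cold", "pressure": "low",  "wind": "calm"},   "snow"),
    ({"temp": "mild", "pressure": "low",  "wind": "strong"}, "rain"),
    ({"temp": "mild", "pressure": "low",  "wind": "calm"},   "rain"),
    ({"temp": "warm", "pressure": "high", "wind": "calm"},   "sunshine"),
    ({"temp": "warm", "pressure": "high", "wind": "strong"}, "sunshine"),
]

# Training mode: count how often each category occurs, and how often
# each feature value occurs within each category.
category_counts = Counter()
feature_counts = defaultdict(Counter)   # (category, feature name) -> value counts
for features, category in training_data:
    category_counts[category] += 1
    for name, value in features.items():
        feature_counts[(category, name)][value] += 1

def predict(features):
    """Predicting mode: return (most likely category, confidence)."""
    scores = {}
    total = sum(category_counts.values())
    for category, count in category_counts.items():
        score = count / total                      # prior P(category)
        for name, value in features.items():
            seen = feature_counts[(category, name)][value]
            # +1 smoothing so an unseen value never zeroes out a score.
            score *= (seen + 1) / (count + 2)      # ~ P(value | category)
        scores[category] = score
    best = max(scores, key=scores.get)
    return best, scores[best] / sum(scores.values())

category, confidence = predict({"temp": "cold", "pressure": "low", "wind": "strong"})
print(f"Predicted {category} with {confidence:.0%} confidence")  # snow, ~69%
```

Note how the answer comes back as a category plus a probability, just like the “60% confidence that it will rain” example above.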

So, the Classifier is either in training mode or predicting mode. It is in training mode when we are teaching it—using historical data. In this case, we are feeding it the outcome (the category). It is in predicting mode when we are giving it the features, but asking it what the most likely outcome will be.

**How Does Bayes Work?**

You may be wondering how the Bayes Classifier arrives at its predictions. Without giving too elaborate an explanation (those can be found by Googling), Bayes comes up with its predictions by applying statistical measures to real historical data to see how features relate to outcomes. Using these statistical measures, it is able to predict with stated levels of confidence.
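Concretely, the statistical measure is Bayes’ theorem itself: P(category | features) = P(features | category) × P(category) / P(features). Here is a tiny worked example with made-up numbers, computing the probability of rain once low air pressure is observed:

```python
# Bayes' theorem: P(rain | low pressure) =
#     P(low pressure | rain) * P(rain) / P(low pressure)
# All probabilities below are invented for illustration.
p_rain = 0.30                 # prior: it rains on 30% of days
p_low_given_rain = 0.80       # low pressure on 80% of rainy days
p_low_given_dry = 0.20        # low pressure on 20% of dry days

# Total probability of observing low pressure on any day.
p_low = p_low_given_rain * p_rain + p_low_given_dry * (1 - p_rain)

# Posterior: probability of rain once low pressure is observed.
p_rain_given_low = p_low_given_rain * p_rain / p_low
print(f"P(rain | low pressure) = {p_rain_given_low:.2f}")  # 0.63
```

Observing low pressure raised the probability of rain from the 30% prior to about 63%; that update from prior to posterior is all a Bayes Classifier is doing, once per feature.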

**Bayes and jKool**

jKool, also referred to as AutoPilot Insight, has incorporated Naïve Bayes into its analytics. We’ve found this to be a helpful tool for customers wishing to predict things such as customer sentiment or whether a new deployment of software is going to perform well. You can see an example repository using Bayes by logging into jKool and viewing the sample mobile repository.

You can make use of Bayes within jKool by creating a Bayes Classifier and specifying how to train it with learning data (hardcoded values) and/or learning queries. When this is done, newly streamed data will be classified, the classification will be updated on the SetName field, and the Bayes probability of confidence will be updated as a property.

**jKool Sample Repository demonstrating Bayes**

Let’s use the sentiment analysis in the sample mobile repository to explain this further. In this example, we wish to predict whether a customer will have positive or negative sentiment toward a company based on the notes taken by the Customer Service Department. To do this, we train Bayes on what a customer with negative sentiment looks like by querying for data about customers immediately before they cancelled an account. Likewise, we train Bayes on what a customer with positive sentiment looks like by querying for data about customers immediately before they placed an order. Training in this example uses a known outcome: it assumes that customers who cancel their accounts have negative sentiment and customers who place orders have positive sentiment.

This is demonstrated in jKool’s Sample Dashboard for Bayes Classification. The Viewlet (a dynamically generated screen) in **Figure 2** demonstrates what was used to train Bayes:

- Get Event Fields tokenize(message, ' ') where name='cancel.account'
- Get Event Fields tokenize(message, ' ') where name='place.order'

jKool uses the tokenized words of the message field. In this example, the message field represents customer service notes. So, to train Bayes to recognize a customer with positive sentiment, we tokenize customer service notes from immediately before an order was placed. To train it to recognize a customer with negative sentiment, we tokenize customer service notes from immediately before an account was cancelled.
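To see why tokenizing helps, here is a rough pure-Python sketch of the same idea (this is an illustration of the technique, not jKool’s actual implementation, and the notes below are invented): split each note on spaces, count which words appear in notes from cancelling versus ordering customers, and score new notes against those counts.

```python
# Sketch of word-count-based sentiment classification on tokenized notes.
# Not jKool's implementation; training notes are made up.
from collections import Counter

negative_notes = [   # notes taken just before an account was cancelled
    "customer angry about repeated billing errors",
    "complained about slow support and threatened to cancel",
]
positive_notes = [   # notes taken just before an order was placed
    "customer happy with quick delivery",
    "praised support team and asked about new products",
]

def word_counts(notes):
    # tokenize(message, ' '): split each note on spaces, count the words
    counts = Counter()
    for note in notes:
        counts.update(note.lower().split())
    return counts

neg_counts = word_counts(negative_notes)
pos_counts = word_counts(positive_notes)

def sentiment(note):
    """Classify a new note by which category its words appeared in more."""
    neg = pos = 0
    for word in note.lower().split():
        neg += neg_counts[word]   # Counter returns 0 for unseen words
        pos += pos_counts[word]
    return "negative" if neg > pos else "positive"

print(sentiment("customer angry about billing"))
```

A real Naïve Bayes classifier weights these word counts as per-category probabilities rather than raw tallies, but the training signal is the same: the words in the notes, paired with a known outcome.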

It’s important to note that client-side instrumentation must be setup to stream the data into jKool in this manner (with customer service notes being retrieved and placed in the message field when accounts are cancelled or orders placed). This instrumentation is done with minimal code using jKool open-source collectors.

The Viewlet in **Figure 2** demonstrates how newly streamed data is predicted. For newly streamed data, the customer service notes are tokenized and passed through the Classifier, which, based on past results and probabilities, predicts whether the customer will have positive or negative sentiment. **Figure 3** shows the breakdown of happy and unhappy users.

The screen in **Figure 4** is another example of Bayes. It demonstrates how a Bayes Classifier predicted that customers running iOS version 10.2 with app version 3.1 are likely to cancel their accounts. This would give the company that created the app an indication that there may be an issue (such as a bug) with version 3.1 of the app running on iOS version 10.2. It is also clear from the chart that the problem occurs across all carriers and thus is not carrier-specific.

**Summary**

Bayes Theorem is an effective technique for classifying data in real-time and also for predicting future behavior based on historical results. By combining classification with prediction, Bayes can be valuable in understanding current customer sentiment, potential customer actions, and many other types of observational data either at rest or in motion.

*Catherine Bernardone, BS in Computer Science from Stony Brook University, has been a Software Engineer for many years and currently works at Nastel Technologies. She was involved in Artificial Intelligence early in her career and has extensive experience building enterprise systems that utilize J2EE, database (relational and big data), mobile, and cloud technologies.*

