How to Get Real-time Insight with Machine Learning and Centralized Data

September 8, 2016, 5:30 am

Padraig Stapleton, VP of Engineering, Argyle Data

Enterprises today rely on data as the foundation of business success, whether the goal is to better understand customers, build new or better products and services, or manage cost and risk. Data is now the prime raw material for creating value; across all industries, it’s the norm to hold vast stores of data.

An issue that remains unresolved, however, is how well and how efficiently data can be applied.

Firms are still wrestling with the challenge of making big data work for them, in use cases ranging from enterprise analytics, customer 360, and product personalization to revenue assurance and fraud detection. All the data in the world has no value unless it’s accessible and actionable. Transforming modern data volumes into usable information requires new approaches to analytics.

Big data tends to be held in silos – billing, credit history, customer transactions, marketing. Some organizations warehouse their data after just a few months, which makes it very difficult to access. You can’t get a 360-degree view of a customer without looking across all areas of her interactions with an organization, and you need to analyze transactions over a time frame of 13 months in order to understand typical activity over a year. Data is of far more value and utility when held centrally.
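To make the silo problem concrete, here is a minimal Python sketch of pulling records from separate silos into one per-customer view limited to a 13-month window. The silo names, record layout, and the `customer_view` helper are all hypothetical, chosen only for illustration; a real enterprise data hub would do this at far larger scale on a platform such as Hadoop.

```python
from collections import defaultdict
from datetime import date

# Hypothetical silo extracts: (customer_id, event_date, detail)
SILOS = {
    "billing": [
        ("c1", date(2016, 8, 1), "invoice 42.10"),
        ("c2", date(2015, 1, 15), "invoice 9.99"),
    ],
    "transactions": [("c1", date(2015, 9, 3), "top-up 20.00")],
    "marketing": [("c1", date(2016, 7, 20), "campaign email opened")],
}


def months_between(earlier, later):
    """Whole calendar months from `earlier` to `later` (days are ignored)."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def customer_view(silos, as_of, window_months=13):
    """Merge all silos into one time-ordered event list per customer."""
    view = defaultdict(list)
    for silo_name, records in silos.items():
        for customer_id, event_date, detail in records:
            # Keep only events inside the look-back window.
            if 0 <= months_between(event_date, as_of) < window_months:
                view[customer_id].append((event_date, silo_name, detail))
    for events in view.values():
        events.sort()  # oldest event first
    return dict(view)


if __name__ == "__main__":
    for customer, events in customer_view(SILOS, date(2016, 9, 8)).items():
        print(customer, events)
```

With the data held in one place, the same function answers a new business question by changing the window or adding a silo, rather than by moving data between systems.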

As data volumes grow at exponential rates and new types of data become available every day, users demand more and faster access. It also becomes increasingly burdensome to move all that data around for each new business question or use case. At the same time, IT must ensure SLA performance, control costs, and manage security and compliance.

Many organizations realize their existing systems alone are not sufficient to keep pace with this rate of change and turn to a new approach to complement their existing investments: an enterprise data hub. As a unified platform that can economically store unlimited data and enable diverse access to it at scale, the enterprise data hub is emerging as the architectural center of a modern data strategy.

The enterprise data hub does much to overcome the data silo issue. The evolution of Hadoop and of Hadoop-based enterprise data platforms, such as those from Cloudera and Hortonworks, has been key to the emergence of the enterprise data hub, transforming the economics, scalability, and flexibility of storing and using massive amounts of data.

There still remains the thorny problem of how to access, analyze, and leverage the data to optimize business opportunity. Communications service providers (CSPs) are paving the way in this regard by using real-time, machine-learning applications to detect and prevent fraud and bolster financial results.

Fraud Points the Way

Revenue assurance and fraud prevention and detection are major focus areas for CSPs. Fraud alone is a $38 billion a year problem for them, according to the Communications Fraud Control Association’s 2015 survey. Operators worldwide are experiencing a huge surge in high-velocity attacks and sophisticated, new fraud types that are invisible to traditional detection methods.

Facebook, Google, and LinkedIn have pioneered big data and machine-learning approaches to protecting their subscribers and gaining insight into vast amounts of data. CSPs are now using these advanced big-data approaches to detect and analyze fraud. They start by combining their data silos into one vast data lake that can be tapped – using a combination of big data, Hadoop, and machine learning – to provide real-time information about anomalous behavior as it happens. When you have enough data and you have access to that data in real time, you can detect fraud in real time.

Machine Learning

Using unsupervised machine learning and data lakes rather than rules-based approaches – which can identify only known fraud types – it is possible to detect and stop old and new types of fraud, identify criminals and crime rings, and pre-empt large attacks.
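The contrast between a fixed rule and a data-driven detector can be sketched in a few lines of Python. This is an illustration only: the usage figures and the 100-minute rule are invented, and a simple robust statistic (the modified z-score based on the median absolute deviation) stands in for the unsupervised models a production system would use.

```python
from statistics import median

# Hypothetical daily minutes called to premium-rate numbers, per subscriber
USAGE = {"s1": 12, "s2": 15, "s3": 11, "s4": 14, "s5": 13, "s6": 95}

RULE_LIMIT = 100  # hypothetical fixed rule: flag anything above 100 minutes


def rule_based(usage, limit=RULE_LIMIT):
    """Flag subscribers that break a known, hand-written threshold."""
    return sorted(s for s, minutes in usage.items() if minutes > limit)


def mad_outliers(usage, threshold=3.5):
    """Flag subscribers whose usage is far from the population median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, so the
    cut-off adapts to the data instead of being fixed in advance.
    """
    values = list(usage.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread in the data, nothing to compare against
    return sorted(
        s for s, v in usage.items() if 0.6745 * abs(v - med) / mad > threshold
    )


if __name__ == "__main__":
    print("rule-based:", rule_based(USAGE))
    print("data-driven:", mad_outliers(USAGE))
```

Here subscriber `s6` stays just under the fixed limit, so the rule misses it, while the data-driven check flags it because it is far from what every other subscriber does.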

In a traditional machine-learning environment, the end user cooperates with the machine-learning algorithm because there is mutual benefit. For example, an end user will get better recommendations and more personalized advertisements, while Amazon sells more books and Google gets better click-through rates. When the user is a fraudster and the input provided is fraudulent, machine learning becomes adversarial. In this process, machine learning evolves to defend quickly against both new attacks and mutations of existing attack vectors, and to contain fraud costs.

Google, Facebook, and LinkedIn are masters of adversarial machine learning. Facebook has developed what it calls an “immune system” that detects fraud at massive scale. Machine learning is being used to detect anomalous (or high usage) behavior in seconds or minutes as opposed to hours or days.
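Detecting high usage in seconds rather than days implies scoring each event as it arrives instead of waiting for a batch report. The sketch below assumes a hypothetical stream of (subscriber, minutes) events and keeps a running mean and variance per subscriber using Welford's online algorithm; the thresholds are arbitrary, and this is not a description of how Facebook or any CSP actually implements detection.

```python
import math


class RunningStats:
    """Running mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def score_stream(events, min_history=5, threshold=4.0):
    """Return the events that sit far above a subscriber's own baseline."""
    stats = {}
    alerts = []
    for subscriber, minutes in events:
        s = stats.setdefault(subscriber, RunningStats())
        if s.n >= min_history and s.std() > 0:
            z = (minutes - s.mean) / s.std()
            if z > threshold:
                alerts.append((subscriber, minutes))
                continue  # keep the flagged event out of the baseline
        s.update(minutes)
    return alerts


# Hypothetical event stream: (subscriber_id, call minutes)
EVENTS = [
    ("a", 10), ("a", 12), ("a", 11), ("a", 9), ("a", 10), ("a", 11),
    ("b", 50), ("a", 120), ("b", 55),
]

if __name__ == "__main__":
    print(score_stream(EVENTS))
```

Because each event is scored against state that is updated in constant time, the alert for subscriber `a` is raised the moment the 120-minute event arrives.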

The beauty of continually analyzing data, and lots of it, across a data lake is that anomalous behavior becomes easy to identify. Through machine learning, the needles in the haystack stick out like sore thumbs. Through graph theory, accomplices also become very obvious.
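The graph-theory point can be illustrated with a small example. Assume, hypothetically, that two accounts are linked whenever they share a device, payment card, or destination number. Starting from one account already flagged as fraudulent, a breadth-first search over those links returns every account in the same connected component, which is a rough first cut at a list of possible accomplices. The account names and links below are invented.

```python
from collections import defaultdict, deque

# Hypothetical links between accounts that share some attribute
LINKS = [("acct1", "acct2"), ("acct2", "acct3"), ("acct4", "acct5")]


def build_graph(links):
    """Build an undirected adjacency map from pairs of linked accounts."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_accounts(graph, start):
    """Breadth-first search: all accounts reachable from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen - {start}


if __name__ == "__main__":
    graph = build_graph(LINKS)
    print(sorted(connected_accounts(graph, "acct1")))
```

A link alone does not prove collusion, so in practice a result like this would feed an analyst's review rather than trigger automatic action.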

Machine learning and visualizations are highly complementary. Anomaly detection shows outliers but, on its own, loses useful context, and humans often want to see visualizations that convey the data for extra understanding and insight into the fraud technique or method. When you combine the two, you have a very powerful detection and analysis approach.

This approach can be extended to any area using data analytics. Fraud is a natural first deployment for big data machine-learning applications, since the ROI is easily and immediately proven, providing a model for other usage scenarios. Major global communications providers have already turned to this new approach to fraud detection, based on Hadoop economics and the agility of machine learning and artificial intelligence applied to massive data lakes. Some of the earliest adopters are now beginning to extend the use of their big data machine-learning applications to customer 360 and other areas.

Using the Hadoop platform in conjunction with machine-learning applications, data can be collected, stored, processed, explored, modeled, and served in one unified and extremely economical platform. Within the enterprise data hub, the performance and value of many functions and applications, including fraud, security, Internet of Things, and revenue assurance, can be improved substantially by machine learning. The techniques and applications being pioneered in the communications industry point the way for other industries and organizations to gain even more value from their enterprise data hubs.

Padraig Stapleton is VP of Engineering at Argyle Data. He brings years of industry-leading management and technical expertise across a number of areas, including mobile telecommunications and big data. Most recently, he was VP of Engineering and Operations for the Big Data group at AT&T, responsible for development of its big data platform. Before that, he was involved in a number of successful startups as VP of Engineering, building development teams and delivering innovative products to the marketplace. Padraig has held senior leadership roles in various companies, including Telephia, which was acquired by Nielsen, and InterWave Communications.
