Machine Learning Sees Defrauding with the Trees

by John Canfield | June 16, 2015 5:30 am

John Canfield, VP of Risk, WePay

Machine learning initiatives play an important role at WePay. The kinds of problems we use machine learning for shape how we build technology to address them, and the unique challenges of the payments industry shape our approach.

Let’s look at an actual fraud problem we face – shell selling – and how we built the algorithm that we are now using to solve it.

Shell Selling

Fraud is a concern pretty much anywhere money is exchanged for a good or service. But certain types of fraud are unique to platforms: services that act as intermediaries to make transactions easier. A traditional buyer or seller only has to worry about whether the other party is a fraudster, but a platform has to worry about both sides of the transaction. If either party is committing credit card fraud, it is often the platform that foots the bill for the refund when the real cardholder finds out and disputes the charges.

Shell selling is a type of fraud that is of particular concern in this situation. Basically, it’s what happens when both sides of a transaction are fraudulent: a criminal with two accounts pays himself with a stolen credit card. By the time the real cardholder finds out, the fraudster has disappeared with the money, and the platform is left to cover the chargeback.

Shell selling can be tough to spot because these fraudsters keep a low profile. They generally don’t have many real customers, so you can’t rely on user feedback scores the way you can with more traditional scammers. And while it’s obvious when a merchant gets a bunch of payments from different cards at the same IP in a short amount of time, the fraudsters who perpetrate this crime tend to be more sophisticated than that. They often employ all manner of techniques to hide their identity and evade detection.

Because shell selling is a common problem that’s difficult for humans to spot, we decided to build a machine learning algorithm to help us catch it.

Machine Learning Algorithms

At WePay, we build our entire machine learning pipeline in Python, using the popular, open-source scikit-learn machine learning package. If you haven’t used scikit-learn, I highly suggest you check it out. For things like fraud modeling, where you need to retrain constantly and deploy quickly, it offers a lot of advantages:

    • Scikit-learn uses a uniform API for model fitting and prediction across different machine learning algorithms, which makes it easy to reuse code from one algorithm to the next.

    • Scoring web services can be hosted directly in Python using Django or Flask, which makes deployment much simpler. To get started, one need only install scikit-learn on the web service instance and copy over the exported model file and the necessary data-processing pipeline code.

    • The entire model development and deployment cycle is self-contained in Python. This gives us an advantage over other popular languages for machine learning, like R or SAS, which require converting the model to another language before it runs in production. Besides simplifying development by eliminating unnecessary steps, this gives us more flexibility to try different algorithms, because many of them don’t survive that conversion well and would be more trouble than they are worth in another environment.
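To illustrate the uniform-API point, here is a minimal sketch in which three of the algorithm families mentioned in this article are trained and scored through the same fit and predict_proba calls, and the chosen model is then exported and reloaded with Python's pickle module. The data is synthetic (from make_classification, not real transactions), and the model settings and the use of ROC AUC as the comparison metric are illustrative assumptions, not WePay's actual setup.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for labeled transactions (1 = fraud).
X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in candidates.items():
    # The same two calls work regardless of which algorithm is inside.
    model.fit(X_train, y_train)
    scores[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

# Export the chosen model as bytes (a file works the same way) and reload it,
# as a scoring service would do at startup.
blob = pickle.dumps(candidates["random_forest"])
restored = pickle.loads(blob)
```

A Flask or Django scoring service would perform the reloading step once at startup and then call predict_proba per request. Pickled models should only be loaded from trusted sources, and with the same scikit-learn version that produced them.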


Random Forest

Getting back to shell selling, we tested several algorithms before we settled on the one that gave us the best performance: Random Forest.

Random Forest is a tree-based ensemble method developed by Leo Breiman and Adele Cutler and first presented by Breiman in a peer-reviewed article in the journal Machine Learning in 2001. Random Forest trains many decision trees on random subsets of the training data and then uses the mean of the individual trees’ predictions as the final prediction. The randomness comes from two sources: each tree is trained on records sampled with replacement from the original training data (bootstrapping), and at each split only a random subset of the features is considered.
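The two sources of randomness can be made concrete with a small hand-rolled forest built from scikit-learn decision trees. This is a teaching sketch on synthetic data, not a deployment recipe: in practice, scikit-learn's RandomForestClassifier does the same thing through its bootstrap and max_features parameters. The function names fit_forest and forest_proba are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=50, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of the records."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)  # sample records with replacement
        # max_features="sqrt": each split considers only a random subset of features.
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31 - 1)))
        trees.append(tree.fit(X[rows], y[rows]))
    return trees


def forest_proba(trees, X):
    """Average the per-tree probability of the positive class (label 1)."""
    per_tree = []
    for tree in trees:
        classes = list(tree.classes_)
        if 1 in classes:
            per_tree.append(tree.predict_proba(X)[:, classes.index(1)])
        else:  # a bootstrap sample can, rarely, contain no positives at all
            per_tree.append(np.zeros(len(X)))
    return np.mean(per_tree, axis=0)


# Synthetic data: train on the first 400 rows, score the last 100.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
trees = fit_forest(X[:400], y[:400])
proba = forest_proba(trees, X[400:])
accuracy = np.mean((proba >= 0.5) == y[400:])
```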

Random Forest gave us the best precision at fixed recall of the algorithms we tried, followed closely by neural networks and another ensemble method, AdaBoost. Compared to other algorithms, Random Forest had a number of advantages for the kinds of fraud data we work with, which are ultimately why it won out:

    • Tree-based ensemble methods handle both non-linearity and non-monotonicity well, and both are quite common in fraud signals. By comparison, neural networks handle non-linearity well but get tripped up by non-monotonicity, and logistic regression handles neither. For those two methods to deal with non-linearity and/or non-monotonicity, extensive and careful feature transformations are required.

    • Random Forest requires minimal feature preparation and transformation, and it does not require standardization of input variables the way neural networks and logistic regression do. Nor does it require binning and risk-rating conversion for non-monotonic variables.

    • Random Forest gave the best out-of-the-box performance of the algorithms we compared. Another tree-based method, Gradient Boosted Trees, can achieve comparable performance but requires more parameter tuning.

    • Random Forest outputs feature importance as a by-product of model training, which is very useful for feature selection.

    • Random Forest is more tolerant of overfitting than the other algorithms, and it can handle a large number of variables without much overfitting, because averaging over more trees reduces the variance of the ensemble. Variable selection and reduction are therefore not as critical as they are for other algorithms.
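The non-monotonicity point is easy to see on a toy problem. In the sketch below, a single synthetic feature carries risk at both of its extremes (a U shape chosen for illustration, not a real fraud signal). Logistic regression on the raw feature can only rank in one direction, so it scores near chance, while Random Forest picks up both tails. Feeding logistic regression the hand-made transform |x| fixes it, which is the kind of feature engineering the list above refers to.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical U-shaped signal: high risk at both extremes of x, low in the middle.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(4000, 1))
y = (np.abs(x[:, 0]) > 2).astype(int)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, random_state=0)


def held_out_auc(model, X_train, y_train, X_test, y_test):
    """Fit on the training split and return ROC AUC on the test split."""
    model.fit(X_train, y_train)
    return roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])


rf_auc = held_out_auc(RandomForestClassifier(n_estimators=100, random_state=0),
                      x_tr, y_tr, x_te, y_te)
# Raw feature: a monotonic score can rank only one tail above the middle.
lr_raw_auc = held_out_auc(LogisticRegression(), x_tr, y_tr, x_te, y_te)
# Transformed feature |x|: the U shape becomes monotonic, so logistic regression copes.
lr_abs_auc = held_out_auc(LogisticRegression(), np.abs(x_tr), y_tr, np.abs(x_te), y_te)
```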

Here’s how Random Forest performed versus the competition:

[Figure: performance comparison of Random Forest and the other algorithms tested]
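Precision at fixed recall, the yardstick used for this comparison, can be computed with scikit-learn's precision_recall_curve. A minimal sketch, assuming scores where higher means riskier; the helper name precision_at_recall and the toy numbers are illustrative only.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def precision_at_recall(y_true, scores, target_recall):
    """Best precision over all score thresholds whose recall is at least target_recall."""
    precision, recall, _ = precision_recall_curve(y_true, scores)
    return float(np.max(precision[recall >= target_recall]))


# Toy labels (1 = fraud) and model scores, purely illustrative.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(precision_at_recall(y_true, scores, 1.0))  # ≈ 0.667: threshold 0.35 flags 3, 2 are fraud
print(precision_at_recall(y_true, scores, 0.5))  # 1.0: threshold 0.8 flags only a fraud
```

To compare algorithms (or a model against a rule set), fix one target recall and compare each candidate's precision at that point.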


Training the Algorithm

Our machine-learning pipeline follows a standard procedure, which includes data extraction, data cleaning, feature derivation, feature engineering and transformation, feature selection, model training, and model performance evaluation:

[Figure: overview of the machine learning pipeline]
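The later stages of this procedure map naturally onto a scikit-learn Pipeline. The sketch below is an illustration under stated assumptions, not WePay’s pipeline: synthetic data stands in for extracted and derived features, median imputation stands in for data cleaning, SelectFromModel driven by a Random Forest’s feature importances stands in for feature selection, and average precision is the evaluation metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import SimpleImputer
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic, imbalanced stand-in for derived transaction features (1 = fraud),
# with about 5% of values blanked out to mimic gaps that cleaning must handle.
X, y = make_classification(n_samples=3000, n_features=30, n_informative=8,
                           weights=[0.9], random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

pipeline = Pipeline([
    # Cleaning: fill missing values with each column's median.
    ("impute", SimpleImputer(strategy="median")),
    # Selection: keep features whose Random Forest importance is at least the median.
    ("select", SelectFromModel(RandomForestClassifier(n_estimators=100, random_state=0),
                               threshold="median")),
    # Training: the final classifier.
    ("model", RandomForestClassifier(n_estimators=300, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
pipeline.fit(X_train, y_train)

# Evaluation: average precision summarizes the whole precision-recall curve.
ap = average_precision_score(y_test, pipeline.predict_proba(X_test)[:, 1])
kept = int(pipeline.named_steps["select"].get_support().sum())
```

Keeping cleaning and selection inside the Pipeline means they are fitted on training data only, and the whole object can be exported as a single file for scoring.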



After an extensive period of development and training, our Random Forest model for shell-selling identification is now live and actively stopping fraud. It was a lot of work to select, train, and deploy, but it has made our risk processes more robust, catching more fraud with less manual review. At the same fraud recall rate, the model’s precision is about 2 to 3 times higher than that of our constantly tuned and optimized rules.

In addition to the obvious benefits of having the algorithm live, we also learned a lot about our data and our approach in the process:

    • Through a feature-selection process, we found that the most predictive features for this kind of fraud are velocity-type variables, such as transaction volume by user, device, true IP, and credit card. We also found account-linking features by device ID, bank account, and credit card quite useful, for example multiple accounts logged in on one device, or multiple withdrawals to one bank account.

    • Risk ratings of some categorical variables, such as email domain, application ID, user country, and hour of the day, also proved highly predictive.

    • Digital footprints such as browser language, OS fonts, screen resolution, user agent, and Flash version were somewhat useful in fighting fraud. Somewhat more predictive were signs that a user was hiding their digital footprint, such as VPN tunneling or the use of virtual machines and Tor.

    • We also found that model performance deteriorates quickly. This wasn’t really a surprise: fraudsters change their methods constantly to avoid detection, so even the best model will eventually become outdated if it doesn’t change as well. But we were surprised at how quickly this happens. For shell selling, precision drops by half in just the first month after the model has been trained. Hence, refreshing the model frequently to maintain high detection precision is crucial to successful fraud detection.

    • Unfortunately, frequent refreshes present their own problems. Refreshing the model as often as possible is ideal, but one must be careful when using the most recent transaction data for training. Fraud labels can take as long as a month to mature, so training on data that’s too recent may actually contaminate the model. Contrary to what we initially assumed, online learning with the most current data doesn’t always deliver the best results.

    • Random Forest is a superior machine learning algorithm for producing high-performance models. However, it is mostly used as a black-box method. This is an issue because we are not trying to cut humans out of the process entirely, and probably couldn’t even if we wanted to. Human analysts always want reason codes that tell them why a transaction was flagged, to guide their case review. But Random Forest by itself can’t readily provide reason codes. Interpreting the model’s output is difficult and may involve digging into the structure of the “forest,” which can significantly increase scoring time. To address this problem, WePay’s data science team had to invent a new proprietary method for generating reason codes from Random Forest models, on which we have filed a provisional patent.
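The velocity, account-linking, and risk-rating features described above could be derived along these lines. This is a simplified sketch under stated assumptions: the Txn fields, the 24-hour window, and the smoothing constant are hypothetical choices; velocity is measured as a transaction count rather than a dollar amount; and the risk rating is a smoothed per-category fraud rate (a form of target encoding), which may differ from how WePay computes it. Risk ratings should be computed from training-period labels only, so the model never sees its own targets.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:  # hypothetical, minimal transaction record
    ts: datetime
    account_id: str
    device_id: str
    card: str


def velocity_features(txns, window=timedelta(hours=24)):
    """Per transaction, in time order: how many earlier transactions within `window`
    shared its device or card, and how many distinct accounts its device has used."""
    recent = {"device_id": defaultdict(deque), "card": defaultdict(deque)}
    accounts_on_device = defaultdict(set)
    rows = []
    for t in sorted(txns, key=lambda t: t.ts):
        row = {}
        for key, history in recent.items():
            q = history[getattr(t, key)]
            while q and t.ts - q[0] > window:  # drop timestamps older than the window
                q.popleft()
            row[f"{key}_txns_in_window"] = len(q)
            q.append(t.ts)
        accounts_on_device[t.device_id].add(t.account_id)  # account linking by device
        row["accounts_on_device"] = len(accounts_on_device[t.device_id])
        rows.append(row)
    return rows


def risk_ratings(categories, labels, smoothing=20.0):
    """Smoothed fraud rate per category value (e.g. email domain), shrunk toward
    the overall fraud rate so rare values are not rated on a handful of cases."""
    overall = sum(labels) / len(labels)
    counts, frauds = defaultdict(int), defaultdict(int)
    for c, label in zip(categories, labels):
        counts[c] += 1
        frauds[c] += label
    return {c: (frauds[c] + smoothing * overall) / (counts[c] + smoothing) for c in counts}
```

The rows returned by velocity_features follow timestamp order, not input order.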
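One way to respect label maturity when refreshing is to end the training window a maturity period before the refresh date. A minimal sketch: the 30-day maturity follows the roughly one-month figure above, while the 90-day history length and the function names are assumptions.

```python
from datetime import date, timedelta


def training_window(as_of, label_maturity_days=30, history_days=90):
    """(start, end) dates of transactions that are safe to train on as of `as_of`.

    The most recent `label_maturity_days` are skipped because chargebacks for
    them may not have arrived yet, so their "not fraud" labels are unreliable.
    """
    end = as_of - timedelta(days=label_maturity_days)
    start = end - timedelta(days=history_days)
    return start, end


def select_training_rows(rows, as_of, **window_kwargs):
    """Keep rows whose 'date' falls inside [start, end)."""
    start, end = training_window(as_of, **window_kwargs)
    return [r for r in rows if start <= r["date"] < end]
```

Retraining more often then means sliding this window forward more often, not shrinking the maturity gap.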


At WePay, risk management technology is at the core of what we do. It’s what lets us bear the considerable fraud risk we carry for the more than 1,200 platforms that use us to settle funds between their users.

Risk management isn’t just about technology. It’s about a seamless partnership between humans and technology. It’s still largely humans who have to think of the ways that fraudsters can attack a payment system and write rules to block them, and it’s still an experienced professional who has to make the judgment call whether to block a transaction when it falls in the gray area between “obvious fraud” and “obviously legitimate,” as it so often does.

And that’s why we’re so excited about machine learning and artificial intelligence. That might seem a bit weird if you have been raised on a steady diet of movies in which the robot overlords enslave the human race. But it makes perfect sense when you think of what these technologies let you do.

This isn’t Terminator 2. When WePay thinks about machine learning, we are not trying to replace human beings; we just want to make the machines they work with better. We want machine intelligence to be smarter so we can focus human intelligence on the hard problems, where it makes the most difference.

John Canfield is the VP of Risk for WePay. WePay provides a payment API specifically designed for companies that want to enable many small users to accept credit cards on their platform without taking on the fraud risk and operational burdens associated with payments. WePay powers some of the top platforms, including GoFundMe, StayClassy, CustomMade, Honeyfund and hundreds more. Prior to WePay, John was Sr. Director of Risk at eBay.
