For more than 40 years Acxiom has provided marketing professionals with data-driven insights into consumer behavior, so that a luxury goods maker can target the right households for a national product launch, or a health club chain can segment its customers to retain its most loyal exercise enthusiasts. The Little Rock, Ark.-based company excels at applying analytics to enormous databases and has built its own tools to help its analysts move analytical models from their development systems to a production environment.
However, according to Jim Parker, an analyst with the firm, Acxiom "is moving toward more interoperability with our clients." That's why, he said, the company is starting to use PMML, the Predictive Model Markup Language, as an alternative to the internally developed tool it has used to migrate models from development to production. Adopting PMML will make it easier for Acxiom to collaborate with clients and, potentially, to alter models over their lifespan.
PMML is an XML-based industry standard developed by the Data Mining Group, an industry consortium, and was conceived 15 years ago as a way to speed up the deployment of predictive analytics models from an analyst’s desktop to the data warehouse or analytics database.
According to Cindi Howson, president of BI Scorecard, an analytics consultancy in Sparta, N.J., “The vision [of PMML] is to allow predictive language created in one tool, such as SAS or SPSS, to be readily consumable in another tool, platform or interface, in the same way that we can now query any database using SQL.”
The value of the standard is significant. "By making it easier to deploy models, PMML ensures more analytic projects are successful and reduces the time to value for analytic projects. PMML's growth has come as organizations move from batch to real-time scoring, with PMML providing the standards-based mechanism for moving predictive analytic models out of the back office and into the day-to-day, operational environment for real-time decision-making," said James Taylor, CEO of Decision Management Solutions, a Palo Alto, Calif.-based consultancy specializing in decision management and predictive analytics. "Finally, PMML has allowed organizations to scale their analytic efforts by allowing multiple teams with multiple analytic workbenches to create models in a standard format ready for deployment."
No Time for Recoding
Predictive analytical models are created with mathematical and algorithmic precision because the applications that depend on them are too complex for even advanced SQL queries. Shifting the models from one realm to another, from development to production, was a laborious and expensive process.
A few enterprises, like Acxiom, built their own migration tools. Historically, however, once a model satisfied the analyst's goals, it was recoded from top to bottom specifically for the production database, and the data was transformed again for the new environment, a process that could take weeks, months, or even a year or longer, depending on the model's complexity.
In today's business world, those kinds of delays are unacceptable. Predictive analytics have become an integral part of many organizations' tactical operations. Whether implementing daily price changes on e-commerce sites based on live trending data or making real-time offers to cellphone customers to reduce churn, predictive analytics increasingly give businesses a competitive advantage. But the interoperability gulf between development and production has often undermined the value that models bring to an enterprise.
The Interoperability Argument
That's where the PMML standard comes in, said Alex Guazzelli, vice president of analytics at San Diego-based Zementis Inc., a provider of analytics tools, including the PMML Universal Plug-In used by third-party developers to output PMML-compliant models. "Without a standard, it is all custom code. There is no interoperability. Once you build a predictive solution using a statistical tool, it remains there since it cannot be understood by any other system. If you need to deploy the solution in your production environment, it will need to be completely recoded."
That recoding process was rife with problems. For example, custom code was difficult or impossible to reuse, in part because it is notoriously poorly documented. The PMML standard, by contrast, is fully documented, which brings a level of transparency to a model's migration path.
Today, most of the leading data mining and database vendors, including IBM, Oracle, SAP, Microsoft, Zementis, MicroStrategy, SAS, Fair Isaac, and many others, offer PMML-compliant products. If you develop a model with one tool, you can export it as PMML and move it quickly to a production system. This interoperability saves enormous amounts of time.
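To show what "export from one tool, consume in another" looks like in practice, here is a minimal sketch in Python using only the standard library. The PMML document is hand-written for illustration; the field names, coefficients, and the score_regression helper are hypothetical, and a real scoring engine covers far more of the specification than this single linear regression table.

```python
import xml.etree.ElementTree as ET

# Hand-written PMML 4.1 document for illustration only; the field names
# and coefficients are hypothetical, not taken from any real model.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="Hypothetical churn-score model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="tenure_months" optype="continuous" dataType="double"/>
    <DataField name="support_calls" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="tenure_months"/>
      <MiningField name="support_calls"/>
      <MiningField name="churn_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="tenure_months" coefficient="-0.01"/>
      <NumericPredictor name="support_calls" coefficient="0.2"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

# PMML elements live in a version-specific XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def score_regression(pmml_text: str, record: dict) -> float:
    """Score one record with the first RegressionTable in the document.

    Only NumericPredictor terms are handled; categorical predictors,
    missing values, and output normalization are left out of this sketch.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    if table is None:
        raise ValueError("document contains no RegressionTable")
    score = float(table.get("intercept", "0"))
    for term in table.findall("pmml:NumericPredictor", NS):
        exponent = int(term.get("exponent", "1"))  # PMML default is 1
        value = float(record[term.get("name")])
        score += float(term.get("coefficient")) * value ** exponent
    return score


print(score_regression(PMML_DOC, {"tenure_months": 24, "support_calls": 3}))
```

Scoring a record with 24 months of tenure and three support calls yields about 0.86 (0.5 - 0.01 × 24 + 0.2 × 3). The consuming code needs no knowledge of the tool that produced the file, only of the PMML elements it reads.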
Selected Projects Using PMML
| Company/Project | Software | Supported model types |
|---|---|---|
| Augustus/Open Data Group | Augustus | Decision trees, regression, naive Bayes |
| IBM | InfoSphere Warehouse V9.5, DB2 Data Warehouse Edition V9.1 | Varies by PMML version; includes sequence models, naive Bayes models, logistic regression models, decision trees, neural networks, association rules |
| KNIME | KNIME 2.4 | Neural networks, regression and general regression models, clustering models, decision trees, support vector machines |
| Microsoft | SQL Server | Decision trees, clustering models |
| MicroStrategy | MicroStrategy Data Mining Services 8.0 and above | Regression models, decision trees, mining models, clustering models, neural networks, general regression, support vector machine models, rule set models, association rules, time series |
| SAS | SAS Enterprise Miner, versions 5.1, 5.2, 5.3 | Linear regression, logistic regression, decision trees, neural networks, clustering models, association rules |
| SPSS | Versions of Clementine, PASW, SPSS | Varies by PMML version; includes association rules, clustering models, decision trees, neural networks, regression models, rule set models, sequence models, support vector machines, naive Bayes |
| Teradata | Teradata Warehouse Miner V5.3.1 | Regression models, decision trees, neural networks, clustering models, mining models |
| Zementis | Zementis PMML converter and other products | Decision trees, support vector machines, neural networks, regression and general regression, clustering models, association rules, mining models, naive Bayes, rule set models |
Source: Data Mining Group
In addition to the benefits of interoperability and transparency, PMML can lower the direct costs of model development and migration. Cris Payne, senior manager for customer intelligence at XO Communications LLC, a Herndon, Va.-based company that is one of the nation's largest communications service providers for business, said that the diversity of tools for creating analytic models can be part of the problem. Before XO embraced PMML, Payne said, the external experts the company hired to build predictive models often used development tools that were not in XO's portfolio. That meant XO often had to invest in those tools to work with the models the outside experts had developed.

PMML is a mature standard. A PMML document begins with a header that describes the document itself, including the version of the standard. The current version is 4.1. The standard covers a model's taxonomy, statistics, targets, and more. It includes a data dictionary and a mining schema. It also handles a number of data transformations, including value mapping, normalization, and aggregation. And, of course, there is the model itself, including its function and algorithm, among other attributes. PMML supports a broad range of analytics models, such as naïve Bayes classifiers, linear and logistic regression, neural networks, decision trees, and many others.
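The components of a PMML document map directly onto XML elements. The following sketch, again standard-library Python, reads the header version, the data dictionary, and the mining schema from a small hand-written document, and applies NormContinuous, one of the normalization transformations PMML defines. The field names and values are hypothetical, the tree model's body is deliberately left out (so the document would not pass schema validation), and the extrapolation rule assumes PMML's default handling of outliers ("asIs").

```python
import xml.etree.ElementTree as ET

# Hypothetical PMML 4.1 fragment with a Header, DataDictionary,
# TransformationDictionary and a model's MiningSchema. The tree itself
# (its Node elements) is omitted, so this is not a schema-valid model.
PMML_DOC = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <Header description="Illustrative document"/>
  <DataDictionary numberOfFields="2">
    <DataField name="income" optype="continuous" dataType="double"/>
    <DataField name="segment" optype="categorical" dataType="string">
      <Value value="loyal"/>
      <Value value="at_risk"/>
    </DataField>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="income_norm" optype="continuous" dataType="double">
      <NormContinuous field="income">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100000" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <TreeModel functionName="classification">
    <MiningSchema>
      <MiningField name="income"/>
      <MiningField name="segment" usageType="predicted"/>
    </MiningSchema>
  </TreeModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def describe(pmml_text: str):
    """Return the PMML version, the data dictionary and the mining schema."""
    root = ET.fromstring(pmml_text)
    fields = {
        f.get("name"): f.get("optype")
        for f in root.iterfind("pmml:DataDictionary/pmml:DataField", NS)
    }
    # usageType defaults to "active" (an input field) when it is absent.
    schema = {
        m.get("name"): m.get("usageType", "active")
        for m in root.iterfind(".//pmml:MiningSchema/pmml:MiningField", NS)
    }
    return root.get("version"), fields, schema


def norm_continuous(norm: ET.Element, x: float) -> float:
    """Piecewise-linear NormContinuous; outliers are extrapolated from the
    first or last segment, which assumes the default outliers="asIs"."""
    points = sorted(
        (float(p.get("orig")), float(p.get("norm")))
        for p in norm.findall("pmml:LinearNorm", NS)
    )
    if len(points) < 2:
        raise ValueError("NormContinuous needs at least two LinearNorm points")
    # Use the first segment whose upper end is >= x, else the last segment.
    for i in range(1, len(points)):
        if x <= points[i][0] or i == len(points) - 1:
            (a1, b1), (a2, b2) = points[i - 1], points[i]
            return b1 + (x - a1) * (b2 - b1) / (a2 - a1)


version, fields, schema = describe(PMML_DOC)
income_norm = ET.fromstring(PMML_DOC).find(".//pmml:NormContinuous", NS)
print(version, fields, schema)
print(norm_continuous(income_norm, 25000), norm_continuous(income_norm, 150000))
```

Here the version comes back as "4.1", income is reported as an active input and segment as the predicted field, and an income of 25,000 normalizes to 0.25, while 150,000 lies outside the mapped range and is extrapolated to 1.5.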
While the models supported by PMML are comprehensive, they are not complete. Payne said, “There are limits to PMML. It does not include every model, but most are supported.”
However, Guazzelli, who is the co-author of PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, says the PMML standard continues to evolve. More models are bound to be added over time, as are new capabilities, such as improvements in how PMML incorporates multiple, or ensemble, models within a predictive analytics application. He said, "This allows for flexible testing and for component models to be easily swapped in and out of production."
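In PMML, ensembles are expressed with a MiningModel whose Segmentation element names a combination method, such as majority voting or averaging. The Python sketch below is not a PMML parser; it only illustrates the majority-vote idea and how one component model can be swapped in or out without touching the others. The VotingEnsemble class and the three rule-based member models are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict

Record = Dict[str, float]
Model = Callable[[Record], str]


class VotingEnsemble:
    """Toy majority-vote ensemble, loosely modeled on the idea behind
    PMML's Segmentation with multipleModelMethod="majorityVote"."""

    def __init__(self) -> None:
        self.members: Dict[str, Model] = {}

    def put(self, name: str, model: Model) -> None:
        # Reusing an existing name swaps that component out in place.
        self.members[name] = model

    def remove(self, name: str) -> None:
        del self.members[name]

    def predict(self, record: Record) -> str:
        if not self.members:
            raise ValueError("ensemble has no member models")
        votes = Counter(model(record) for model in self.members.values())
        # Ties go to the label seen first; this is the sketch's own rule,
        # not necessarily the one a PMML consumer applies.
        return votes.most_common(1)[0][0]


# Three hypothetical rule-based members voting "churn" or "stay".
ensemble = VotingEnsemble()
ensemble.put("tenure", lambda r: "stay" if r["tenure_months"] > 12 else "churn")
ensemble.put("calls", lambda r: "churn" if r["support_calls"] > 4 else "stay")
ensemble.put("spend", lambda r: "stay" if r["monthly_spend"] > 50 else "churn")

customer = {"tenure_months": 6, "support_calls": 5, "monthly_spend": 80}
print(ensemble.predict(customer))  # two of three members vote "churn"

# Swap in a replacement for one component; the others are untouched.
ensemble.put("calls", lambda r: "stay")
print(ensemble.predict(customer))
```

Keeping members in a named collection is what makes the swap cheap: replacing the "calls" model changes the outcome for this customer to "stay" without retraining or redeploying the other two components.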
This means that unstructured data can be modeled and then output in PMML to create analytics applications.
A survey conducted by IBM in late 2011 revealed that CIOs see business intelligence and analytics projects as their top priority in the coming years. Enterprises need better insight into the vast amounts of data flowing through their businesses in order to grow, retain customers, decrease costs, improve supply chains, and develop new products and services. And they need to do these things faster. PMML is proving to be a model standard for achieving those goals.
Mark Everett Hall is a long-time technology writer who lives in the Willamette Valley. Contact him at email@example.com.