When Netflix announced that it would not be putting into production the $1 million prize-winning algorithm to improve its recommendation engine accuracy, it may have seemed like another bone-headed decision by a management team that months earlier reversed its decision to spin off its traditional DVD-by-mail service. After all, there was no shortage of pundits praising the company when it first announced the Netflix Prize open innovation contest to improve one of its most valued assets. Even CEO Reed Hastings had been quoted as saying that such increased performance of its Cinematch system would be worth well in excess of the prize money.
But the Netflix Prize story is not that simple. And in its complexity, the story offers up some valuable lessons about the difficulty of implementing complex machine learning systems, strategic planning in a dynamic industry and properly framing an open innovation challenge.
Netflix leaders said the 10 percent performance improvement provided by the million-dollar solution (a combination of 800 algorithms developed by seven participants from the United States, Austria, Canada and Israel) would not justify the engineering effort required to roll it out. And besides, they pointed out, their business focus had shifted from mailing red envelopes across America to streaming media around the world.
“Streaming has not only changed the way our members interact with the service, but also the type of data available to use in our algorithms,” read an April 6 Netflix blog post. The company said it now realizes that the prize’s objective, more accurate prediction of a customer’s movie ratings, is just one of the many components of an effective recommendation system.
Designing a Prize for a Business in Flux
Aside from the difficult integration problems identified by Netflix, a key business lesson of the Netflix announcement is not to underestimate the importance of framing questions and challenges properly for a company’s changing market conditions, said Michael Schrage, a research fellow at MIT Sloan School’s Center for Digital Business and author of Serious Play and the upcoming Getting Beyond Ideas.
“The real story here is not the implementation ‘failure’ of the Netflix Prize, but the company’s inability to anticipate where its own market, technology, and business model was going,” Schrage said. “The prize did an excellent job of solving the challenge Netflix had in a given moment in time. Alas, time moves on but the prize design parameters were neither creative nor flexible enough to deal with new realities.”
The complexity of framing these design parameters grows when a company opens the door to outsiders for help. Open innovation can be difficult to implement when business realities change at a rapid pace and participants are on the outside.
“Crowdsourcing can be very effective when a complex problem is broken down into smaller, simpler tasks,” said Ari Lightman, distinguished service professor at Carnegie Mellon University’s Heinz College, whose students develop analytics to measure the business impact of emerging technologies for corporate partners.
“These big-bang approaches often fail to meet their mark since the folks who are developing the ultimate algorithm or technology do not have intimate knowledge of the company’s strategy, industry dynamics, implementation capability or support structure. There are examples to the contrary [of successfully crowdsourcing] a scientific breakthrough or solutions to a challenging problem facing a research community—but not innovation that needs to be tightly integrated into a corporate structure.”
A Netflix spokesman declined an interview request for this story, saying the company would stand by its public statement on the issue.
The Netflix Prize Timeline
• October 1, 2006: Netflix announces its first Netflix Prize: a $1 million grand prize for the first algorithm that predicts customer ratings 10 percent more accurately than its Cinematch algorithm, plus $50,000 progress prizes each year for the most improvement.
• November 13, 2007: Netflix awards team KorBell (also known as BellKor) a progress prize for improving ratings predictions by 8.43 percent.
• December 10, 2008: Netflix awards team BellKor in Big Chaos a progress prize for improving ratings predictions by 9.44 percent.
• September 21, 2009: Netflix awards the $1 million grand prize to BellKor’s Pragmatic Chaos team for improving ratings predictions by 10 percent. Netflix also announces a new Netflix Prize aimed at using demographic and behavioral data to predict customer rentals.
• March 12, 2010: Netflix cancels its second Netflix Prize in response to member privacy concerns.
• April 6, 2012: Netflix announces that it will not implement the grand prize-winning algorithm due to the cost of implementation and shifts in business strategy.
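The percentages in the timeline measure how much an entry reduced prediction error relative to Cinematch. The contest scored entries by root mean squared error (RMSE), a detail this article does not spell out. The sketch below shows how such an improvement figure can be computed. The ratings and both sets of predictions are made up for illustration, and the function names are hypothetical rather than anything Netflix published.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, candidate_rmse):
    """Relative RMSE reduction of a candidate over a baseline, in percent."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Made-up ratings on a 1-5 star scale, for illustration only.
actual = [4, 3, 5, 2, 4]
baseline = [3.0, 3.0, 4.0, 3.0, 3.0]   # hypothetical stand-in for Cinematch
candidate = [3.5, 3.0, 4.5, 2.5, 3.5]  # hypothetical contest entry

baseline_rmse = rmse(baseline, actual)
candidate_rmse = rmse(candidate, actual)
improvement = percent_improvement(baseline_rmse, candidate_rmse)

print(f"baseline RMSE {baseline_rmse:.4f}, candidate RMSE {candidate_rmse:.4f}")
print(f"improvement {improvement:.1f}% (grand prize threshold: 10%)")
```

The toy numbers produce a far larger gain than any real entry managed. On real data, each additional tenth of a percent took teams months of work.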
Two Earlier Algorithms Implemented
Ultimately, Netflix did implement two of the 107 algorithms that in 2007 won its first $50,000 Progress Prize (given each year of the contest to the team that showed the most improvement over the previous year’s accuracy bar) for an 8.43 percent recommendation improvement. To get those into production, Netflix noted on its blog, the company had to weave the two solutions together and overcome their inherent limitations: they were built to handle just 100 million ratings when Netflix has more than 5 billion, and they could not adapt as customers added more ratings.
The grand prize solution was even more intricate. “[It] was achieved by averaging together many, many models,” said Chris Volinsky, director of the statistics research department at AT&T Labs in Florham Park, N.J., who led the prize-winning teams. “The complexity of the solution was due to the integration of many different teams. This was necessary to squeeze the final blood from the stone, so to speak, but was not feasible for a production system.”
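To make "averaging together many, many models" concrete, here is a minimal sketch of blending, assuming plain weighted averaging of each model's predicted ratings. That is a simplification: the article does not describe how the winning team actually weighted or combined its models. The three models and their predictions are invented for illustration.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def blend(model_predictions, weights=None):
    """Weighted average of several models' predictions for the same items."""
    n_models = len(model_predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    if weights is None:
        # Default to a simple unweighted mean.
        weights = [1.0 / n_models] * n_models
    n_items = len(model_predictions[0])
    if any(len(preds) != n_items for preds in model_predictions):
        raise ValueError("every model must predict the same items")
    return [
        sum(w * preds[i] for w, preds in zip(weights, model_predictions))
        for i in range(n_items)
    ]

# Made-up true ratings and three hypothetical models' predictions.
actual = [4, 3, 5, 2]
model_a = [4.6, 3.4, 4.5, 2.5]
model_b = [3.5, 2.8, 5.0, 1.4]
model_c = [4.2, 3.5, 4.4, 2.3]
models = [model_a, model_b, model_c]

blended = blend(models)
individual_rmses = [rmse(m, actual) for m in models]
blend_rmse = rmse(blended, actual)

print("individual RMSEs:", [round(r, 3) for r in individual_rmses])
print("blended RMSE:", round(blend_rmse, 3))
```

In this toy case the models err in different directions, so their mistakes partly cancel and the blend beats every individual model. The production cost Volinsky describes follows from the same structure: every model in the blend has to be trained, maintained and run before a single averaged prediction can be served.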
The idea that the ideal big data system may require combining many solutions is one that may be valuable to Netflix as it makes changes to its production environment to accommodate the even bigger data world of streaming video, adds Volinsky, who has spent the last 20 years working on large scale data analysis and predictive modeling.
When developers of a big data solution have no involvement with its implementation, however, problems are bound to arise, said Lightman of Carnegie Mellon. “The teams that developed the algorithms weren’t going to be putting it in place. They said, ‘Hey, cool problem! Let’s solve it!’” Lightman said.
“That’s why it was too difficult to implement. Netflix bought into the idea that if you throw a lot of money at really smart folks, they’ll come up with great solutions. But they won’t be thinking about feasibility or implementation characteristics.”
From the Netflix blog:
“If you followed the Prize competition, you might be wondering what happened with the final Grand Prize ensemble that won the $1 million two years later. This is a truly impressive compilation and culmination of years of work, blending hundreds of predictive models to finally cross the finish line. We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment. Also, our focus on improving Netflix personalization had shifted to the next level by then.”
Stephanie Overby is a Boston-based freelance writer. Find her on Twitter: @stephanieoverby.