Hooters is about to go down in history for more than just its wings, beer, and, well, you know. The U.S. restaurant chain is embroiled in a multimillion-dollar commercial indemnity lawsuit stemming from the sale of the company. What’s unique about the case, however, is that Vice Chancellor J. Travis Laster, the Delaware judge presiding, has recommended the use of a controversial data analysis tool, known as predictive coding, to resolve the legal dispute.
By requesting that the plaintiff and defendant “show cause why this is not a case where predictive coding is the way to go,” the October 2012 court order marks the first time a judge has demanded serious consideration be given to predictive coding – a history-making request that’s thrusting this innovative litigation tool into the spotlight. (In a transcript of the court proceedings, attorneys for the parties did not raise immediate objections, but legal industry and electronic discovery analysts buzzed about the order.)
Predictive coding is a process that uses predictive analytics to find key documents quickly. Forget about bleary-eyed law associates poring through bankers’ boxes of legal documents. Predictive coding relies on computer software to review reams and reams of documents at a high speed and for a fraction of the cost.
Keyword search technology, such as Boolean logic used by today’s search engines, has been around for decades. But Boolean keyword searches locate only 22 percent to 57 percent of the total number of relevant documents, according to tests by the Text Retrieval Conference (TREC), an international workshop co-sponsored by the National Institute of Standards and Technology and the Defense Department that assesses various information retrieval approaches.
Predictive coding, on the other hand, raises the bar significantly by relying on a combination of machine-learning technology and human eyeballs to “predict” the relevance of documents. Essentially, predictive coding selects only those documents that relate to a particular issue, and then prioritizes them based on relevance. Next, a human being reviews these documents, confirms or rejects their relevance, and then reprioritizes them accordingly. This curated batch of documents is then fed back into the predictive coding system for a second iteration, in a way, ‘training’ the computer to make more accurate selections.
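The train-review-retrain loop described above can be sketched in a few lines of Python. This is a toy illustration, not a real e-discovery engine: the “model” here is just a bag-of-words weight table, and the documents, seed labels, and scoring rule are all invented for the example. Real predictive coding products use far more sophisticated machine-learning models, but the feedback cycle is the same.

```python
from collections import Counter

def train(labeled_docs):
    """Learn term weights from human-reviewed documents: words seen in
    relevant documents vote up, words in irrelevant ones vote down."""
    weights = Counter()
    for text, is_relevant in labeled_docs:
        for word in set(text.lower().split()):
            weights[word] += 1 if is_relevant else -1
    return weights

def rank(corpus, weights):
    """Prioritize unreviewed documents by predicted relevance."""
    def score(text):
        return sum(weights[w] for w in set(text.lower().split()))
    return sorted(corpus, key=score, reverse=True)

corpus = [
    "draft amendment to the licensing contract",
    "lunch catering order for friday",
    "email thread about contract licensing terms",
    "catering invoice for the holiday party",
]

# Iteration 1: a human reviewer hand-labels a small seed set.
labeled = [
    ("signed licensing contract with exhibits", True),
    ("catering menu for the quarterly offsite", False),
]
ranking = rank(corpus, train(labeled))

# Iteration 2: the reviewer confirms the top prediction and feeds that
# judgment back in, "training" the model to make better selections.
labeled.append((ranking[0], True))
ranking = rank(corpus, train(labeled))
```

After the second pass, the contract-related documents sit at the top of the review queue and the catering emails fall to the bottom, which is the prioritization effect the technique is after.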
But technology for technology’s sake isn’t what’s prompting legal authorities like Judge Laster to push for predictive coding from vendors including Recommind, Symantec and Equivio. “We’re seeing massive explosions in data growth,” says Matthew Nelson, e-discovery counsel for Symantec. “At the same time, parties involved in e-discovery have a legal obligation to identify documents that are requested from the other party. They need to review these documents and find the right responses. Predictive coding makes that model more efficient by reducing the time and expense associated with electronic discovery.”
A Relevance Filter
Unlike keyword search technology, which can still produce mountains of paperwork in need of parsing, predictive coding turns this mountain into a molehill by carefully classifying documents into digestible batches. Consider a hypothetical: a company sues Oracle over its Java programming language. But what if one of the parties has a branch in Java, Indonesia?
The predictive coding technique is a great relevance filter for massive datasets, says Craig Carpenter, vice president of marketing for Recommind, a predictive coding software provider. “The problem with any technology that doesn’t employ predictive coding is that you’re going to get back answers that are all lumped together, some of which will have to do with office space in Indonesia or ordering coffee from Colombia; others having to do with the Sun Microsystems programming language,” Carpenter says.
By identifying relevant documents based on criteria such as the contextual similarity of certain documents, or the timing of a particular event, Carpenter says that predictive coding “puts those clumps of documents into different groups right out of the gate and organizes them by topic.” All of which results in huge time savings for a faster resolution.
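One way to picture the kind of grouping Carpenter describes is a crude similarity clustering. The sketch below, in plain Python with invented documents and a made-up similarity threshold, greedily adds each document to the first group whose seed document shares enough words with it, and starts a new group otherwise. Real predictive coding systems rely on much richer contextual features than raw word overlap.

```python
def similarity(a, b):
    """Jaccard similarity of the two documents' word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def group_by_topic(docs, threshold=0.2):
    """Greedy clustering: add each document to the first group whose
    seed document it resembles; otherwise start a new group."""
    groups = []
    for doc in docs:
        for group in groups:
            if similarity(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            groups.append([doc])
    return groups

docs = [
    "java language license agreement with sun microsystems",
    "renewal of java language license with sun microsystems",
    "lease for office space in java indonesia",
    "extending the office space lease in java indonesia",
]
groups = group_by_topic(docs)
# The two licensing documents and the two real-estate documents land in
# separate groups, even though every document mentions "java".
```

A keyword search for “java” would return all four documents in one undifferentiated pile; the grouping step is what lets reviewers tackle each topic as a batch.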
Lawyers’ Billable Hours, Applied to Gigabytes
Predictive coding can also drastically reduce the costs associated with legal review. According to a 2012 study entitled “Where the Money Goes — Understanding Litigant Expenditures for Producing Electronic Discovery,” by Rand Corp. researchers Nicholas M. Pace and Laura Zakaras, the total cost of reviewing a single gigabyte of legal data is approximately $18,000. “A relatively small case today would be in the 50-gigabyte range,” says Carpenter. “That’s well north of a baseline threshold for using technology in litigation, including predictive coding.” At $18,000 per gigabyte, that 50-gigabyte case represents roughly $900,000 in review costs alone.
In fact, narrowing down the months required for legal review is partially what prompted the parties involved in the case Da Silva Moore v. Publicis Groupe to turn to computer-assisted review. The high-profile gender discrimination case in federal court in New York, in which the defendant collected about 3 million documents in response to electronic discovery requests, marks one of the first times a court has approved the use of computer-assisted review to search for electronically stored information.
A Technology Subject to Procedural Dispute
Although the Da Silva case signifies a growing acceptance of review technologies, it also highlights how discomfort with predictive coding can create its own set of delays and added legal expenses. In the Da Silva case, for example, the plaintiffs objected to the defendant’s use of predictive coding to identify relevant documents. The dispute dragged on until late November, when a district court judge upheld the decision to allow the defendant’s use of computer-assisted review in the ongoing case.
Requests to employ technologies such as predictive coding “create a situation where there’s an immediate level of mistrust,” says Kenneth J. Withers, director of judicial education at The Sedona Conference, a non-profit research institute for the study of law and policy. “When one side proposes it, the other questions, ‘What wool are you trying to pull over our eyes?’”
Other hindrances to the widespread adoption of predictive coding include software acquisition costs, which can run into the hundreds of thousands of dollars, training investments, and administrative headaches such as integrating the solution with legacy systems and ongoing maintenance.
The Bottom Line
But perhaps the most obvious threat to predictive coding’s popularity is the financial interest of law firms themselves. Withers of the Sedona Conference asks: Where’s the “economic incentive on the part of large law firms, which have made billions off of selling the hours of first-, second- and third-year associates in document review?” If anything, he says, you’d think that there would be “economic resistance to a system that effectively reduces their ability to make money by 95 percent.”
But as long as the court document deluge continues, and more judges like Judge Laster back computer-assisted review, predictive coding has the potential to gain traction as a powerful litigation tool.
Cindy Waxer is a Toronto-based freelance journalist and a contributor to publications including The Economist and MIT Technology Review. She can be reached at firstname.lastname@example.org or via Twitter @Cwaxer.
Home page image of gavel by StockMonkeys.com via Flickr.