An estimated 22 military veterans take their own lives each day—one almost every hour, according to recent research by the U.S. Department of Veterans Affairs. Yet predicting who is likely to commit suicide remains a challenge for mental health professionals.
That’s where computer scientist Chris Poulin and a semantics-based prediction tool enter the picture. Poulin and his company, Patterns & Predictions, had developed a commercial Bayesian analytics tool for predicting events—most notably financial events—based on historical analysis. “You have a stock that went bust on a certain date,” explains Poulin. “What were the forensic features that led up to that stock going bust?”
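Patterns & Predictions has not published the internals of its tool; as a rough illustration only, the idea of scoring a new case by the historical frequency of “forensic features” in each outcome class can be sketched as a tiny naive Bayes classifier (all feature names and data below are invented):

```python
from collections import Counter
import math

def train(labeled_cases):
    """Count feature occurrences per outcome class (e.g., 'bust' vs. 'ok')."""
    counts, totals = {}, Counter()
    for features, outcome in labeled_cases:
        counts.setdefault(outcome, Counter()).update(features)
        totals[outcome] += 1
    return counts, totals

def log_posterior(features, outcome, counts, totals):
    """log P(outcome) + sum of log P(feature | outcome), add-one smoothed."""
    n = sum(totals.values())
    vocab = {f for c in counts.values() for f in c}
    score = math.log(totals[outcome] / n)
    denom = sum(counts[outcome].values()) + len(vocab)
    for f in features:
        score += math.log((counts[outcome][f] + 1) / denom)
    return score

# Toy historical cases: features observed before each outcome.
cases = [({"debt", "layoffs"}, "bust"), ({"growth"}, "ok"),
         ({"layoffs", "lawsuit"}, "bust"), ({"growth", "dividend"}, "ok")]
counts, totals = train(cases)
new_case = {"layoffs", "debt"}
prediction = max(totals, key=lambda o: log_posterior(new_case, o, counts, totals))
print(prediction)  # → "bust": both features appeared only before failures
```

The same machinery transfers to text: swap the financial features for words drawn from notes or posts, and the classes for clinical cohorts.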
Ten years ago, as Poulin was launching the company, his best friend committed suicide. “He posted a suicide note—and what turned out to be pre-suicide notes—on social media,” says Poulin. As time went on, Poulin began to consider whether a similar event prediction model could parse the social media behavior of veterans to uncover those who might be about to harm themselves. Later, as a researcher at Dartmouth, Poulin partnered with Paul Thompson, an instructor at the university’s Geisel School of Medicine who specializes in computational linguistics and also lost a friend to suicide, and they took their pitch to the Pentagon. Three years ago, they were awarded a $1.7 million contract by the Defense Advanced Research Projects Agency (DARPA) to combine Thompson’s linguistics work with Poulin’s event-focused text analytics to create a model to predict those with suicidal or other harmful tendencies.
Dubbed the Durkheim Project (after Emile Durkheim, the French sociologist known for his 19th-century study of suicide data), the effort ultimately aims to use opt-in data from veterans’ social media and mobile content to create a real-time predictive analytics tool for suicide risk. While the team is optimistic about the model’s ability to make predictions with 65 percent accuracy, the challenge at this stage is gaining the cooperation of veterans, who must join the effort before it can offer insights into their well-being.
Assembling a Test Dataset
The first step was to build the text-driven algorithm, but there was a hitch. “There was no suicide-positive dataset out there,” says Poulin. “It’s such an emotionally charged issue that people didn’t know what to collect or how to collect it while being respectful of the veterans and their families.” Instead, the researchers asked the Veterans Administration for a set of veterans’ data with which they could train the model. “They had this data, but it wasn’t being analyzed at a large level,” says Poulin.
They obtained access to information on 300 veterans who had previously agreed to let their records be used for research—100 had no observed psychiatric issues, 100 had sought psychiatric care, and 100 had committed suicide (and may or may not have sought help). The data was de-identified at the White River Junction VA Medical Center and transferred to the Dartmouth-Hitchcock Medical Center, where it remains.
Taking a text-driven approach—specifically looking for keywords, keyword pairs, and key phrases in the medical records—they conducted a brute-force computational analysis to determine which terms were most distinctly predictive for each cohort. “It’s more than simple card counting,” explains Poulin. “We were looking for the distinctive terms or combinations of terms that consistently showed up over time and distinguished one group from another.” In February, they were able to validate the system’s language-mining approach with consistent accuracy results of 65 percent or more in predicting suicide risk.
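The project’s exact scoring method isn’t described in detail; one simple way to surface terms that “distinguish one group from another” is to compare a term’s smoothed rate of appearance in one cohort against its rate in the others combined. The cohort names and record snippets below are invented stand-ins for the three 100-veteran groups:

```python
from collections import Counter

def distinctive_terms(cohorts, top_n=2):
    """Rank terms by how much more often they appear in one cohort's
    records than in all other cohorts (add-one smoothing avoids /0)."""
    totals = {name: Counter() for name in cohorts}
    for name, records in cohorts.items():
        for text in records:
            totals[name].update(set(text.lower().split()))
    results = {}
    for name in cohorts:
        n_in = len(cohorts[name])
        n_out = sum(len(r) for k, r in cohorts.items() if k != name)
        scores = {}
        for term in totals[name]:
            rate_in = (totals[name][term] + 1) / (n_in + 2)
            out_count = sum(totals[k][term] for k in cohorts if k != name)
            rate_out = (out_count + 1) / (n_out + 2)
            scores[term] = rate_in / rate_out  # distinctiveness ratio
        results[name] = [t for t, _ in sorted(scores.items(),
                                              key=lambda kv: -kv[1])[:top_n]]
    return results

cohorts = {
    "no_issues": ["routine checkup", "annual physical exam"],
    "care":      ["reports anxiety", "anxiety and insomnia"],
    "suicide":   ["expressed hopelessness", "notes hopelessness and agitation"],
}
print(distinctive_terms(cohorts))
```

Extending the same scoring to keyword pairs and phrases, and exhausting every candidate combination, is what makes the analysis “brute force.”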
This spring, the Durkheim team used that model as the basis for a high-performance tool that could analyze in real time the social media activity of a much larger pool of veterans—100,000, to be exact—rather than the language that appeared in a few hundred medical records. By June, the team had in place the mobile apps, social media connections, database, and integration between that database and their machine-learning libraries. That enabled the team to feed new data—social networking profiles and activity, along with mobile information like user location and de-identified text messages—into the model.
Cloudera provided Hadoop support and helped the researchers build the statistical classification tools for calculating risk quickly. Attivio provided real-time enterprise search indexing capabilities that enable the tool to index and query incoming content. The application (available on Facebook as well as for iPhone and Android devices) automatically uploads relevant content from veterans’ online accounts into the medical database. The resulting text repository is updated and analyzed by machine-learning systems. At Dartmouth-Hitchcock, a clinical dashboard monitors and displays participants’ mental health status with updates every 30 seconds, classifying each as red, yellow, or green based on statistically correlated tendencies for harmful behaviors.
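The dashboard’s actual cutoffs are not public; a minimal sketch of the red/yellow/green triage step, assuming the upstream classifier emits a risk probability between 0 and 1, might look like this (the threshold values are illustrative, not the project’s):

```python
def triage(risk_score, yellow_at=0.4, red_at=0.7):
    """Map a model's risk probability in [0, 1] to a dashboard color.
    Thresholds here are hypothetical placeholders."""
    if risk_score >= red_at:
        return "red"
    if risk_score >= yellow_at:
        return "yellow"
    return "green"

# One dashboard refresh: re-score each participant's latest content.
latest_scores = {"participant_a": 0.12, "participant_b": 0.55, "participant_c": 0.91}
statuses = {pid: triage(s) for pid, s in latest_scores.items()}
print(statuses)  # → {'participant_a': 'green', 'participant_b': 'yellow', 'participant_c': 'red'}
```

In production such a function would run inside the 30-second refresh loop, with the scores supplied by the machine-learning pipeline rather than hard-coded.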
Social Media Data as Useful Health Signals
The biggest issue for the Durkheim team was adjusting for the change in narrative mode: the expanded system would not be taking in third-person notes from clinicians but would instead be analyzing first-person texts or posts to Facebook or Twitter. “It was not a completely foreign problem to us, thankfully,” says Poulin, who has used synonym or concept maps successfully in his other event prediction work. A doctor might say a patient is “agitated,” while a veteran might use the word “upset” instead. A psychiatrist might note “chronic pain management,” while a former soldier would text about his Demerol use. The synonym mapping could cause some slight performance loss compared to the VA data model, but the goal is to improve that performance over time.
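The project’s actual concept maps aren’t published; the idea of normalizing informal first-person wording to the clinical vocabulary the model was trained on can be sketched as a lookup applied before scoring. The mappings below are illustrative only, built from the examples in the article:

```python
# Hypothetical map from informal, first-person phrasing to the
# clinical terms that appear in the VA training records.
CONCEPT_MAP = {
    "upset": "agitated",
    "demerol": "chronic pain management",
    "can't sleep": "insomnia",
}

def normalize(post):
    """Replace informal phrases with their clinical-record equivalents
    so the classifier sees the vocabulary it was trained on."""
    text = post.lower()
    for informal, clinical in CONCEPT_MAP.items():
        text = text.replace(informal, clinical)
    return text

print(normalize("Really upset, can't sleep again"))
# → "really agitated, insomnia again"
```

A naive string substitution like this loses context (“upset stomach” would be mis-mapped), which is one reason such a bridge degrades accuracy relative to training and scoring on the same clinical vocabulary.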
Signing on Facebook as a distribution partner, the Durkheim Project began enlisting participants this summer, with a couple dozen signed on as of August. Recruitment has been slow and demographically targeted, and those most at risk tend to be socially withdrawn.
At this phase, the project is non-interventional. No official diagnoses are authorized based on the analysis, nor are the Durkheim researchers authorized to intercede in any participant’s mental health situation. “[Participants] don’t get anything out of it other than trying to help us with this system,” says Poulin.
Durkheim has three psychiatrists on its team working on intervention protocols, which would come in phase three. If a veteran has an existing clinical relationship, he or she might authorize that clinician to monitor his or her status and intervene.
Researchers are also evaluating the merits of a buddy system—a kind of “wingman looking out for you,” says Poulin. “If someone was looking out for your interests on the front line in Afghanistan, you might want to specify him.” Some participants may not have such support, so Durkheim researchers are evaluating possible automated messages that could be sent based on clinically researched narratives for calming people down.
Poulin is confident in the algorithm behind the system; its 65 percent accuracy would be a home run for his previous Wall Street clients, even though those in the medical community would like to see that figure reach 80 percent, he says. “What I’m not confident of is getting the number of participants we need,” he says. “It’s all in the recruitment. But if we can build in the intervention, more people will opt in.”
Stephanie Overby, a contributing editor at Data Informed, is a Boston-based freelance writer. Follow her on Twitter: @stephanieoverby.