In March 2012, the Obama Administration announced it was investing $200 million in big data projects to advance scientific research in areas like health care, energy, and the environment. On November 12, the White House came back with an update, unveiling a set of partnerships among government agencies, technology companies, universities and nonprofit institutions designed to advance these research efforts and build data science skills as a way to prompt economic growth.
At the “Data to Knowledge to Action” event, government officials, along with leaders from business, academia and the nonprofit sector, sought to highlight a collection of new and in-progress projects that showcase a range of use cases for big data analytics. “The challenges are too big for government to solve alone,” said Thomas Kalil, deputy director for technology and innovation at the White House Office of Science and Technology Policy.
Farnam Jahanian, assistant director for computer and information science and engineering at the National Science Foundation, was one of several speakers to note that progress in the field will require continued investments to harness data and build the skills to make sense of it all. The opportunities to develop a “deeper understanding of causal relationships based on advanced data analysis” require investments in research and development, programs that produce a qualified workforce and advancements in data management and data curation practices to preserve privacy as the market for data services grows, Jahanian said.
Among the projects highlighted:
Open source analytics. The University of California at Berkeley’s Algorithms, Machines and People (AMP) Lab is working on a new data analytics paradigm integrating machine learning methods, cloud and cluster computing architectures and crowdsourcing activities with the aim of “solving huge societal problems” using large datasets. Applications so far include reducing processing costs for genomic research and improving smartphone battery life. The National Science Foundation and the Defense Advanced Research Projects Agency are sponsors along with corporate funders.
Skills assessment. IBM unveiled Analytics Talent Assessment, an online platform to measure the preparedness of university students for careers in big data-related jobs. Constituents include prospective employers as well as universities and students, who can get personalized reports on how they can strengthen their skills. Eight universities are piloting the program.
Data science philanthropy. Pivotal, a developer of cloud-based big data software applications, said it would sponsor its employees to work for three-month stints with DataKind, a nonprofit agency that works on public service data projects.
Cancer research. The American Society of Clinical Oncology launched CancerLinQ, a five-year, $80 million initiative to analyze large, anonymized datasets on patient experiences to inform treatment decisions. The project has support from life sciences industry players like Amgen and Genentech BioOncology as well as nonprofit foundations.
Drug discovery and clinical trials. Pharmaceutical industry giants Eli Lilly & Co., Novartis and Pfizer are partnering to enhance the Clinicaltrials.gov website so that it more successfully matches patients interested in participating in clinical drug trials to the trials recruiting them.
Earth science research and education. Amazon Web Services will host space data from the NASA Earth eXchange to make data more accessible to the public. The aim is to spur crowdsourcing projects like Zooniverse.org where citizens analyze space data to solve questions.
The White House lists all the projects here.
Education Is Top of Mind
Along with highlighting the breadth of ongoing projects, the event, which was broadcast via the Web, featured academics and leaders of charitable foundations who said they were working to revamp higher education programs to meet the demand for analytics skills.
Creating career paths for data scientists remains an important challenge, said Yann LeCun, director of the Center for Data Science at New York University. He showed a popular Venn diagram depicting the intersection of three knowledge areas: mathematics and statistics, computation, and domain science expertise. “We’re going to try to create new education programs that have all three components,” he said.
Big Data in the Big Apple
While the event was a day to discuss the far-reaching implications for big data technologies, one speaker provided a down-to-earth reminder that pragmatism pays off when implementing analytics.
Incorporating new data tools into existing New York City government processes requires understanding how city employees do their jobs, said Michael Flowers, the analytics director in the city’s Office of Policy and Strategic Planning. Take the process of inspections.
“If I were to go out into the field, the actual field level, and go to a fire inspector or even a consumer affairs inspector, and say, ‘Yes, I hired these [data scientist] kids, and they’re going to tell you how to do your job.’ I mean they will look at them the same way they look at McKinsey when they show up. Maybe a little more disdainful,” Flowers said. “The reality is that we have to sit there and figure out a way to understand that people are going to be acting on this insight and then tailor it accordingly.”
In his work, it is the cultural changes required to implement analytics that are more challenging than technological hurdles, said Flowers, a former Manhattan prosecutor and Justice Department lawyer in Iraq. He said he had to create demand for the products his team developed by showing their value to city employees’ work.
“Anything that we came up with that disrupted [an existing process] was never going to go anywhere. So we had to understand culturally, how those services are serviced on the ground in order to get the wonderful insight that my team was able to glean. And that insight is real.
“We do know, for example, that if a property has a foreclosure or tax lien on it, there is a very statistically significant relationship with whether or not that place is going to have a fire. I don’t know if it’s got a causal or anything, I doubt it sincerely, but I know that we know that. And we are now obligated ethically and morally to act on that,” Flowers said. “These are all very common challenges. I don’t care if you’re a government picking up trash, or you’re trying to sell someone a product. If you don’t understand the humanity and the processes behind the data that you are looking at, then you’re just going to go off the rails. You’re not going to produce anything worthwhile.”
Michael Goldberg is the editor of Data Informed. Email him at Michael.Goldberg@wispubs.com.