The Obama Administration’s announcement that it is investing $200 million in a big data initiative reads like a checklist for leaders of advanced organizations who want to derive ever-more sophisticated insights from data sets growing in size and complexity by the hour.
• Advance the means for data scientists to manage, analyze, visualize and extract useful information from large and diverse data sets. Check.
• Create a cloud-based system that allows researchers to access a 200-terabyte data set. Check.
• Develop new scalable software tools for analyzing large volumes of structured and unstructured data in distributed data stores. Check.
• Create human-computer interaction tools to facilitate “rapidly customizable visual reasoning.” Check.
• Launch an online innovation marketplace where qualified bidders compete for R&D “data to decisions” projects. Check.
Underinvestment in Big Data
These and other projects unveiled March 29 are a response to a recent White House assessment that the federal government was underinvesting in information technologies that enable scientists, researchers and analysts to “move from data to knowledge to action,” said John P. Holdren, director of the White House Office of Science and Technology Policy.
Holdren said that while the private sector will take the lead on advances in the field and universities will create new courses of study in big data-related topics, the government can provide support through long-term research and development. He cited three potential benefits: economic growth, as government projects lead to new products and services that help organizations figure out how to use big data; advances for scientists and researchers in fields such as healthcare, education, environmental science and astronomy; and national security applications.
While $200 million is not a lot of money—it’s barely a blip in the government’s $79.5 billion IT budget for 2012—the scope of the announcement was interesting, said Dan Vesset, a business analytics analyst and program vice president at IDC. “What’s interesting is the effort at collaboration among the different agencies,” Vesset said, adding that the agencies have an opportunity to develop some best practices about big data projects.
Indeed, a recurring theme of the announced projects was that no agency is acting alone. For example:
• The National Science Foundation and the National Institutes of Health (NIH) are supporting a big data project to advance the scientific and technological means for data scientists to manage, analyze, visualize and extract insights from large and diverse data sets.
• The NIH announced an agreement with Amazon Web Services to make publicly available to researchers 200 terabytes of human genetic data gathered in the 1000 Genomes Project. Data access is free, and researchers pay for computing services.
• The White House said the XDATA program, launched by the Defense Advanced Research Projects Agency (DARPA) to develop new software tools for machine- and human-based decision support, will back open source software toolkits “to enable flexible software development for users to process large volumes of data” on timelines relevant to battlefield decisions.
• The Defense Innovation Marketplace brings industry and government users together online, letting qualified bidders compete for R&D “data to decisions” projects. The marketplace represents the Pentagon’s “big bet on big data.”
• The Energy Department established the Scalable Data Management, Analysis and Visualization Institute, a joint project of six national laboratories, six universities and Kitware, a company that develops specialized visualization software, to develop more sophisticated simulations.
• The U.S. Geological Survey said its John Wesley Powell Center for Analysis and Synthesis in Fort Collins, Colo., had selected big data research projects for funding, including efforts to analyze ecological changes in the Great Lakes and the Great Barrier Reef, assess the risk of mercury accumulation in the western U.S., Canada and Mexico, and improve global earthquake probability modeling.