A graduate student’s inventive use of graphics processing units (GPUs). A former Googler’s argument for using more data and simple algorithms to make predictions. A data warehouse veteran’s take on big data project failures. The rise of Hadoop and the value of open source R to business. These are among the most popular articles, opinion pieces and podcasts of the past year, according to Data Informed audience data.
Taken as a group, this list illustrates several ongoing themes of the big data analytics field, among them: Innovation can come from creative people tinkering with commodity tools. The “old guard” of the IT industry has a lot of experience to share about management issues and making new technologies work with existing systems. And practitioners, while interested in the capabilities of cloud systems, remain concerned about securing their data.
Below are the top 10, ranked by popularity.
Todd Mostak’s first tangle with big data didn’t go well. As a master’s student at the Center for Middle Eastern Studies at Harvard in 2012, he was mapping tweets for his thesis project on Egyptian politics during the Arab Spring uprising. It was taking hours or even days to process the 40 million tweets he was analyzing. So over the next year, Mostak created a cost-effective workaround, and his inventive approach has the potential to benefit others in both academia and business.
Published: April 22, 2013. Read more.
More data and simple algorithms work because having more data allows the “data to speak for itself,” instead of relying on unproven assumptions and weak correlations, writes Garrett Wu of WibiData in this opinion piece.
Published: August 7, 2013. Read more.
When approaching big data, the industry places a lot of focus on “the three Vs”: volume, variety and velocity. Yet not nearly enough emphasis is placed on the most important V – value. For this reason, too many big data projects are undertaken without delivering the kind of results that are possible, writes Stephen Brobst of Teradata in this opinion article.
Published: March 5, 2013. Read more.
R is on the rise as a business analytics tool. This episode of the Data Informed podcast features a discussion about commercial implementations of the R programming language with David Smith, vice president of marketing and community at Revolution Analytics.
Published: November 11, 2013. Read more and listen here.
Using technology called Automatic Dependent Surveillance-Broadcast (ADS-B), FlightRadar24 has established a network of more than 500 ADS-B receivers around the world, installed and operated by volunteer plane spotters. During flight, an aircraft gets its GPS location from satellites, and many use an onboard ADS-B unit to transmit a signal containing that location and other information. The receiver picks up the signal and, using custom FlightRadar24 software, feeds the data into the company’s MySQL open source database.
Published: October 19, 2012. Read more.
By embracing a new, more flexible and more modular architecture for data warehousing that includes Hadoop, major vendors like IBM, Oracle and Teradata are starting to move into a market that until now has been populated by startups, niche players and open source initiatives. They are somewhat behind the curve (anyone interested in building their own Hadoop cluster has been able to do so for years), but the vendors bring credibility to Hadoop by trading on their trusted names in enterprise business settings.
Published: July 24, 2013. Read more.
A dozen potentially disruptive technologies could deliver as much as $33 trillion in global economic value by 2025, according to the McKinsey Global Institute. A co-author of the report, Michael Chui, said that while “big data underlies all of them in some way or another,” the large datasets and analytics tools applied to them are directly relevant to four areas of activity: knowledge work, advanced robotics, next-generation genomics and the Internet of Things.
Published: May 23, 2013. Read more.
The information security practitioners at the Cloud Security Alliance know that big data and analytics systems are here to stay. They also agree on the big questions that come next: How can we make the systems that store and compute the data secure? And, how can we ensure private data stays private as it moves through different stages of analysis, input and output? It’s the answers to those questions that prompted the group’s latest 39-page report detailing 10 major security and privacy challenges facing infrastructure providers and customers.
Published: June 20, 2013. Read more.
Misconceptions often surround the presumed ease of developing big data applications, especially when open source availability lowers the barriers to acquiring big data system software. While it may be straightforward to download and install the core components like Hadoop and MapReduce, designing, developing, and deploying analytic applications still requires some skill and expertise. This article examines the prototypical big data platform using Hadoop, and how Pig, Hive, HBase, Zookeeper and Mahout address these pieces of the puzzle.
Published: July 23, 2013. Read more.
Starbucks executives say their latest major expansion plan to add hundreds of stores globally will make better use of location analytics than an earlier effort begun before the 2008 recession. The profitable results of the past two years show its latest effort is working, the company asserts.
Published: January 10, 2013. Read more.