How to Manage Big Data’s Big Security Challenges

May 13, 2014 5:30 am

Jeff Markey, Data Scientist, ThreatTrack Security

As the amount of data being collected continues to grow, more and more companies are building big data repositories to store, aggregate and extract meaning from their data. Big data provides an enormous competitive advantage for corporations, helping businesses tailor their products to consumer needs, identify and minimize corporate inefficiencies, and share data with user groups across the enterprise. With a growth rate of 58 percent in 2013 alone, these technologies and their benefits are here to stay.

Unfortunately, legitimate organizations aren’t the only groups that are going big. Large sets of consolidated data are a tempting target for cyber attackers. Breaching an organization’s big data repository can provide criminal groups with bigger payoffs and more recognition from a single attack. And when attackers set their sights on big data repositories, the effects can be devastating for the affected organizations. Terabytes of data in these repositories may include a company’s crown jewels: customer data, employee data, and trade secrets. The recent data breach at Target is estimated to cost the company upwards of $1.1 billion, and the PlayStation breach cost Sony an estimated $171 million. A breach in a big data repository could be even more damaging at a financial institution or healthcare provider, where the value of the data is extremely high and government regulations come into play.

Beyond being a high-value target, big data brings its own security challenges. Big data security is not fundamentally different from traditional data security; its challenges arise from incremental differences rather than fundamental ones. The main differences between big data environments and traditional data environments are:

  • The data collected, aggregated, and analyzed
  • The infrastructure used to store big data
  • The technologies applied to analyze structured and unstructured big data


The Data

The variety, velocity, and volume of big data amplify the same security challenges that traditional security management already addresses. Big data repositories will likely include information deposited by various sources across the enterprise, and this variety makes secure access management a challenge. Each data source will likely have its own access restrictions and security policies, making it difficult to balance appropriate security for every source against the need to aggregate and extract meaning from the data. For example, a big data environment may include a dataset with proprietary research information, a dataset requiring regulatory compliance, and a separate dataset with personally identifiable information (PII). A researcher might want to correlate their research with the PII dataset, but what restrictions should be in place to ensure adequate security? Protecting big data requires balancing analysis like this with security requirements on a case-by-case basis.
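One way to reason about the researcher example is to tag each dataset with the authorizations its owner requires and to allow a combined analysis only when the requesting user holds every tag from every dataset involved. The sketch below assumes that model; the `Dataset` class, the label names, and the `can_correlate` and `missing_labels` helpers are hypothetical, and real platforms (Accumulo's visibility labels, for instance) express richer rules than a flat set of tags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A data source in the repository plus the authorizations it requires (hypothetical model)."""
    name: str
    required_labels: frozenset


def required_for(datasets) -> set:
    """Union of every label required by any dataset in the request."""
    return set().union(*(ds.required_labels for ds in datasets))


def can_correlate(user_labels, datasets) -> bool:
    """Allow a combined analysis only if the user holds every label of every dataset.

    The joined result inherits the restrictions of all its sources, so this is
    the most conservative rule.
    """
    return required_for(datasets) <= set(user_labels)


def missing_labels(user_labels, datasets) -> set:
    """Labels the user would still need before the request could be approved."""
    return required_for(datasets) - set(user_labels)


research = Dataset("research", frozenset({"proprietary"}))
regulated = Dataset("regulated_records", frozenset({"regulated"}))
customers = Dataset("customers", frozenset({"pii"}))

researcher = {"proprietary"}
print(can_correlate(researcher, [research]))                      # True
print(can_correlate(researcher, [research, customers]))           # False
print(sorted(missing_labels(researcher, [research, customers])))  # ['pii']
```

Taking the union of labels means a correlation is never less restricted than its most sensitive input; the case-by-case judgment the article describes then becomes a decision about who is granted the missing labels.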

In addition, many repositories collect data at high volume and velocity from a number of different sources, each of which may have its own data transfer workflow. These connections to multiple sources can increase the attack surface available to an adversary: a big data system receiving feeds from 20 different data sources may present an attacker with 20 viable vectors for attempting to gain access to a cluster.
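The attack-surface point can be made concrete by keeping an inventory of ingestion feeds and flagging the ones that offer an attacker the easiest way in. This is an illustrative sketch only: the `Feed` fields, the sample source names, and the rule that every feed should be both authenticated and encrypted in transit are assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class Feed:
    """One inbound data-transfer workflow into the cluster (hypothetical fields)."""
    source: str
    authenticated: bool  # does the feed prove its identity to the cluster?
    encrypted: bool      # is data protected in transit (e.g. TLS)?


def exposed_feeds(feeds):
    """Return the sources of feeds that lack authentication or transport encryption."""
    return [f.source for f in feeds if not (f.authenticated and f.encrypted)]


feeds = [
    Feed("crm_export", authenticated=True, encrypted=True),
    Feed("web_clickstream", authenticated=False, encrypted=True),
    Feed("legacy_ftp_drop", authenticated=False, encrypted=False),
]

# Every feed is a potential entry point; these are the weakest ones.
print(f"{len(feeds)} feeds, weakest: {exposed_feeds(feeds)}")
```

Even this simple inventory makes the "20 sources, 20 vectors" arithmetic visible and gives security staff a short list of workflows to harden first.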

The Infrastructure 

Another big data challenge is the distributed nature of big data environments. Compared with a single high-end database server, distributed environments are more complex and offer more points of attack. When big data environments are distributed geographically, physical security controls need to be standardized across all accessible locations. When data scientists across the organization need access to information, perimeter protection becomes both more important and more complicated, because it must keep the system open to legitimate users while protecting it from attack. And with a large number of servers, there is a greater chance that server configurations will be inconsistent, leaving certain systems vulnerable.
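Configuration inconsistency across many servers can be detected by comparing each node's settings against a known-good baseline. The sketch below assumes settings have already been collected into dictionaries; the setting names, values, and the `find_drift` helper are hypothetical. It only detects drift; a configuration-management tool would be responsible for correcting it.

```python
def find_drift(baseline: dict, servers: dict) -> dict:
    """Compare each server's settings against a baseline.

    Returns {server: {setting: (expected, actual)}} for every mismatch,
    including settings missing from a server (actual is None).
    Settings that appear only on a server, not in the baseline, are ignored.
    """
    drift = {}
    for host, config in servers.items():
        diffs = {
            key: (expected, config.get(key))
            for key, expected in baseline.items()
            if config.get(key) != expected
        }
        if diffs:
            drift[host] = diffs
    return drift


# Hypothetical baseline, e.g. derived from the secure server image.
baseline = {"ssh_root_login": "no", "kerberos": "enabled", "patch_level": "2014-05"}
servers = {
    "node01": {"ssh_root_login": "no", "kerberos": "enabled", "patch_level": "2014-05"},
    "node02": {"ssh_root_login": "yes", "kerberos": "enabled", "patch_level": "2014-05"},
    "node03": {"ssh_root_login": "no", "kerberos": "enabled"},
}

for host, diffs in find_drift(baseline, servers).items():
    print(host, diffs)
```

Here `node02` allows root logins and `node03` reports no patch level, so both are flagged while `node01` is not.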

The Technology 

An additional big data security challenge is that big data technologies, including Hadoop and NoSQL databases, were not originally designed with security in mind. For example, Hadoop originally didn't authenticate services or users, and didn't encrypt data transmitted between nodes in the environment. This creates weaknesses in both authentication and network security. Many NoSQL databases also lack security features that traditional databases provide, such as role-based access control. The advantage of NoSQL is the flexibility to add new data types on the fly, but these technologies don't make it straightforward to define security policies for that new data.
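To illustrate what role-based access control adds, and why new data types complicate it, here is a minimal deny-by-default policy over (collection, field) pairs. It is a hypothetical in-application model written for this illustration, not the API of any NoSQL product; the class, role, user, and field names are all assumptions. Because access is denied unless explicitly granted, a field added on the fly stays hidden until someone writes a policy for it.

```python
from collections import defaultdict


class RoleBasedPolicy:
    """Minimal role-based access control over (collection, field) pairs.

    Deny by default: a field added on the fly is unreadable until a role
    is explicitly granted access to it.
    """

    def __init__(self):
        self._grants = defaultdict(set)      # role -> {(collection, field)}
        self._user_roles = defaultdict(set)  # user -> {role}

    def grant(self, role, collection, field):
        self._grants[role].add((collection, field))

    def assign(self, user, role):
        self._user_roles[user].add(role)

    def can_read(self, user, collection, field):
        return any(
            (collection, field) in self._grants[role]
            for role in self._user_roles[user]
        )

    def readable_view(self, user, collection, document):
        """Return only the fields of a document the user may read."""
        return {k: v for k, v in document.items() if self.can_read(user, collection, k)}


policy = RoleBasedPolicy()
policy.grant("analyst", "customers", "region")
policy.grant("analyst", "customers", "purchase_total")
policy.assign("alice", "analyst")

record = {"region": "EU", "purchase_total": 120.0, "email": "a@example.com", "new_field": "x"}
print(policy.readable_view("alice", "customers", record))
# {'region': 'EU', 'purchase_total': 120.0}
```

The trade-off is visible in the output: the schema can grow freely, but every new field needs an explicit grant before analysts can see it, which is exactly the policy work the paragraph above describes as not straightforward.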

Securing Big Data 

So what can be done to bring the security of traditional database management to big data? Several organizations have published catalogs of security controls; the SANS Institute, for example, provides a list of 20 Critical Security Controls. Several of those controls are ones I would recommend for addressing the security challenges presented by big data:

  • Application Software Security. Use secure versions of open-source software. As described above, big data technologies weren't originally designed with security in mind. Using open-source technologies like Apache Accumulo, or Hadoop version 0.20.20x or later, can help address this challenge. In addition, vendor-backed technologies like Cloudera's Sentry or DataStax Enterprise offer enhanced security at the application layer. Specifically, Sentry and Accumulo also provide finer-grained access control (role-based authorization in Sentry, cell-level security labels in Accumulo) that goes beyond what many NoSQL databases offer.


  • Maintenance, Monitoring, and Analysis of Audit Logs. Implement audit logging technologies to understand and monitor big data clusters. Technologies like Apache Oozie can help implement this feature. Keep in mind that security engineers in the organization need to be tasked with examining and monitoring these files. It’s important to ensure that auditing, maintaining, and analyzing logs are done consistently across the enterprise.


  • Secure Configurations for Hardware and Software. Build servers based on secure images for all systems in your organization’s big data architecture. Ensure patching is up to date on these machines and that administrative privileges are limited to a small number of users. Use automation frameworks, like Puppet, to automate system configuration and ensure that all big data servers in the enterprise are uniform and secure.


  • Account Monitoring and Control. Manage accounts for big data users. Require strong passwords, deactivate inactive accounts, and impose a maximum number of failed log-in attempts to help stop attackers from gaining access to a cluster. It's important to note that the enemy isn't always outside the organization; monitoring account access can also help reduce the probability of a successful compromise from the inside.
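For the audit-log control, the value comes from someone actually reading the logs. A small script can at least surface users who are repeatedly refused access. The sketch below assumes whitespace-separated `key=value` audit lines with `allowed` and `ugi` (user) fields, loosely in the style of Hadoop's HDFS audit log; check the exact format your cluster writes before relying on it. The helper names and the threshold of 3 are hypothetical.

```python
import re
from collections import Counter

# Assumed line format: whitespace-separated key=value pairs, e.g.
#   allowed=false ugi=bob ip=/10.0.0.7 cmd=open src=/data/pii/customers.csv
PAIR = re.compile(r"(\w+)=(\S+)")


def parse_line(line: str) -> dict:
    """Turn one audit line into a dict of its key=value fields."""
    return dict(PAIR.findall(line))


def denied_by_user(lines) -> Counter:
    """Count refused access attempts per user."""
    counts = Counter()
    for line in lines:
        event = parse_line(line)
        if event.get("allowed") == "false":
            counts[event.get("ugi", "<unknown>")] += 1
    return counts


def users_to_review(lines, threshold: int = 3):
    """Users whose refused attempts reach the threshold, most frequent first."""
    return [user for user, n in denied_by_user(lines).most_common() if n >= threshold]


sample = [
    "allowed=true ugi=alice ip=/10.0.0.5 cmd=open src=/data/sales/2014.csv",
    "allowed=false ugi=bob ip=/10.0.0.7 cmd=open src=/data/pii/customers.csv",
    "allowed=false ugi=bob ip=/10.0.0.7 cmd=listStatus src=/data/pii",
    "allowed=false ugi=alice ip=/10.0.0.5 cmd=open src=/data/hr/salaries.csv",
    "allowed=false ugi=bob ip=/10.0.0.7 cmd=open src=/data/hr/salaries.csv",
]
print(users_to_review(sample))  # ['bob']
```

A report like this does not replace review by security engineers, but running the same script everywhere is one way to keep log analysis consistent across the enterprise.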
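The account rules in the last control (a cap on failed log-ins and deactivation of unused accounts) are simple enough to express directly. This is a sketch under assumed policy values: the limit of 5 failures, the 90-day inactivity window, and the `Account`, `record_login`, and `deactivate_inactive` names are hypothetical, and a real deployment would enforce them in the directory or authentication service rather than in application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_FAILED_ATTEMPTS = 5               # assumed policy value
INACTIVITY_LIMIT = timedelta(days=90)  # assumed policy value


@dataclass
class Account:
    username: str
    last_login: datetime
    failed_attempts: int = 0
    active: bool = True


def record_login(account: Account, success: bool, when: datetime) -> None:
    """Update an account after a log-in attempt, locking it after too many failures."""
    if not account.active:
        return  # locked or deactivated accounts stay disabled until an admin re-enables them
    if success:
        account.failed_attempts = 0
        account.last_login = when
    else:
        account.failed_attempts += 1
        if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
            account.active = False


def deactivate_inactive(accounts, now: datetime):
    """Deactivate accounts unused for longer than INACTIVITY_LIMIT; return their names."""
    stale = []
    for acct in accounts:
        if acct.active and now - acct.last_login > INACTIVITY_LIMIT:
            acct.active = False
            stale.append(acct.username)
    return stale


now = datetime(2014, 5, 13)
carol = Account("carol", last_login=now - timedelta(days=2))
dave = Account("dave", last_login=now - timedelta(days=200))

for _ in range(5):
    record_login(carol, success=False, when=now)
print(carol.active)  # False

stale = deactivate_inactive([carol, dave], now)
print(stale)  # ['dave']
```

Resetting the failure counter on a successful log-in keeps occasional typos from locking out legitimate users, while a burst of failures still trips the limit.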

Organizations that are serious about big data security should consider these first steps. Cyber criminals are never going to stop being on the offensive, and with such a big target to protect, it is prudent for any enterprise utilizing big data technologies to be as proactive as possible in securing its data.

Jeff Markey is a Data Scientist with ThreatTrack Security, supporting corporate data mining efforts and product development. He has 7 years of experience implementing data analytics in the cyber security field. He holds a Master of Science in Computer Science and Mathematics from Johns Hopkins University and is certified as a GIAC Security Expert (GSE).

  1. Enrique Vega
    Posted May 15, 2014 at 3:27 am

    In reality, Big Data is more about the processing techniques and outputs than the size of the data set itself, so specific skills are required to use Big Data effectively. There is a general shortage of specialist skills for Big Data analysis, in particular when it comes to using some of the less mature technologies.

  2. Jessica Dodson
    Posted May 20, 2014 at 2:47 pm

    “big data technologies weren’t originally designed with security in mind.”

    I think that is something we need to change. Security can’t be a secondary concern or something you worry about at the end. If you build technology/applications with security in mind you’re building a much more secure system overall.

  3. Olap on hadoop
    Posted August 21, 2015 at 2:33 am

    Big data has great potential to benefit organizations in every kind of industry, all over the world. It is useful for decision-making and can help improve the financial position of any organization. But every organization's data needs security, and we can say that is a big challenge for big data software and technologies.

    Nice to read the post.

    Priyanka Jain