The Cloud Security Alliance’s Big Data Working Group today released The Big Data Security and Privacy Handbook: 100 Best Practices in Big Data Security and Privacy. The handbook presents 10 considerations for each of 10 top challenges in big data security and privacy. The following excerpt from the handbook discusses best practices for real-time security and compliance monitoring. (All section numbers within the excerpt reflect the organizational paradigm of the handbook and the placement of the excerpted section within it.)
5.0 Real-Time Security/Compliance Monitoring
Big data is generated by a wide variety of gadgets and sensors, including security devices. Real-time security and compliance monitoring is a double-edged sword. On one hand, big data infrastructures have to be monitored from a security point of view, to answer questions such as: Is the infrastructure still secure? Are we under attack? On the other hand, entities that utilize big data can provide better security analytics than those that do not (e.g., fewer false positives and more fine-grained, better-quantified security overviews). The following practices should be implemented for real-time security/compliance monitoring:
5.1 Apply big data analytics to detect anomalous connections to cluster
To ensure only authorized connections are allowed on a cluster, as this makes up part of the trusted big data environment.
Use solutions like TLS/SSL, Kerberos, Secure European System for Applications in a Multi-Vendor Environment (SESAME), Internet protocol security (IPsec), or secure shell (SSH) to establish trusted connections to and – if needed – within a cluster to prevent unauthorized connections. Use monitoring tools, like a security information and event management (SIEM) solution, to monitor anomalous connections. Detection could be based, for instance, on connection behavior (e.g., seeing a connection from a “bad Internet neighborhood”) or on alerts filed in the logs of the cluster systems that indicate an attempt to establish an unauthorized connection.
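As a rough illustration of connection monitoring, the sketch below flags connections that come from outside a trusted address range or that repeatedly show up as denied in the cluster logs. The event format, the trusted network, and the threshold are assumptions made for this example; a real SIEM would draw on far richer signals.

```python
import ipaddress
from collections import Counter

# Hypothetical values chosen for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]
MAX_DENIED_ATTEMPTS = 3


def find_anomalous_connections(events):
    """Return {source_ip: reason} for connections that look suspicious.

    Each event is a dict with "src" (an IP address string) and
    "status" ("ok" or "denied"); this shape is assumed, not standard.
    """
    denied = Counter(e["src"] for e in events if e["status"] == "denied")
    anomalies = {}
    for event in events:
        src = event["src"]
        addr = ipaddress.ip_address(src)
        if not any(addr in net for net in TRUSTED_NETWORKS):
            anomalies[src] = "outside trusted networks"
        elif denied[src] >= MAX_DENIED_ATTEMPTS:
            anomalies[src] = "repeated denied connection attempts"
    return anomalies


# Made-up sample events.
events = [
    {"src": "10.1.2.3", "status": "ok"},
    {"src": "203.0.113.7", "status": "ok"},
    {"src": "10.9.9.9", "status": "denied"},
    {"src": "10.9.9.9", "status": "denied"},
    {"src": "10.9.9.9", "status": "denied"},
]
anomalies = find_anomalous_connections(events)
```

Here `anomalies` contains the external address and the internal address with three denied attempts, while the well-behaved internal host is left alone.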
5.2 Mine logging events
To ensure that the big data infrastructure remains compliant with the assigned risk-acceptance profile of the infrastructure.
- Mine the events in log files to monitor security, as a SIEM tool does.
- Apply other algorithms or principles (such as machine learning) to mine events and gain potential new security insights.
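A minimal sketch of rule-based log mining, similar in spirit to a simple SIEM correlation rule, might look like the following. The log line format, the field names, and the threshold are invented for this example and do not correspond to any particular product.

```python
import re
from collections import Counter

# Hypothetical log format, assumed for this sketch:
# "<timestamp> <LEVEL> user=<name> src=<ip> action=<action> result=<result>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>\w+) user=(?P<user>\S+) src=(?P<src>\S+) "
    r"action=(?P<action>\S+) result=(?P<result>\S+)$"
)


def mine_failed_logins(lines, threshold=3):
    """Count failed logins per (user, source) and return those at or above threshold."""
    failures = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed format
        if match["action"] == "login" and match["result"] == "failure":
            failures[(match["user"], match["src"])] += 1
    return {key: count for key, count in failures.items() if count >= threshold}


# Made-up sample log lines.
log_lines = [
    "2016-08-26T10:00:01Z WARN user=alice src=10.0.0.5 action=login result=failure",
    "2016-08-26T10:00:03Z WARN user=alice src=10.0.0.5 action=login result=failure",
    "2016-08-26T10:00:05Z WARN user=alice src=10.0.0.5 action=login result=failure",
    "2016-08-26T10:00:09Z INFO user=bob src=10.0.0.8 action=login result=success",
    "2016-08-26T10:00:10Z WARN user=bob src=10.0.0.8 action=login result=failure",
    "this line is malformed",
]
suspicious = mine_failed_logins(log_lines)
```

A machine learning approach would replace the fixed threshold with a model learned from historical events; the counting step shown here is often the feature-extraction stage that feeds such a model.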
5.3 Implement front-end systems
To parse requests and stop bad requests. Front-end systems are not new to security. Examples are routers, application-level firewalls and database-access firewalls. These systems typically parse the request (based on, for instance, syntax signatures or behavior profiles) and stop bad requests. The same principle can be used to focus on application or data requests in a big data infrastructure environment (e.g., MapReduce messages).
Deploy front-end systems in multiple stages. For example, utilize a router for the network, an application-level firewall to allow/block applications, and a dedicated big data front-end system to analyze typical big data inquiries (like Hadoop requests). Additional technology, such as a software-defined network (SDN), may be helpful for implementation and deployment.
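The staged approach can be pictured as a chain of filters, each of which may stop a request before it reaches the next stage. The sketch below is only a conceptual model: the request fields, the blocked address, the application names, and the payload patterns are all hypothetical, and real routers and firewalls are configured rather than coded this way.

```python
import re

# Hypothetical rule sets for illustration.
BLOCKED_SOURCES = {"198.51.100.23"}
ALLOWED_APPS = {"mapreduce", "hive"}
BAD_PATTERNS = [
    re.compile(r";\s*drop\s+table", re.IGNORECASE),  # crude injection signature
    re.compile(r"\.\./"),                            # path traversal attempt
]


def network_stage(request):
    """Router-like stage: reject traffic from blocked sources."""
    return request["src"] not in BLOCKED_SOURCES


def application_stage(request):
    """Application-level firewall stage: allow only known applications."""
    return request["app"] in ALLOWED_APPS


def big_data_stage(request):
    """Dedicated big data front end: inspect the request payload."""
    return not any(p.search(request["payload"]) for p in BAD_PATTERNS)


STAGES = [
    ("network", network_stage),
    ("application", application_stage),
    ("big-data", big_data_stage),
]


def filter_request(request):
    """Return (allowed, blocking_stage); blocking_stage is None when allowed."""
    for name, check in STAGES:
        if not check(request):
            return False, name
    return True, None


good = {"src": "10.0.0.1", "app": "hive", "payload": "SELECT * FROM t"}
bad = {"src": "10.0.0.1", "app": "hive", "payload": "SELECT 1; DROP TABLE users"}
```

With these rules, `filter_request(good)` passes every stage, while `filter_request(bad)` is stopped at the last, payload-aware stage.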
5.4 Consider cloud-level security
To keep the cloud from becoming the “Achilles heel” of the big data infrastructure stack. Big data deployments are moving to the cloud. If such a deployment lives on a public cloud, this cloud becomes part of the big data infrastructure stack.
- Implement other CSA best practices.
- Encourage Cloud Service Providers to become CSA STAR certified.
5.5 Utilize cluster-level security
To ensure that security methodology for big data infrastructure is approached from multiple levels. Different components make up this infrastructure – the cluster being one of them.
Apply – where applicable – best security practices for the cluster. These include:
- Use Kerberos or SESAME in a Hadoop cluster for authentication.
- Secure the Hadoop distributed file system (HDFS) using file and directory permissions.
- Utilize access control lists to govern access (e.g., role-based, attribute-based).
- Apply information flow control using mandatory access control.
The implementation of security controls also (heavily) depends on the cluster distribution being used. In case of strict security requirements (e.g., high confidentiality of the data being used), consider looking at solutions like Sqrrl, which provide fine-grained access control at the cell level.
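To make the idea of cell-level access control concrete, the sketch below attaches a set of required labels to every cell and shows a user only the cells whose labels that user holds. This is a simplified conceptual model with hypothetical names (`Cell`, `visible_cells`); it is not the API of Sqrrl or of any other product, which typically support richer label expressions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: str
    required_labels: frozenset  # labels a reader must hold to see this cell


def visible_cells(cells, user_labels):
    """Return only the cells whose required labels are all held by the user."""
    held = frozenset(user_labels)
    return [c for c in cells if c.required_labels <= held]


# Made-up sample data.
cells = [
    Cell("patient-1", "name", "Jane Doe", frozenset({"pii"})),
    Cell("patient-1", "diagnosis", "flu", frozenset({"pii", "medical"})),
    Cell("patient-1", "visit_count", "3", frozenset()),
]

# A user holding only the "pii" label sees the name and the visit count,
# but not the diagnosis.
analyst_view = visible_cells(cells, {"pii"})
```

Because the check is applied per cell rather than per file or per table, two users querying the same row can receive different subsets of its columns.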
5.6 Apply application-level security
To secure applications in the infrastructure stack. In recent years, attackers have shifted their focus from operating systems to databases to applications.
- Apply secure software development best practices, like OWASP (owasp.org) for Web-based applications.
- Execute vulnerability assessments and application penetration tests on the application on an ongoing and scheduled basis.
5.7 Adhere to laws and regulations
To avoid legal issues when collecting and managing data. Due to laws and regulations that exist worldwide – specifically those that relate to privacy rights – individuals who gather data cannot monitor or use every data item collected. While many regulations are in place to protect consumers, they also create a variety of challenges in the universe of big data collection that will hopefully be resolved over time.
Follow the laws and regulations (e.g., privacy laws) for each step in the data lifecycle. These include:
- Collection of data
- Storage of data
- Transmission of data
- Use of data
- Destruction of data
Physical and virtual locations for each step in the data lifecycle may not be the same.
5.8 Reflect on ethical considerations
To address both technical and ethical questions that may arise. The fact that one has big data doesn’t necessarily mean that one can just use that data. There is always a fine line between what is technically possible and what is ethically correct. The latter is also shaped by legal regulations and the organization’s culture, among other factors.
There are no clear guidelines concerning ethical considerations related to big data usage. At minimum, big data users must take into account all applicable privacy and legal regulations. Additionally, users should consider ethical discussions related to their organizations, regions, businesses, and so forth.
5.9 Monitor evasion attacks
To avoid potential system attacks and/or unauthorized access. Evasion attacks are meant to circumvent big data infrastructure security measures and avoid detection. It is important to minimize these occurrences as much as possible.
As evasion attacks evolve constantly, it is not always easy to stop them. Following a defense-in-depth approach, consider applying different monitoring algorithms (like machine learning) to mine the data. Look for insights related to potential evasion of monitoring, in addition to using signature-based, rule-based, anomaly-based, and specification-based detection schemes.
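One way to look for traffic that slips past signatures is to run a statistical check alongside them and review events that the signatures passed but that are still outliers. The sketch below uses the modified z-score based on the median absolute deviation; the event fields, the signature list, and the choice of request size as the feature are assumptions for illustration.

```python
import statistics

# Hypothetical known-bad substrings standing in for a signature database.
SIGNATURES = ("/etc/passwd", "<script>")


def matches_signature(event):
    """Signature-based detection: look for known-bad substrings."""
    return any(sig in event["payload"] for sig in SIGNATURES)


def robust_outliers(values, cutoff=3.5):
    """Return indices whose modified z-score (median/MAD based) exceeds cutoff."""
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # no spread in the data, so nothing can be ranked as an outlier
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - median) / mad > cutoff
    ]


def possible_evasions(events):
    """Return ids of events that pass the signatures but are statistical outliers."""
    outlier_idx = set(robust_outliers([e["bytes"] for e in events]))
    return [
        e["id"] for i, e in enumerate(events)
        if i in outlier_idx and not matches_signature(e)
    ]


# Made-up sample events.
events = [
    {"id": "e1", "bytes": 500, "payload": "GET /data"},
    {"id": "e2", "bytes": 510, "payload": "GET /data"},
    {"id": "e3", "bytes": 495, "payload": "GET /data"},
    {"id": "e4", "bytes": 505, "payload": "GET /data"},
    {"id": "e5", "bytes": 498, "payload": "GET /data"},
    {"id": "e6", "bytes": 502, "payload": "GET /data"},
    {"id": "e7", "bytes": 503, "payload": "GET /etc/passwd"},
    {"id": "e8", "bytes": 9000, "payload": "GET /data"},
]
review_queue = possible_evasions(events)
```

Event `e7` is already caught by the signature check, so only the unusually large `e8` lands in the review queue. In practice the feature would be richer than request size, and the outlier model could be replaced by a trained classifier.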
5.10 Track data-poisoning attacks
To prevent monitoring systems from being misled, crashing, misbehaving, or providing misinterpreted data due to malformed data. These types of attacks are aimed at falsifying data, letting the monitoring system believe nothing is wrong.
- Consider applying front-end systems and behavioral methods to perform input validation, process the data, and determine right from wrong as much as possible.
- It is also crucial to authenticate sources of data and maintain logs not only for preventing unauthorized data injection but also for establishing accountability.
- Use the monitoring system to watch for strange behavior, like a prolonged spike in central processing unit (CPU) and memory load, or disk space filling up quickly.
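The first two practices above, input validation and source authentication with logging, can be sketched together. In this example each data source signs its records with a shared secret using HMAC-SHA256, and the receiver verifies the signature, range-checks the value, and logs every decision. The record layout, the key store, and the `cpu_percent` field are hypothetical.

```python
import hashlib
import hmac
import json

# Hypothetical per-source shared secrets; a real system would use a key store.
SOURCE_KEYS = {"sensor-1": b"demo-key-not-for-production"}


def sign(source_id, payload):
    """Compute the HMAC-SHA256 tag a source attaches to its payload."""
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(SOURCE_KEYS[source_id], body, hashlib.sha256).hexdigest()


def accept_record(record, audit_log):
    """Authenticate and validate a record, logging the decision either way."""
    source = record.get("source")
    payload = record.get("payload")
    key = SOURCE_KEYS.get(source)
    if key is None or not isinstance(payload, dict):
        reason = "unknown source or malformed payload"
    else:
        body = json.dumps(payload, sort_keys=True).encode("utf-8")
        expected = hmac.new(key, body, hashlib.sha256).hexdigest()
        supplied = str(record.get("mac", ""))
        value = payload.get("cpu_percent")
        if not hmac.compare_digest(expected.encode(), supplied.encode()):
            reason = "bad signature"
        elif not isinstance(value, (int, float)) or not 0 <= value <= 100:
            reason = "value out of range"
        else:
            reason = "accepted"
    audit_log.append((source, reason))  # keep a trail for accountability
    return reason == "accepted"


audit_log = []
payload = {"cpu_percent": 42.0}
genuine = {"source": "sensor-1", "payload": payload, "mac": sign("sensor-1", payload)}
tampered = {"source": "sensor-1", "payload": {"cpu_percent": 1.0}, "mac": genuine["mac"]}

accepted_genuine = accept_record(genuine, audit_log)
accepted_tampered = accept_record(tampered, audit_log)
```

The genuine record is accepted, while the tampered one fails the signature check because its payload no longer matches the tag. Both outcomes are recorded in `audit_log`.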
To download a copy of The Big Data Security and Privacy Handbook: 100 Best Practices in Big Data Security and Privacy, please visit the Cloud Security Alliance website.