These days, data breaches affecting both consumers and businesses make headlines regularly, including the massive, high-profile heists of personal information from Anthem and iCloud. Securing the ever-growing volume and variety of data that organizations hold presents challenges that are difficult to overcome.
At the core of the risk of data breach are the very technology solutions that organizations are implementing to leverage the power of big data. Platforms such as Hadoop offer many compelling advantages including cost savings, innovative analytic capabilities and scalability. However, they also expose organizations to enormous risk: the large troves of data stored in massive data lakes can fall into the wrong hands.
So where exactly do the threats originate? One of the most common vectors for a data breach these days is a compromised user account. Once hackers obtain the credentials of a legitimate user, they can extract vast amounts of data if the appropriate controls are not in place for that user’s account. In a survey conducted by the Ponemon Institute, 71 percent of corporate users said that they had access to data they felt they shouldn’t be able to see.
Data is even more exposed in newer platforms, like Hadoop, that were not designed with security or access controls in mind. This raises the question: How can companies maximize the power of Hadoop without letting sensitive data walk out the door? The answer lies in data access policy engines that give users access to only the exact information needed to do their jobs, and not a byte more.
Data Entitlement Policies: Good for the Dollar, Bad for the Breach
Data access policy engines are engineered to apply control over different data repositories within a Hadoop cluster, depending on a user’s role or clearance. As hackers increasingly target individuals and roles inside organizations with privileged access to sensitive data, defining the scope of that privileged access becomes imperative. For instance, a server administrator needs to manage the system but does not need to see the actual data residing on that system. Or a stockbroker might be authorized to access their clients’ data but not the data of another broker’s clients.
To achieve the level of need-to-know granularity necessary to minimize the risk of unintended access, data access policies require real-time data access control and masking on a row, column, and cell level for each data query. This control is particularly important for any organization or department that allows different individuals to view the same data sets, including regulated global enterprises or multi-tenant environments. In the case of Hadoop adoption, rather than resorting to extremes such as permitting full data access or blocking it entirely, enterprises can apply appropriately scoped data access policies.
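To make the idea concrete, a per-query policy check can combine role-based column restrictions with row-level filters, echoing the administrator and stockbroker examples above. The following is a minimal sketch, not any vendor's actual API; the role names, columns, and rules are assumptions for illustration:

```python
# Minimal sketch of per-query, need-to-know access control.
# Role names, columns, and filter rules are illustrative assumptions.

POLICIES = {
    # A server administrator manages the system but sees no actual data.
    "server_admin": {"columns": set(),
                     "row_filter": lambda row, user: False},
    # A broker sees only name and portfolio columns, and only rows
    # belonging to their own clients.
    "broker": {"columns": {"client_name", "portfolio_value"},
               "row_filter": lambda row, user: row["broker_id"] == user["id"]},
}

def apply_policy(user, rows, requested_columns):
    """Return only the rows and columns this user is entitled to see."""
    policy = POLICIES[user["role"]]
    allowed = [c for c in requested_columns if c in policy["columns"]]
    return [{c: row[c] for c in allowed}
            for row in rows if policy["row_filter"](row, user)]
```

A broker requesting client names and Social Security numbers would get back only the names of their own clients; the administrator would get back no rows at all, even for permitted columns.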
Data access policy engines serve as a bridge between the data repository and the business potential that is gleaned from the data. An effective data access policy will empower users to access and benefit from the data that they are entitled to, based on their function in the enterprise, while maintaining enterprise-level security that reduces the risk of being breached.
When reviewing data access policies, organizations should consider the following questions:
- How does the policy engine safeguard data? Does it integrate and protect data across a range of data sources and programming languages, including both Hadoop and legacy databases, regardless of the application used to access the data store?
- To what degree can data access be personalized and customized? What are the means of managing access? Can it be customized based on user role and purpose?
- How can the rules of data access be controlled? Does the policy engine offer real-time enforcement? Can sensitive data be masked on the fly?
- Does the policy engine offer a means of audit? Full visibility into both successful and denied data access attempts can allow suspicious patterns to be spotted quickly, before data leakage occurs.
- Can it work across any application? A hallmark of big data analytics is the flexibility of using different tools and applications to avoid single-application lock-in and risk of compromised data.
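The audit question above can be made concrete: a policy engine should log every access decision, allowed or denied, so that suspicious patterns surface before data leaks. Here is a hedged sketch of such an audit trail; the record fields and helper names are assumptions, not a specific product's log format:

```python
from datetime import datetime, timezone

# Sketch of a full-visibility audit trail: every access attempt is
# recorded, whether it was allowed or denied. Fields are illustrative.
AUDIT_LOG = []

def record_access(user, resource, allowed):
    """Append one audit record per access attempt, successful or not."""
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "resource": resource,
        "allowed": allowed,
    })

def denied_attempts(user):
    """Repeated denials by one account can flag a compromised credential."""
    return [e for e in AUDIT_LOG if e["user"] == user and not e["allowed"]]
```

Because denied attempts are logged alongside successful ones, a sudden spike in denials from a single account becomes visible immediately rather than after a breach.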
Got Data? Security Tactics to Consider
The reality of today’s enterprise big data deployments is that many organizations already have initiated pilot projects, if they are not yet in production. They are turning to cloud environments to deploy their big data projects and utilizing hybrid solutions including SQL databases along with the Hadoop infrastructure. With such a patchwork infrastructure, the ability to define and administer consistent policies across multiple platforms, and to manage user access, is a key consideration.
Cross-platform policy capabilities will be key to meeting enterprises’ need for securing sensitive big data, as will the following technical tactics:
- Fine-grained user access control. Using roles and user-specific or general attributes, such as location, is a must for any organization that wants to deploy data security best practices such as limiting access to enterprise data based on a least-privilege model or on a need-to-know basis.
- Support for row, column, and cell-level data access policies. The ability to apply policies down to the most granular element of data provides organizations with the greatest degree of control and flexibility over who has access to which specific data.
- Granular and dynamic data masking. Early adopters of Hadoop have gone as far as duplicating datasets to enable masking capabilities. Advanced security approaches for big data environments should deliver masking policies that support the same level of granularity as user and data access control policies without the need for data duplication.
- Ability to use data without revealing it. This capability enables enterprises to use sensitive data in the query logic of their applications without actually revealing the data itself to end users.
- Data access auditing. The ability to log and track which user has access to which data will simplify the process of auditing for various compliance requirements.
- Vendor neutrality. A data access solution should perform consistently across different Hadoop vendors (Cloudera, Hortonworks, MapR) and SQL data stores with support for cloud deployments.
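The dynamic masking tactic above can be sketched as a transformation applied at query time, so that no duplicated, pre-masked copy of the dataset is needed. The masking rules and role names below are hypothetical, chosen only to illustrate the pattern:

```python
# Dynamic masking sketch: sensitive values are transformed per role at
# query time, instead of maintaining a duplicated, pre-masked dataset.
# The rules and roles below are illustrative assumptions.

def mask_ssn(value):
    """Hide all but the last four digits of a Social Security number."""
    return "***-**-" + value[-4:]

MASKING_RULES = {
    # Analysts see masked SSNs; auditors see the real values.
    "analyst": {"ssn": mask_ssn},
    "auditor": {},
}

def masked_view(role, rows):
    """Apply the role's masking rules on the fly, column by column."""
    rules = MASKING_RULES[role]
    return [{col: rules.get(col, lambda v: v)(val)
             for col, val in row.items()}
            for row in rows]
```

The same pattern supports "using data without revealing it": an application can join or filter on the real column values server-side while end users only ever see the masked view.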
To harness the full value of a big data deployment such as Hadoop, enterprises need to deliver to users the exact data they need to make critical decisions, and not a drop more, so that unauthorized use cannot compromise the organization’s reputation.
User access policies that follow these technical tactics are fast becoming integral to big data deployments as Hadoop adoption broadens and organizations demand security on par with their existing information-management requirements. Companies need a comprehensive data entitlement system to protect sensitive data and enforce the rules that keep them safe. So if you are trying to keep up with the new security landscape, it’s time to take a hard look at your own users and what they can access.
In today’s environment, providing the right system of data entitlements using very fine-grained data access controls is one of the most important steps you can take to keep your big data safe.
Eric Tilenius is CEO of BlueTalon, a leading provider of data access control solutions for Hadoop, SQL, and big data environments. The BlueTalon Policy Engine provides authorization, fine-grained access control, data masking, and stealth analytics.