When it comes to security, what does it take to make Hadoop "enterprise ready"?

It may seem like a strange question because the open source Big Data processing system is clearly an enterprise tool. So why is it not yet enterprise ready?

The simple answer is that Hadoop, as it was first designed, was never intended to be used in the ways that it is used now. It was a straightforward tool for one person or a small group in an organization to run MapReduce jobs on large amounts of data. This data was siloed in the Hadoop cluster, and in that context only basic security was necessary.

But in the last year or so there has been a shift in the way that Hadoop is being used. Far from being a siloed and fairly specialist application, Hadoop has grown in many organizations to be an enterprise data platform. Many users from different business groups may now use Hadoop to access a vast data lake filled with data taken from all parts of the organization.

This type of Hadoop implementation, a multi-tenant environment where data that never previously co-existed is suddenly "unsiloed" and brought together, presents new security challenges.

This has led Hadoop vendors like Cloudera and Hortonworks to offer "enterprise ready" Hadoop distributions designed to address these new security challenges. "From a security standpoint, we want to be able to put the data in, and then the platform takes care of security," said Balaji Ganesan, senior director of enterprise security strategy at Hortonworks.

Vendors do this by adding open source, or sometimes proprietary, products at the platform level to address areas where security needs a boost.

Hadoop Security Challenges

What are these problem areas? It turns out that there are four key Hadoop security challenges:

  • Authentication. How can you ensure all users who access the Hadoop system are who they say they are and are allowed to access it?
  • Access control. How can you ensure users who access Hadoop can only access the data they are entitled to access, with the same policies applied consistently however they access the Hadoop system?
  • Auditing. How can you ensure that all users' data access histories are recorded for compliance and other purposes -- such as forensics, if the worst should happen?
  • Data protection. Essentially this comes down to enterprise-grade encryption for data at rest and in motion.

Of these, by far the most pressing challenges are authentication and access control, followed by encryption, according to Stuart Rogers, a technical architect at SAS professional services. "Out of the box, Hadoop still only applies a Linux-like file system type of authentication and access control: This group can read, this group can write," he explained.
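To make that concrete, here is a minimal sketch of that file-system-style model using Hadoop's Java FileSystem API; the path, user and group names are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class HdfsPermissions {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path data = new Path("/data/sales"); // hypothetical path

            // Out of the box, HDFS access control is coarse: one owner, one
            // group, and read/write/execute bits per file or directory.
            fs.setOwner(data, "etl", "analysts"); // hypothetical user and group
            fs.setPermission(data, new FsPermission((short) 0750)); // rwxr-x---
        }
    }

Anyone in the hypothetical analysts group can read the whole file; at this level there is no way to restrict access to only part of it.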

To overcome this weakness, vendors apply open source solutions such as Apache Sentry or the Apache Knox gateway, which enhance authentication and access control. Sentry, for example, makes it straightforward for a company to add column-level security, so that access controls can be applied not just to a whole file or table but to individual columns within it.
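As a rough sketch of what that looks like in practice, the statements below use Sentry's SQL-style GRANT syntax issued through a Hive JDBC connection; the server address, role, table and column names are all hypothetical:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class SentryColumnGrant {
        public static void main(String[] args) throws Exception {
            // Hypothetical HiveServer2 endpoint on a Sentry-enabled cluster
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hive.example.com:10000/default", "admin", "");
                 Statement stmt = conn.createStatement()) {

                stmt.execute("CREATE ROLE analyst_role");
                // Column-level security: the role can read employee names but
                // nothing else in the table (salaries, for instance).
                stmt.execute("GRANT SELECT(name) ON TABLE employees TO ROLE analyst_role");
                stmt.execute("GRANT ROLE analyst_role TO GROUP analysts");
            }
        }
    }

With that grant in place, members of the analysts group can query the name column but not the rest of the table.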

The Knox gateway goes further, adding identity federation, single sign-on and auditing capabilities as well as authentication and access control.
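For example, rather than exposing the NameNode directly, a cluster behind Knox lets clients reach WebHDFS through a single HTTPS endpoint. In this sketch, the gateway host, topology name ("default") and credentials are all hypothetical:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;

    public class KnoxWebHdfsClient {
        public static void main(String[] args) throws Exception {
            // One gateway URL fronts the cluster's REST APIs; clients never
            // talk to the NameNode directly.
            URL url = new URL("https://knox.example.com:8443/gateway/default"
                    + "/webhdfs/v1/data/sales?op=LISTSTATUS"); // hypothetical

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            String creds = Base64.getEncoder()
                    .encodeToString("alice:secret".getBytes("UTF-8")); // hypothetical
            conn.setRequestProperty("Authorization", "Basic " + creds);

            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line); // JSON directory listing
                }
            }
        }
    }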

Hadoop Security Solutions

Santa Clara-based Centrify has released a new version of its unified identity management suite that works with the Cloudera and Hortonworks Hadoop distributions, enabling organizations to use their existing Active Directory Kerberos and LDAP capabilities as an authentication and access control system for Hadoop clusters.
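At the Hadoop API level, Kerberos-based authentication of this kind typically looks something like the following sketch, where the principal and keytab path are hypothetical:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosLogin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Switch Hadoop from "simple" auth (trust whatever username the
            // client asserts) to Kerberos.
            conf.set("hadoop.security.authentication", "kerberos");
            UserGroupInformation.setConfiguration(conf);

            // Authenticate against the KDC (e.g. Active Directory) with a
            // keytab, so no password is embedded in the application.
            UserGroupInformation.loginUserFromKeytab(
                    "alice@EXAMPLE.COM",           // hypothetical principal
                    "/etc/security/alice.keytab"); // hypothetical keytab

            // All subsequent cluster calls carry the verified identity.
            System.out.println(FileSystem.get(conf).getUri());
        }
    }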

When it comes to auditing, Cloudera's solution is Cloudera Navigator. This records details of Hadoop activity, including:

  • a timestamp
  • the object that was accessed
  • details of the operation performed on an object
  • the user
  • the IP address of that user
  • the service instance through which the data was accessed

Centrify also offers auditing capabilities by tracking user activity and associating it with an individual in Active Directory.

Hortonworks provides auditing capabilities through the XA Secure software that it acquired in May 2014.

For data protection, Hadoop's HDFS does offer some at-rest data encryption capabilities. However, Cloudera's Sam Heywood, a director of product management, said this is not sufficient for enterprises.

"It begs the question of where the key management is. In HDFS, the key management isn't enterprise ready," he said.

Instead Cloudera offers Navigator Encrypt for Hadoop data, and Navigator Key Trustee, a "virtual safe-deposit box" for managing encryption keys, certificates and passwords. Both are integrated into Navigator.

Hortonworks provides data protection by offering a range of open source and partner-provided encryption solutions, such as Protegrity's Vaultless Tokenization for Hadoop, Extended HDFS Encryption, and Protegrity Enterprise Security Administrator for advanced data protection policy, key management and auditing.

Solutions that address most of the security requirements needed to make Hadoop enterprise ready are available today, although they are still maturing and growing in sophistication. But if there's one thing to think about carefully when planning a large-scale Hadoop implementation, it is where on the network you put it, said SAS's Stuart Rogers.

"You may get a proof of concept running in isolation and it works fine for three or four developers," he said. "But you have to consider where it is in relation to your enterprise key management and authentication solutions. If you don't, when you roll it out you are going to hit issues because you can't access the supporting security services."

Paul Rubens has been covering enterprise technology for over 20 years. In that time he has written for leading UK and international publications including The Economist, The Times, Financial Times, the BBC, Computing and ServerWatch.