Data Loss Prevention (DLP): Keeping Sensitive Data Safe from Leaks

Between the need to protect corporate data and regulations requiring that consumer data be protected, organizations are under more pressure than ever to keep their data safe. Data loss prevention (DLP) technology can help.

Regulations such as the EU General Data Protection Regulation (GDPR) have upped the stakes. GDPR assesses hefty fines – up to 4 percent of global revenues – for failing to adequately protect consumer information, especially medical and financial data.

"I can see the difference from before GDPR and after GDPR," said Angel Serrano, senior manager of advanced risk and compliance analytics at PwC UK in London. "Even if I have a tiny office somewhere, I need to check for confidential data." And automating this scrutiny is the only way to effectively manage it, he says.

DLP tools were one of the top IT security spending priorities in eSecurity Planet's 2019 State of IT Security survey – and also one of the technologies users have the most confidence in.

What is DLP?

Also called "data leak prevention" and "data loss protection," DLP is designed to prevent unauthorized users from sending sensitive or unauthorized data outside the corporate network.

While years ago there was talk of companies plugging workstation USB ports with glue to prevent insiders from taking company data, modern DLP products are much more sophisticated as they tackle the more complex and evolving challenge of keeping information safe and secure.

Those challenges involve categorizing and labeling intellectual property files and other sensitive business assets for degree of confidentiality required, then using business rules to enable an administrator to control what information users can transfer and how.
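
As a rough illustration of that idea, the sketch below labels assets by degree of confidentiality and applies a business rule per transfer channel. The label names, channel names and ceilings are hypothetical, chosen only for the example; real products define their own schemes.

```python
from dataclasses import dataclass
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical confidentiality labels, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical business rules: the most sensitive label each channel may carry.
CHANNEL_CEILING = {
    "internal_email": Sensitivity.CONFIDENTIAL,
    "external_email": Sensitivity.INTERNAL,
    "cloud_upload": Sensitivity.INTERNAL,
    "usb": Sensitivity.PUBLIC,
}


@dataclass(frozen=True)
class Asset:
    name: str
    label: Sensitivity


def is_transfer_allowed(asset: Asset, channel: str) -> bool:
    """Allow a transfer only if the channel's ceiling covers the asset's label.

    Channels that have no rule are denied by default.
    """
    ceiling = CHANNEL_CEILING.get(channel)
    if ceiling is None:
        return False
    return asset.label <= ceiling
```

With these assumed rules, a file labeled CONFIDENTIAL could go out over internal email but not to a USB drive, and any channel the administrator has not listed is blocked.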

However, there's no real consensus in the market about the terminology or the required capabilities, according to GTB Technologies co-founder Uzi Yair.

Some systems merely provide after-the-fact "data loss notification," Yair said, while "data loss recovery" systems get into different territory and add to the confusion.

"DLP is not one thing, like a tomato," Yair said, referring to GTB's enterprise suite of products. In addition to more traditional practices such as scanning endpoints, networks and storage, as well as policy management and workflow tools, the suite includes an information rights management (IRM) policy server that applies file-level control over who has access to what, where (it might be solely on-premises) and when.

Causes of data loss

Data can be lost from workers inadvertently sharing it, malicious insiders taking it, or from the missteps or insecure practices of business partners.

A Google-Uber lawsuit illustrates one of the classic cases of data leakage: Google's self-driving car subsidiary alleges that a former employee downloaded 14,000 files containing Google intellectual property before leaving to start his own self-driving car company, Otto, a startup that Uber later acquired.

The health privacy law HIPAA has made covered entities such as doctors and hospitals responsible for ensuring the privacy practices of business associates, such as contractors and subcontractors, with fines in the millions for lost laptops, misconfigured servers and contractor security lapses that expose patient data. Developers of consumer health technologies such as Fitbit and Apple's HealthKit must take notice, too.

Yet Serrano said he's seen company files with sensitive consumer data from 20 years ago posted on file-sharing services. He's working on security practices for unstructured data such as spreadsheets, PowerPoint and Word documents, information that's easy to expose merely by sending an email attachment to the wrong person.

Cost of data leaks

There's significant room for improvement in business DLP practices, according to Ponemon Institute research sponsored by Intel Security. It puts the average data breach cost per record in a range of $80 for government to more than $350 in healthcare. 

Among the highlights of the report:

  • Major organizations around the world deal with an average of 20 data loss incidents every day.
  • Even though 83 percent of organizations report a fully deployed solution that meets most or all of their requirements, 33 percent report that they are still suffering from significant data loss, and many others may be without knowing it.
  • DLP solutions have multiple methods for detecting incidents, including regular expressions, dictionary-based rules, and unstructured data. Yet 40 percent of respondents said they use only one of these methods. And 5 percent said they did not know how the technology works.
  • Many companies only use DLP for email or similar business applications, rather than covering the range of ways data can be leaked.
  • The report also found that an increase in breaches topped compliance concerns as the primary driver for growing interest in DLP.
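
Two of the detection methods mentioned in the report, regular expressions and dictionary-based rules, can be combined in a few lines. The following is only a sketch: the Social Security number pattern, the term list and the function names are assumptions made for illustration, not taken from any product or from the report.

```python
import re

# Illustrative pattern: a U.S. Social Security number written as 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Illustrative dictionary of terms that a policy might flag.
SENSITIVE_TERMS = {"confidential", "patient id", "diagnosis", "salary"}


def regex_hits(text):
    """Return every substring that matches the SSN pattern."""
    return SSN_PATTERN.findall(text)


def dictionary_hits(text):
    """Return the flagged terms that occur in the text, ignoring case."""
    lowered = text.lower()
    return {term for term in SENSITIVE_TERMS if term in lowered}


def scan(text):
    """Run both detectors and report their findings separately."""
    return {
        "regex": regex_hits(text),
        "dictionary": sorted(dictionary_hits(text)),
    }
```

Running both detectors and keeping their results separate lets a policy require, for instance, a pattern match and a dictionary term together before raising an incident.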

Meanwhile, 49 percent of organizations participating in a recent Haystax survey of 508 members of the Information Security Community on LinkedIn reported they have no idea whether they have experienced an insider attack in the past 12 months.

Seventy-four percent said they feel vulnerable to insider threats, a 7 percent increase over the previous year's survey. Fifty-six percent attribute that uncertainty to more frequent insider attacks, while 54 percent point to the increasing number of devices with access to sensitive data.

DLP products and vendors

Gartner said DLP products take two forms:

  • Enterprise DLP products packaged in agent software for desktops and servers, physical and virtual appliances for monitoring networks and agents, or soft appliances for data discovery. Symantec, Intel Security (McAfee), Digital Guardian and Forcepoint (Websense) are market leaders in this category.
  • Integrated DLP products that may offer more limited functionality, such as a focus solely on endpoint security, that are integrated with other security products or portfolio of products.

There is also an array of startups that say the cloud has introduced new challenges to leak prevention that legacy systems are not prepared to handle. Those challenges include growing use of personal and enterprise cloud services; the array of BYOD endpoint devices and operating systems; and people working on free Wi-Fi at Starbucks and airports.

Enterprises on average use 1,031 cloud services, according to research from Netskope, a cloud access security broker (CASB), and half of all workers using a sanctioned cloud storage service also have a personal account on the same service. Meanwhile, 66 percent of cloud services are not secure enough for GDPR.

"All these web applications like Google Drive and Office 365 are integrating with other satellite applications," said Krishna Narayanaswamy, founder and chief scientist at Netskope. "Salesforce uses Google Drive as a place to store files. DocuSign can put documents in Google Drive. You need to be at all the points where data is going into these applications. You need to be able to inspect that data at rest and determine who uploaded that data. Also inspect and apply policies to outgoing email."

Some companies still are not addressing all the channels.

"The new generation considers email a dinosaur. They go to social media – Twitter, LinkedIn, Facebook – you have to cover those as well. More and more communication is coming via SSL, and that's a big blank spot that many DLP vendors have not considered," Narayanaswamy said.

The web operates on an API (application programming interface) model. Legacy systems look at all the post transactions – the calls going out – then scan content for sensitive data such as credit card numbers.
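
A content scan of that kind might look roughly like the sketch below, which searches an outgoing POST body for candidate card numbers. The regular expression and function names are assumptions for the example. The Luhn checksum is a standard check on payment card numbers, used here to discard digit strings, such as timestamps or counters, that cannot be valid cards.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(body):
    """Return Luhn-valid card number candidates found in a request body."""
    found = []
    for match in CANDIDATE.finditer(body):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found
```

For a body such as "user=42&ts=1234567890123456&card=4111 1111 1111 1111", only the widely used test number 4111111111111111 is reported; the 16-digit timestamp fails the checksum and is ignored.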

"When you look at the web, there are many reasons for sending data from inside to the outside," Narayanaswamy said. "Modern applications constantly post information about how users are using the application, response times, and so forth, to improve user experience. When you look at every post transaction, there's a potential for many false positives," which have been the bane of DLP.

Among the newer DLP capabilities:

  • Data classification: This technology helps companies tag their assets by level of sensitivity and how they're used. Varonis, for instance, uses a metadata platform that outlines users and groups, permissions for access, and activity using the data. It also offers cross-platform data classification. Classification has to involve more than just scanning for file extensions, which can easily be changed or broken, according to Narayanaswamy. Newer systems also scan for a magic header, the signature bytes at the start of a file that identify its actual format regardless of the extension.
  • Digital fingerprinting: This technology uses algorithms to tag sensitive assets, link them with applicable DLP policies, then monitor how they flow within the business, such as through emails, printers, TCP and FTP traffic, or web uploads.
  • OCR (Optical Character Recognition): The Russian company ABBYY teamed up with Symantec to provide scanning of images in documents and emails. Google, which previously offered OCR for Gmail and Drive, recently announced an expanded API in beta for Google Cloud Platform. The feature performs deep content analysis to find matches against a list of more than 40 sensitive data types. GTB's Yair demonstrated how its technology could block an email containing an upside-down image and credit-card numbers, or text in various languages.
  • Proximity analysis: A scanned document might not be suspicious at all if, say, a name, credit card number and date are in different locations, Narayanaswamy explained. To reduce false positives, proximity analysis looks for critical data points that appear close together. A date next to a card number, for example, suggests that it really is an expiration date, which, combined with the other elements, could be used for fraud.
  • Behavior analysis: Amazon Web Services acquired harvest.ai, a San Diego-based startup focused on using artificial intelligence-based algorithms to apply analysis of user behavior across multiple systems to DLP. And in a blog post, the behavioral analytics company Prelert, acquired by Elastic, explained how it can be used to detect data exfiltration.
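
Two of the capabilities in the list above lend themselves to a short sketch: checking a file's magic header instead of trusting its extension, and proximity analysis. The code is illustrative only. The signature table, the regular expressions, the 30-character window and the function names are assumptions for the example, not a description of how any vendor named here implements these features.

```python
import re

# Well-known leading byte signatures ("magic headers") for a few formats.
MAGIC_SIGNATURES = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip container (also used by .docx, .xlsx, .pptx)",
    b"\x89PNG\r\n\x1a\n": "png",
}


def sniff_type(first_bytes):
    """Identify a file from its leading bytes, ignoring its extension."""
    for signature, kind in MAGIC_SIGNATURES.items():
        if first_bytes.startswith(signature):
            return kind
    return None


CARD = re.compile(r"\b\d{16}\b")                    # simplified card number
EXPIRY = re.compile(r"\b(?:0[1-9]|1[0-2])/\d{2}\b")  # MM/YY date


def gap(a, b):
    """Number of characters between two regex matches (0 if they overlap)."""
    return max(a.start() - b.end(), b.start() - a.end(), 0)


def card_near_expiry(text, window=30):
    """Return True if a card-like number has an MM/YY date within `window` characters."""
    expiries = list(EXPIRY.finditer(text))
    for card in CARD.finditer(text):
        if any(gap(card, exp) <= window for exp in expiries):
            return True
    return False
```

A file renamed from report.pdf to report.txt would still be recognized as a PDF by `sniff_type`, and `card_near_expiry` flags "Card 4111111111111111 exp 09/27" while ignoring a document in which the same number and date sit hundreds of characters apart.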

Education and user behavior

Preparing for GDPR and controlling data leakage also involves cleaning up data and educating employees.

According to Serrano, you should identify the contents of your assets, identify information that hasn't been opened in two months, then put that data in quarantine. Any user who wants to get it back must submit a business case.
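
The "not opened in two months" step could be approximated with a scan of file access times, as in this sketch. The 60-day threshold and the function name are assumptions for the example, and the code only lists candidates; it does not move anything into quarantine. Access times are also only as reliable as the file system allows, since some systems are mounted with options that update them rarely or never.

```python
import time
from pathlib import Path

STALE_AFTER_DAYS = 60  # roughly "two months"; an assumption for this sketch


def find_stale_files(root, now=None, stale_after_days=STALE_AFTER_DAYS):
    """Return files under `root` whose last access time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - stale_after_days * 86400  # 86400 seconds per day
    stale = []
    for path in Path(root).rglob("*"):
        # st_atime is the last access time reported by the file system.
        if path.is_file() and path.stat().st_atime < cutoff:
            stale.append(path)
    return sorted(stale)
```

The resulting list could feed a review queue, so that files are quarantined only after someone has confirmed the scan's findings.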

"We recommend you scan, prioritize, create rules to either use it or get rid of it," he said.

However, implementing DLP isn't a set-it-and-forget-it undertaking. ISACA says that it's only the first step in creating a culture that can effectively handle sensitive information.

Without the cultural component, DLP systems become an expensive toy, according to Yair.

"These are breaches," he said of the alerts produced. "If you've got somebody monitoring this all day, you've got a problem. I'd say you need employee training." 

ISACA advocates starting with an overall risk assessment involving a cross-departmental team, creating meaningful policies and procedures, and implementing effective event review and oversight.