Stephen Dodson, CTO of Prelert, was a data scientist before data science was cool.
An engineer with a Ph.D. in computational methods, he wrote an algorithm that helped a large financial institution uncover IT operational issues that created performance and liability problems, explained Mike Paquette, Prelert's VP of Products. "He found problems within a few days that they were struggling to find for months," Paquette said. Dodson also created an algorithm that helped the city of London identify issues that contributed to traffic accidents.
Dodson realized he had the makings of a commercial product but knew he needed to identify repeatable use cases. He brought in Mark Jaffe, a 20-year veteran of the software industry who had worked for companies like Siebel Systems, Securify and McAfee. The two created a product called Anomaly Detective that uses machine learning algorithms to help IT teams identify operational issues in real time, thus reducing the time and effort devoted to troubleshooting.
It did not take long for some of Prelert's customers to ask if the company could apply its algorithms to log data to expose possible security issues, Paquette said.
"That is where the most pain is right now. You do not go a day without reading about another big data breach. Prelert applies its technology to security log data, system log data and application log data to find activities that are otherwise really, really hard to find," he said, noting that many of the company's customers use both its IT operations and enterprise security products.
DNS Tunneling Example
One example of hard-to-find activity, Paquette said, is DNS tunneling, a technique that hides communications inside DNS protocol traffic. DNS tunneling has been implicated in data breaches such as last year's massive Home Depot breach that exposed 56 million credit cards and cost the company tens of millions of dollars.
"There are so many DNS requests in most organizations' logs that to be able to look through and find behaviors associated with tunneling is harder than you'd think," he said. "We have a pre-configured packaged use case that finds DNS tunneling. We get access to a giant store of DNS logs, then use special extractions, calculation and anomaly detection to build a model of what represents normal behavior. After that, the kinds of DNS tunneling associated with malware like the kind used in the Home Depot breach stick out like a sore thumb."
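The general idea Paquette describes, extracting features from DNS logs, modeling normal behavior and flagging what deviates from it, can be illustrated with a minimal Python sketch. This is not Prelert's algorithm. The feature (an estimate of how much information each domain's unique subdomains carry), the median-based scoring and all function names are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict
from statistics import median


def shannon_entropy(text):
    """Return the Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def split_query(qname):
    """Naively split a query name into (subdomain, parent domain).

    Assumption: the parent domain is the last two labels. Real code would
    consult a public-suffix list to handle names such as example.co.uk.
    """
    labels = qname.rstrip(".").lower().split(".")
    return ".".join(labels[:-2]), ".".join(labels[-2:])


def subdomain_information(qnames):
    """Estimate, per parent domain, the bits carried by its unique subdomains."""
    unique = defaultdict(set)
    for qname in qnames:
        subdomain, parent = split_query(qname)
        unique[parent].add(subdomain)
    return {
        parent: sum(len(s) * shannon_entropy(s) for s in subs)
        for parent, subs in unique.items()
    }


def rank_domains(qnames):
    """Rank parent domains by a robust z-score (median and MAD) of that estimate."""
    info = subdomain_information(qnames)
    center = median(info.values())
    # Fall back to 1.0 when most domains look identical and the MAD is zero.
    spread = median(abs(v - center) for v in info.values()) or 1.0
    scores = {parent: (v - center) / spread for parent, v in info.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling rank_domains on a list of logged query names returns the parent domains ordered from most to least unusual. A domain that receives many long, random-looking subdomains, as tunneling tools tend to produce, would land at the top of that list.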
Anomaly Detective helps users zero in on the most unusual activities in an existing event stream, which cuts down tremendously on false positives and "noise," Paquette said. An intrusion detection system (IDS) generates thousands of alerts per day, for example, and Prelert's product can look at unusual IDS event IDs or other dimensions and rank anomalies by the degree of abnormality so security teams know which ones to prioritize.
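Ranking events by how unusual they are can be sketched in a few lines of Python. The approach below, scoring each IDS event ID by its "surprise" relative to a historical baseline, is an assumed stand-in for illustration, not a description of how Anomaly Detective computes its rankings. The function name and the add-one smoothing are likewise assumptions.

```python
import math
from collections import Counter


def rank_by_surprise(baseline_events, new_events):
    """Rank the event IDs seen in new_events from rarest to most common.

    Each ID is scored as -log2 of its smoothed probability in the baseline,
    so IDs that were rare or absent historically receive the highest scores.
    """
    baseline = Counter(baseline_events)
    seen_now = set(new_events)
    vocabulary = set(baseline) | seen_now
    # Add-one smoothing gives never-before-seen IDs a small nonzero probability.
    total = sum(baseline.values()) + len(vocabulary)
    scores = {
        event_id: -math.log2((baseline[event_id] + 1) / total)
        for event_id in seen_now
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

A security team using something like this would review the list from the top down, starting with event IDs that have rarely or never appeared before.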
"Over the course of a few months or so, the security team can become confident that it only needs to look at those that are prioritized by Prelert as unusual," he said. "It gives them a massive reduction in the pain of false positives and, even more important, it gives them a documented repeatable process for compliance reasons on which events they investigate. So it solves both the practical and compliance sides of the problem."
Prelert customers are mostly large companies with dedicated teams for both IT operations and security, Paquette said. The customer rolls include names such as NASA, DirecTV, AlertLogic and CA Technologies.
"They have some data source that is aggregating and indexing logs from a variety of sources," he said. "Any organization with those characteristics is aware of the pain they are suffering from false positives or from taking too much time and effort to get to the root cause and, on the security side, not finding the things they are looking for. They want an automated way to detect threat activity, and they want a way to prioritize alerts so they know what to address first."
Because anomaly detection is most useful when an organization already has an aggregated index for its data, Prelert has partnered with Splunk, a prominent provider of log management software that has large customer bases in IT operations and security. "It's a natural technology fit for us," Paquette said.
Prelert has packaged its anomaly engine in a native Splunk app so users can get anomaly detection without moving their data anywhere else, he explained. "You can do a Splunk search, pull data of interest, and that data is piped right into Prelert's anomaly detection engine. What pops out the other end is what is unusual about it."
Prelert also offers an API for Anomaly Detective so customers can deploy the software on platforms other than Splunk, he added.
Dealing with Data Constraints
Prelert's product possesses the two things needed to make data analytics useful, Paquette said: algorithms that can accurately model data and pick out the statistical outliers and, perhaps even more important, the ability to deal with the constraints of the data and of the environment in which the software runs.
While the mathematics of Prelert's algorithms came together nicely due to Dodson's initial work, it took several years of trial and error to learn how to deal with data constraints, he noted.
"You have to deal with how much memory is available on a computer to build these models and automatically scale down the complexity of the models based on available memory, for instance," he said. "You must be able to deal with these constraints so you have machine learning analytics that are able to be easily operationalized. Doing that was as challenging as the original data science."
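One simple way to picture scaling model complexity to available memory is sketched below in Python. It assumes, purely for illustration, that each modeled data series is summarized by a histogram and that complexity means the number of histogram buckets. The function name, defaults and byte sizes are hypothetical and do not describe Prelert's implementation.

```python
def choose_bucket_count(num_series, budget_bytes, max_buckets=1024,
                        min_buckets=8, bytes_per_bucket=8):
    """Pick the finest histogram resolution whose total size fits the budget.

    Starts at max_buckets per series and halves the resolution until the
    combined size of all per-series histograms is within budget_bytes.
    """
    buckets = max_buckets
    while buckets >= min_buckets:
        if num_series * buckets * bytes_per_bucket <= budget_bytes:
            return buckets
        buckets //= 2
    raise ValueError("memory budget too small even for the coarsest model")
```

With a generous budget the function keeps full resolution. As the number of series grows or memory shrinks, it trades precision for footprint rather than failing outright.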
Fast Facts about Prelert
Founders: Stephen Dodson and Mark Jaffe
Product: Anomaly detection for both security and IT operations
HQ: Framingham, Mass., with offices in the UK
Employees: About 35
Customers: Oracle, DirecTV and NASA, among others
Funding: $13.3 million, from investors including Sierra Ventures, Fairhaven Capital Partners and Intel Capital
Ann All is the editor of Enterprise Apps Today and eSecurity Planet. She has covered business and technology for more than a decade, writing about everything from business intelligence to virtualization.