SIEM solutions can be expensive and difficult to manage, so one company built its own – and is pleased with the results.
At last week’s Black Hat USA, NYC-based financial technology firm Two Sigma Investments took the virtual stage to outline why its existing solution didn’t cut it, the work needed to create an in-house security information and event management (SIEM) system, and the project’s implications. Presenting on behalf of Two Sigma were Ethan Christ (VP of Security Identity, Monitoring, and Response) and Bret Rubin (Security Engineer).
The benefits are clear, but the reality is that this is not a universal solution. With the future of your network security in mind, this article looks at how Two Sigma did it.
Deploying the Homegrown SIEM
Receiving Batch Loads
Two Sigma uses BigQuery (BQ), Google’s enterprise data warehouse on the Google Cloud Platform (GCP), to ingest CSV and JSON files. Batch log files are synced periodically to a GCP bucket, where they are staged before being loaded into BQ. To keep the data pipeline running without disruption, BQ helps maintain metadata tables that track job failures, parsing errors, and data sources.
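The batch step can be pictured with a minimal Python sketch. It is not Two Sigma’s code and does not call BigQuery; it only illustrates parsing CSV and newline-delimited JSON batches while recording parse errors for a metadata table. The names LoadResult, load_batch, and metadata_row are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass, field


@dataclass
class LoadResult:
    """Outcome of loading one batch file (hypothetical structure)."""
    source: str
    rows: list = field(default_factory=list)
    parse_errors: list = field(default_factory=list)  # (line_number, message)


def load_batch(source, fmt, text):
    """Parse a CSV or newline-delimited JSON batch into rows, keeping bad lines as errors."""
    result = LoadResult(source=source)
    if fmt == "csv":
        reader = csv.DictReader(io.StringIO(text))
        for row in reader:
            # DictReader files surplus fields under the None key and
            # fills missing fields with None, so either signals a bad row.
            if None in row or None in row.values():
                result.parse_errors.append((reader.line_num, "column count mismatch"))
            else:
                result.rows.append(dict(row))
    elif fmt == "json":
        for line_no, line in enumerate(text.splitlines(), start=1):
            if not line.strip():
                continue  # skip blank lines
            try:
                result.rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                result.parse_errors.append((line_no, exc.msg))
    else:
        raise ValueError(f"unsupported format: {fmt}")
    return result


def metadata_row(result):
    """Summarize one load for a metadata table (assumed columns)."""
    return {
        "source": result.source,
        "rows_loaded": len(result.rows),
        "parse_errors": len(result.parse_errors),
    }
```

Keeping the bad lines alongside the good ones, rather than failing the whole file, is one way to let a pipeline continue while still surfacing parsing problems.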
Streaming Log Ingestion
Most logs move to the GCP or internal infrastructure for additional parsing, routing, and geographic load balancing for the appliances mentioned. Two Sigma utilizes two open-source software tools to assist in log management between the organization’s on-premises network, GCP account, and BigQuery space.
The first is Fluent Bit, a Linux-based log processor and forwarder compatible with Docker and Kubernetes. The second is Fluentd, a data collector that offers intermediate aggregation, parsing, and routing. The organization can also establish third-party integrations through direct calls to the GCP logging API. Together, these capabilities provide for:
Reliable Log Forwarding
The Fluent Bit agent runs on standard system images and routes logs directly to the organization’s GCP environment. With a large compute infrastructure, routing logs to GCP automatically minimizes the amount of data the organization has to aggregate itself. GCP also offers load balancing capabilities, while Fluent Bit is lightweight, fast, and has low memory overhead.
The Fluentd aggregator offers complex parsing, routing, and enrichment of data and logs. Fluentd also serves as a receiver for records from bespoke systems that don’t follow standard log formats. The added flexibility makes parsing and routing those logs considerably easier.
API Integrations for SaaS
Plenty of SaaS products don’t offer periodic logging. Instead, Two Sigma produced internal tools to manage integrations that fill the gap. With this network-specific approach, admins can set up integrations that streamline workflows for developers and administrators while shifting the burden away from security analysts.
In the screenshot below from Two Sigma’s presentation, the left pane shows the administrator’s shell on the Linux home server, and the right pane shows a browser with the GCP Logs Explorer open. The next two sections describe how an event gets processed and delivered as an alert.
Schedule Query Execution
Once the two legs above (batch loads and streaming ingestion) are in place, the organization needs to run scheduled queries against the datasets. In a four-step system, Two Sigma starts by collecting query definition files from a GitLab repository, along with internal metadata that includes query frequency, MITRE ATT&CK TTP, and more.
From there, the files are read from the GCP bucket, and the queries are executed at intervals. Results are stored persistently: the output lands in BigQuery tables and in a PubSub topic, along with metadata for routing and actions. Messages from the PubSub topic are then delivered to the pertinent technical staff via Jira, PagerDuty, Slack, or another integrable destination.
Events get logged by individual systems and devices on the network, and Fluent Bit moves the security event data to the organization’s GCP environment and API. Processing data such as JSON payloads requires a cloud logging filter expression to match specific events. Upon a successful match, the event is sent to a PubSub topic, where it’s read, formatted, and transferred to the pertinent alert destinations. With clear visibility into the problem at hand, administrators can take action.
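The scheduling loop described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the presenters’ implementation: QueryDefinition, due_queries, and run_cycle are hypothetical names, the execute argument stands in for a real BigQuery call, and the returned dictionaries stand in for PubSub messages.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class QueryDefinition:
    """One query definition file plus its metadata (assumed fields)."""
    name: str
    sql: str
    frequency: timedelta
    attack_technique: str  # MITRE ATT&CK technique ID, e.g. "T1110" (Brute Force)
    destination: str       # e.g. "slack", "jira", "pagerduty"


def due_queries(definitions, last_run, now):
    """Return the definitions whose interval has elapsed since their last run."""
    due = []
    for definition in definitions:
        previous = last_run.get(definition.name)
        if previous is None or now - previous >= definition.frequency:
            due.append(definition)
    return due


def run_cycle(definitions, last_run, now, execute):
    """Run every due query and build one routing message per query that returned rows."""
    messages = []
    for definition in due_queries(definitions, last_run, now):
        rows = execute(definition.sql)  # stand-in for the warehouse call
        last_run[definition.name] = now
        if rows:
            messages.append({
                "query": definition.name,
                "attack_technique": definition.attack_technique,
                "destination": definition.destination,
                "row_count": len(rows),
            })
    return messages
```

Carrying the destination and technique ID in each message is what lets a downstream consumer route the result without looking the query up again.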
The next screenshot shows an example of what an alert looks like upon arriving at its destination – in this case on a private Slack channel devoted to SIEM alerts.
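To make the match-and-alert flow concrete, here is a small Python sketch. It deliberately does not reproduce the Cloud Logging filter syntax; a dictionary of dotted field paths and expected values stands in for a filter expression, and a plain list stands in for the PubSub topic. All function names are hypothetical.

```python
import json

_MISSING = object()  # sentinel so a missing field never equals an expected value


def get_path(payload, dotted_path, default=None):
    """Walk a nested dict along a dotted path; return default if a key is missing."""
    current = payload
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def matches(payload, conditions):
    """True when every dotted path in conditions holds the expected value."""
    return all(
        get_path(payload, path, _MISSING) == expected
        for path, expected in conditions.items()
    )


def format_alert(rule_name, payload, fields):
    """Build a short text alert listing the selected fields."""
    lines = [f"[SIEM alert] {rule_name}"]
    for path in fields:
        lines.append(f"{path}: {get_path(payload, path)}")
    return "\n".join(lines)


def process(raw_event, rule_name, conditions, fields, topic):
    """Parse one JSON event and append a formatted alert to topic if it matches."""
    payload = json.loads(raw_event)
    if matches(payload, conditions):
        topic.append(format_alert(rule_name, payload, fields))
        return True
    return False
```

For example, calling process with conditions of {"jsonPayload.action": "sudo"} appends an alert only for events whose nested action field equals "sudo"; everything else is ignored.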
Protecting Sensitive Data
As a financial and technology organization, Two Sigma holds plenty of sensitive and proprietary data and needs to categorize and protect it. From banking details to a first pet’s name, the range of particulars held on-premises or in the cloud means specific network segments need a tailored security approach. To address this, Two Sigma deployed access control lists (ACLs) for sensitive datasets and synced them to the internal directory service. Access to sensitive data requires signoff from multiple parties, and automated queries offer audit visibility into such access.
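A multi-party signoff rule with an audit trail could look something like the following Python sketch. The talk did not describe the mechanism in this detail, so the two-approver threshold, the SensitiveDataset class, and its methods are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SensitiveDataset:
    """A dataset whose ACL requires signoff from several parties (hypothetical model)."""
    name: str
    required_approvals: int = 2  # assumed threshold
    approvals: dict = field(default_factory=dict)  # user -> set of approvers
    audit_log: list = field(default_factory=list)  # (user, allowed) per access check

    def approve(self, user, approver):
        """Record that approver signs off on user's access."""
        if approver == user:
            raise ValueError("users cannot approve their own access")
        self.approvals.setdefault(user, set()).add(approver)

    def can_read(self, user):
        """Check access and log the attempt so it can be audited later."""
        allowed = len(self.approvals.get(user, set())) >= self.required_approvals
        self.audit_log.append((user, allowed))
        return allowed
```

Logging denied attempts as well as granted ones gives the audit queries something to work with in both directions.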
Benefits From Building Your Own SIEM
How Effective Is Your SIEM?
When Two Sigma evaluated its third-party SIEM provider, it was evident that the solution was lacking. From daily ingestion caps set at 1TB to delayed notification of pertinent alerts from downstream queries, the organization touched on the areas where a homegrown SIEM could improve logging capabilities:
- Quantity: the amount of data the organization ingests daily
- Speed: prompt alerts are essential to responding effectively
- Reliability: dependable log records, including forensic data and availability
- Flexibility: control over data parsing, ingestion logic, and bespoke standards
- Retention: keeping data available when needed and when scaling
- Searchability: fast query responses across large volumes of data
- Security: systems for protecting sensitive data, including ACLs
- Alerting: streamlined to automate downstream actions and direct notifications
Two Sigma did not consider two areas that may be important to some SIEM users: pre-canned queries and threat models. Because Two Sigma deals with a diversity of data standards, canned queries would be less applicable. As the organization doesn’t run internet-facing web apps, its threat model is limited to internal vulnerabilities.
Results, Lessons, and Implications from Two Sigma
Homegrown SIEM Results and Lessons
From start to finish, Two Sigma had a minimum viable product of its homegrown SIEM in six months, which isn’t far off from the implementation effort a third-party SIEM can require. After another three months of working through complicated use cases and a total of 6,000 lines of code, the company was able to deploy its alternative logging solution. Highlights of the results include:
- Ingestion Capacity: an increase from 1TB to 5TB with no slowdown
- Cost Savings: $3.5 million in upfront licensing, and $600k in annual maintenance
- Query Speed: pertinent alerts in seconds versus minutes
- Ingestion Overhead: offloaded data pipeline management to reduce security overhead
On how the new framework compares with the old one on log rehydration and ingestion limits, Security Engineer Bret Rubin stated: “We never have to store archive logs from BQ or pay any search for older data. In our previous system, we often rolled off long data sets because we didn’t have storage capacity, and index size affected performance.”
Lessons learned over the project’s lifetime included:
- The critical importance of monitoring and daily tests.
- Streaming costs can add up, but it still beats the TCO of a third-party SIEM.
Take note: Two Sigma said it had to learn the first lesson the hard way.
Since deploying their SIEM, Two Sigma has identified three areas of implications for the organization moving forward. Starting with data, the new SIEM is far more agile than its predecessor, with a greater capacity for ingestion and data feeds. By adding external threat intelligence feeds, internal security appliance logs, and specific network telemetry, administrators only improve their ability to analyze and remediate anomalies and threats.
While Two Sigma had to train staff who would use BQ, the universal nature of the framework makes personnel more equipped to handle nonstandard data. Besides real-time detection and response, BQ’s ability to support complex record types without the added engineering effort is a success in itself.
Why Build A SIEM?
Using Existing Infrastructure to Your Advantage
Two Sigma already used the Google Cloud Platform (GCP) as its primary cloud provider, so developing the SIEM with GCP tools made the most sense. With Google’s BigQuery (BQ), organizations can run searches on demand and pay per query, or reserve dedicated query slots at a fixed rate, with storage billed by size. Leveraging the existing relationship, Two Sigma’s negotiations with GCP drove down data storage costs and reduced project overhead and projected latency.
To Buy or To Build
Before developing its SIEM, Two Sigma had a yearlong license of $1 million, plus 18% in annual maintenance fees, for an on-premises product. While moving away from the vendor-supplied SIEM would mean a sizable sunk cost, the benefits of ingesting more data can offset this.
The next consideration is the cost of infrastructure. Whether the SIEM is an on-premises appliance, a third-party appliance, or cloud-native, the amount of hardware, lifecycle management, and overhead can differ widely. By going with a cloud-native solution, Two Sigma reduced these costs. With a cloud-compatible system, log collection can also be more cost-effective and scalable.
Homegrown SIEMs Aren’t Immutable
Investing in a homegrown SIEM isn’t the path for every organization. Two Sigma emphasized that its decision to build a SIEM aligned with its other strategic decisions.
As with most internally developed frameworks, trial, error, and remediation are critical to ensuring the SIEM evolves as technology does. The budding in-house SIEM at Two Sigma continues to take in and validate new datasets, such as new SaaS provider logs and new threat intelligence feeds.
As systems for data and security events adjust to the new predefined controls, admins can prioritize other projects. When an anomaly triggers an alert, the cloud-native SIEM is fast enough to support a quick resolution and further configuration based on the newest threats. Of course, the question for any organization requiring SIEM capabilities is how to fine-tune alerts so that only the most important ones come through. How Two Sigma fine-tunes and configures alert priorities might have to wait until its next Black Hat presentation.