Aqua Blog

In-depth Analysis of the PyTorch Dependency Confusion Administered Malware

January 4, 2023

Recently, a dependency of the widely used PyTorch-nightly Python package was targeted in a dependency confusion attack, resulting in thousands of individuals downloading a malicious binary that exfiltrated data through DNS. The individual responsible for this attack claimed to be a security researcher whose research had gone awry. In this blog, we explain how the attack worked and how to safeguard against similar supply chain attacks.

What is PyTorch and Torchtriton?

PyTorch is an open-source machine learning library for Python, used for applications such as natural language processing. Its simplicity and ease of use have made it a popular choice in the deep learning community for developing and training models. Torchtriton is the PyTorch project's build of Triton, an open-source language and compiler for writing GPU kernels, and it is a dependency of PyTorch-nightly.

What happened?

The PyTorch team issued a warning that the PyTorch-nightly dependency chain was compromised between December 25th and December 30th, 2022. Torchtriton, a dependency of PyTorch-nightly (the Linux packages installed via pip), is usually obtained from PyTorch’s official repository. However, on December 25th, an individual who later claimed to be a security researcher uploaded a malicious package with the same name (and a higher version) to the Python Package Index (PyPI) code repository, creating a dependency confusion scenario.

In a PyPI dependency confusion attack, a package with the same name exists in both a private repository and the public PyPI. When pip is configured to search both indexes (for example, with the --extra-index-url option), it does not give the private index priority and installs the candidate with the higher version. If the name of a private package is still available on PyPI, an attacker can exploit this by uploading a malicious package under that name with a higher version, launching a supply chain attack.
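The selection logic described above can be illustrated with a small sketch. This is a simplified model, not pip's actual resolver: the index names and version numbers are hypothetical, and versions are assumed to be plain dotted integers.

```python
# Simplified model of an installer that treats every configured index equally.
# Index names and versions below are hypothetical, chosen only for illustration.

def parse_version(version: str) -> tuple:
    """Turn a plain dotted version such as '2.0.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_candidate(candidates: list) -> tuple:
    """Return the (index, version) pair with the highest version.

    No index has priority here: only the version number decides.
    """
    return max(candidates, key=lambda item: parse_version(item[1]))


candidates = [
    ("private-index", "2.0.0"),  # legitimate build on the private index
    ("public-pypi", "3.0.0"),    # attacker upload with a higher version
]

chosen_index, chosen_version = pick_candidate(candidates)
print(f"installer would fetch {chosen_version} from {chosen_index}")
```

Because only the version number decides, the private build loses as soon as a higher-versioned package with the same name appears on the public index.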

An analysis of the attack

At first glance the Torchtriton package looks completely innocent.

Torchtriton package structure

We compared the original legitimate package with the malicious one. The two packages are nearly identical, and we found only two differences. Under the runtime folder, the attacker inserted a malicious binary called triton (MD5: 908596ffe11c30d1669431f3f4cb54f2). In addition, the __init__.py file was slightly modified to run the binary. In the screenshot below, you can see the two differences:

Packages content comparison

The attacker inserted lines 4-13 into the __init__.py file. These lines were designed to run the binary.

The ‘__init__.py’ file content under the runtime folder

We analyzed the malware using both static and dynamic techniques. Our analysis found that the malware is designed to retrieve sensitive data from the machine, encrypt and encode it, and then exfiltrate it through DNS. Upon closer examination of the code, the types of data being exfiltrated and the method used to send it seemed strange for a security researcher.

The binary static analysis

The malware's main function gathers sensitive files, such as the passwd and hosts files, and collects various other data, including information about the current user, SSH data, and environment variables.

The dynamic analysis showed that the logged-in users’ details were collected, and our network analysis showed that this data was exfiltrated via DNS. The data is sent to the domain h4ck[.]cfd, using the DNS server wheezy[.]io.

A decryption of some of the exfiltrated data taken from our traffic analysis

In our network analysis, we slightly modified N1ght-W0lf’s function to decrypt the traffic. As shown in the screenshot, we were able to decrypt some of the traffic containing information taken from our machine. The malware had recorded this information, including our hostname, current working directory, and network details, and sent it out of our system.
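The DNS exfiltration described above suggests a simple detection heuristic for captured DNS query names. The following is a minimal sketch, not the detection logic of any product: it flags names under the domain reported in this analysis, and it also flags unusually long, high-entropy labels. The length and entropy thresholds are assumptions chosen for illustration and would need tuning against real traffic.

```python
import math
from collections import Counter

# Domain reported in this analysis (written as h4ck[.]cfd in the text).
IOC_DOMAIN = "h4ck.cfd"


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of the string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_suspicious_query(name: str, min_label_len: int = 40,
                        min_entropy: float = 3.5) -> bool:
    """Flag a DNS query name that matches the IOC domain or looks encoded.

    min_label_len and min_entropy are illustrative thresholds, not tuned values.
    """
    name = name.rstrip(".").lower()
    if name == IOC_DOMAIN or name.endswith("." + IOC_DOMAIN):
        return True
    # Long labels with many distinct characters often carry encoded payloads.
    return any(
        len(label) >= min_label_len and shannon_entropy(label) >= min_entropy
        for label in name.split(".")
    )


for query in ("pypi.org", "abc123.h4ck.cfd"):
    print(query, "->", "suspicious" if is_suspicious_query(query) else "ok")
```

A heuristic like this produces false positives on some legitimate services that use long generated hostnames, so it is better suited to triage than to blocking.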

Mitigation recommendations

Specific immediate steps

The PyTorch team announced that users of the stable PyTorch packages are not impacted by this issue. However, if you installed the PyTorch-nightly package on Linux through pip between December 25th and December 30th, 2022, it is highly recommended that you uninstall it, along with the torchtriton package, and use the latest nightly binaries that were released after December 30th, 2022.

$ pip3 uninstall -y torch torchvision torchaudio torchtriton

$ pip3 cache purge
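Before or after uninstalling, you may want to check whether the malicious binary is present on a machine. The sketch below looks for a file named triton under the runtime folder of a package directory and compares its MD5 with the hash reported above. It assumes the package is importable under the name triton, which is our assumption and not something stated in this analysis; the helper names are ours as well. It uses find_spec to locate the package without importing it, so the package's own code is not executed.

```python
import hashlib
import importlib.util
from pathlib import Path
from typing import List, Optional

# MD5 of the malicious binary, as reported in the analysis above.
MALICIOUS_MD5 = "908596ffe11c30d1669431f3f4cb54f2"


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_suspicious_binary(package_dir: Path,
                           known_md5: str = MALICIOUS_MD5) -> List[Path]:
    """Return files named 'triton' in <package_dir>/runtime matching known_md5."""
    runtime_dir = package_dir / "runtime"
    if not runtime_dir.is_dir():
        return []
    return [
        path for path in runtime_dir.iterdir()
        if path.is_file() and path.name == "triton" and md5_of(path) == known_md5
    ]


def installed_package_dir(module_name: str = "triton") -> Optional[Path]:
    """Locate a top-level package directory without importing the package.

    The default module name 'triton' is an assumption.
    """
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


if __name__ == "__main__":
    package_dir = installed_package_dir()
    hits = find_suspicious_binary(package_dir) if package_dir else []
    print("suspicious files:", [str(p) for p in hits] or "none found")
```

A clean result from this check only means that this specific file was not found. It does not show that data was never exfiltrated from the machine.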

General steps

To prevent future supply chain attacks, we recommend taking protective measures for your environments. One way to do this is to use tools that scan your code and artifacts for dependency risks, for example by creating a software bill of materials (SBOM) that identifies the open source software in use and its dependencies. You can also scan for known malicious packages as a form of defense. An additional layer of protection is to apply security rules that evaluate the code and assign a risk score to the package and its dependencies. For example, the execution of a binary file in __init__.py is unusual and may be cause for suspicion. Aqua’s platform offers OSS dependency scanning, which provides such solutions to help developers mitigate these risks.
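As an illustration of the kind of rule mentioned above, the sketch below uses Python's ast module to flag calls that spawn a process in a source file such as __init__.py. It is a minimal example and not how Aqua's scanner works: the list of call names is illustrative rather than exhaustive, and aliased imports (for example, from subprocess import Popen) are not resolved. The sample source is made up and is not the attacker's code.

```python
import ast

# Call names treated as "spawns a process". Illustrative, not exhaustive.
PROCESS_CALLS = {
    "os.system", "os.popen", "os.execv", "os.execve",
    "subprocess.run", "subprocess.call", "subprocess.check_call",
    "subprocess.check_output", "subprocess.Popen",
}


def dotted_name(node: ast.AST) -> str:
    """Rebuild a dotted name such as 'subprocess.Popen' from an AST node."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return ""  # not a plain dotted name (e.g. a call on a call result)


def find_process_calls(source: str) -> list:
    """Return sorted (line number, call name) pairs for process-spawning calls."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in PROCESS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Made-up sample standing in for a package's __init__.py.
sample = "import subprocess\n\nsubprocess.Popen(['./some_binary'])\n"
for lineno, name in find_process_calls(sample):
    print(f"line {lineno}: call to {name}")
```

A finding from a rule like this is a signal for review and not proof of malice, since some legitimate packages do launch helper processes at import time.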

Another way to increase detection and protection is to run the package in a sandbox or use runtime detection and response solutions to identify malicious processes and network activity. The Aqua Platform offers runtime protection against known and unknown threats and detects such malicious behavior.

Mapping the attacks to the MITRE ATT&CK framework

Here we map the components in the attacks described above to the corresponding techniques of the MITRE ATT&CK framework:

Map of the components in the attacks to the corresponding techniques of the MITRE ATT&CK framework

Aqua Nautilus

Aqua research team Nautilus focuses on cybersecurity research of the cloud native stack. Its mission is to uncover new vulnerabilities, threats and attacks that target containers, Kubernetes, serverless, and public cloud infrastructure — enabling new methods and tools to address them.