Dark Reading is part of the Informa Tech Division of Informa PLC



Researchers Scan for Supply-Side Threats in Open Source

A recent project to scan the main Python repository's 268,000 packages found only a few potentially malicious programs, but work earlier this year uncovered hundreds of instances of malware.

Open source repositories form the backbone of modern software development (nearly every software project includes at least one open source component), but security experts increasingly worry that attackers are focused on infecting systems by inserting malicious code into popular repositories.

A number of projects have kicked off this year to search for such Trojan horses. Last week, Stripe engineer Jordan Wright published the results of a home-brew research project that downloaded every Python component from the Python Package Index (PyPI) and looked for system calls that could indicate malicious intent. Overall, he found hundreds of packages that created network connections — most by including a common dependency — and a few packages that seemed risky. These included two that appeared to be test cases — one named "i-am-malicious" and another named "maliciouspackage" — and a third that used obfuscation to hide commands.


However, none of the scanned packages seemed outright malicious, Wright said in his analysis.

"Looking through the data, I didn't find any packages doing significantly harmful activity that didn't also have 'malicious' somewhere in the name, which was good," he said. "But it's always possible I missed something, or that [attackers installing malicious code] would happen in the future."

In fact, such attacks have already happened. Two years ago, for example, an attacker compromised a developer's account and published malicious versions of two components of the popular JavaScript package ESLint to the Node Package Manager (NPM) service. While the package has millions of weekly downloads, the project group received a warning and unpublished the packages within two hours, limiting the impact.

The attack often takes another form: typosquatting, where attackers create Trojan horses that have names similar to common packages. In April, an attacker seeded the Ruby package repository, RubyGems, with more than 760 malicious packages with names similar to legitimate packages. Such attacks attempt to take advantage of mistyped install commands — relatively rare, perhaps, but devastating if they produce a compromise.
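To make the typosquatting idea concrete, here is a minimal sketch of how a name-similarity check might work. It is not taken from any of the projects described in this article: the `POPULAR` list, the function names, and the distance threshold of 2 are all assumptions chosen for illustration, and a real check would draw popular names from repository download statistics.

```python
# Hypothetical list of well-known package names (an assumption for this
# sketch; a real check would use download statistics from the repository).
POPULAR = ["requests", "numpy", "django", "urllib3"]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the classic row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a character from a
                cur[j - 1] + 1,            # insert a character into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def possible_typosquats(name, popular=POPULAR, max_distance=2):
    """Return popular names that `name` resembles without matching exactly."""
    name = name.lower()
    return [
        p for p in popular
        if p != name and edit_distance(name, p) <= max_distance
    ]


print(possible_typosquats("reqeusts"))  # close to "requests"
print(possible_typosquats("requests"))  # exact match, so nothing is flagged
```

A threshold this loose would likely flag many legitimate short names, so a check like this is better suited to queuing packages for review than to blocking them.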

Last year, the Python core development team asked the community for ways of finding malicious code inserted into the modules and packages used by Python. For open source projects, these issues are particularly challenging, said Mike Myers, principal security engineer at Trail of Bits, a software security consultancy, in an answering comment.

"[T]he Google and Apple app stores have both invested heavily in runtime analysis sandboxes and static analysis approaches for detecting malice in their app stores," he said. "The difference there being, they can run their detections in secret, and adversaries can't develop an evasion in advance without disclosing it in a submission."

A team of researchers from the Georgia Institute of Technology carried out a similar analysis for three major repositories: Python's PyPI, the Node Package Manager (NPM), and RubyGems. Their system, dubbed MalOSS, combines metadata analysis, static code analysis, and dynamic runtime analysis to determine whether a package is behaving maliciously. The researchers found seven malicious packages in PyPI, 41 in NPM, and 291 in RubyGems, according to their paper published in February 2020.

Inspired by the Georgia Tech work, Wright aimed to look for signs that attackers inserted malicious code into packages by analyzing the system functions called during installation. Using the PyPI API, he downloaded 268,000 packages into a container, installed each, and watched for suspicious changes. The entire process cost about $120 in cloud fees, he said.
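Wright's project observed what packages actually did while they were being installed. As a much simpler stand-in, the sketch below statically scans the text of an install script for calls that often deserve a second look. It is an illustration only, not Wright's method: the `SUSPICIOUS_CALLS` set and the function names are assumptions, and a static scan like this is easily evaded by obfuscation.

```python
import ast

# Hypothetical watch list (an assumption for this sketch): calls that can
# spawn processes, open network connections, or run dynamic code.
SUSPICIOUS_CALLS = {
    "os.system",
    "subprocess.Popen",
    "subprocess.run",
    "subprocess.call",
    "socket.socket",
    "urllib.request.urlopen",
    "eval",
    "exec",
}


def dotted_name(node):
    """Rebuild a dotted name such as 'os.system' from an AST node, or None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_suspicious_calls(source):
    """Return sorted (line number, call name) pairs found in Python source."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in SUSPICIOUS_CALLS:
                findings.append((node.lineno, name))
    return sorted(findings)


# Made-up install script used only to exercise the scanner.
sample = (
    "import os\n"
    "from setuptools import setup\n"
    "os.system('curl http://example.invalid')\n"
    "setup(name='demo')\n"
)
print(find_suspicious_calls(sample))
```

Because the scanner only parses the source and never executes it, it is safe to run on untrusted code, but it will miss anything hidden behind aliases, string building, or encoded payloads, which is one reason runtime observation is the stronger signal.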

Wright plans to expand the effort to continuously monitor PyPI and add repositories for other platforms in the future.

"This found a few instances of potentially malicious behavior that you can find in the post, but the real power will be setting up continuous monitoring moving forward," he stated on Twitter.

Overall, Wright makes the case that each of the major repositories needs to implement its own security measures and continuously monitor for supply chain attacks. Otherwise, installing packages from the repositories presents too great a risk, he said.

"I still don't like that it's possible to run arbitrary commands on a user's system just by them pip installing a package," Wright said. "I get that the majority of use cases are benign, but it opens up risk that must be considered. Hopefully by increasingly monitoring various package managers we can identify signs of malicious activity before it has a significant impact."

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
