Dark Reading is part of the Informa Tech Division of Informa PLC

Application Security

Researchers Find Bugs Using Single-Codebase Inconsistencies

A Northeastern University research team finds code defects -- and some vulnerabilities -- by detecting when programmers used different code snippets to perform the same functions.

Repeatable, consistent programming is considered a best practice in software development, and it becomes increasingly important as the size of a development team grows. Now, research from Northeastern University shows that detecting inconsistent programming — code snippets that implement the same functions in different ways — can also be used to find bugs and, potentially, vulnerabilities. 

In a paper to be presented at the USENIX Security Symposium in August, a team of researchers from the university used machine learning to find bugs by first identifying code snippets that implemented the same functionality and then comparing the code to determine inconsistencies. The project, dubbed "Functionally-similar yet Inconsistent Code Snippets" (FICS), found 22 new and unique bugs by analyzing five open source projects, including QEMU and OpenSSL.

The research is not meant to replace other forms of static analysis but to give developers another weapon in their arsenal to analyze their code and find potential errors, says Mansour Ahmadi, a former post-doc research associate at Northeastern University who now works as a security engineer at Amazon.

Other static analysis approaches can recognize a bug pattern only if they have previously encountered the issue or have been given a rule to detect it, he says.

"If there is a bug in the system with no previously found variant, [those approaches] will fail to find the bug," Ahmadi says. "In contrast, if there are correct implementations of the functionally similar code snippets to the buggy counterpart, FICS can detect that."

The research uses machine-learning techniques not to find matches to known vulnerability patterns, as many other projects do, but to find functionally similar code that is implemented in different, or inconsistent, ways. Such bugs can be easily verified by developers and testers when presented with both implementations, the researchers stated in a prepublication paper.

"[F]rom basic bugs such as absent bounds checking to complex bugs such as use-after-free, as long as the codebase contains non-buggy code snippets that are functionally similar to a buggy code snippet, the buggy one can be detected as an inconsistent implementation of the functionality or logic," the researchers state. "This observation is more obvious in software projects of reasonable sizes, which usually contain many clusters of functionally-similar code snippets, often contributed by different developers."

The FICS system aims to find bugs, not vulnerabilities, but the issues it finds often affect security, Ahmadi says. The list of bugs found by the researchers includes memory leaks, missing checks of values, and bad typecasting.

The researchers believe that some of the issues should be considered vulnerabilities, but the developers maintaining the project produced patches for the defects without much consideration for their exploitability.

"We have requested CVE for a couple of the bugs, without providing the exploits that we found. While we were acknowledged by the developers for our findings, the developers did not proceed to assign CVEs to them as they believe the bugs are not exploitable," Ahmadi says. "Overall, this is the drawback of all static analyzers as it is hard to prove if a bug is exploitable without providing a proof-of-concept."

The researchers used two types of unsupervised clustering, in which the machine-learning system organizes data with similar features into groupings. First, the researchers transformed code into functional constructs so that parts of a program's code could be clustered together based on their functionality. After that, the researchers compared code in the same clusters and used machine learning to group the snippets by implementation. The implementation that accounts for the majority of snippets in a given functional cluster is considered to be the correct way of coding, and snippets that deviate from it are flagged as inconsistencies.
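The two-step idea can be illustrated with a toy sketch. This is not the FICS implementation: the `Snippet` fields, the operation names, and the exact-match grouping are all assumptions made for illustration, standing in for the learned features and unsupervised clustering the researchers describe.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class Snippet(NamedTuple):
    name: str            # hypothetical identifier for the snippet
    functionality: str   # stand-in for a coarse, functionality-level feature
    operations: tuple    # stand-in for a fine-grained implementation feature


def find_inconsistencies(snippets):
    """Return (snippet, majority_operations) pairs for minority implementations."""
    # Step 1: group snippets that implement the same functionality.
    # Exact-match grouping stands in for unsupervised clustering here.
    by_function = defaultdict(list)
    for s in snippets:
        by_function[s.functionality].append(s)

    reports = []
    for group in by_function.values():
        # Step 2: within one functional group, group by implementation.
        counts = Counter(s.operations for s in group)
        if len(counts) < 2:
            continue  # every snippet agrees, nothing to report
        (majority_ops, top), (_, second) = counts.most_common(2)
        if top == second:
            continue  # no clear majority, so nothing is presumed correct
        # Step 3: treat the majority as the reference and flag the rest.
        reports.extend(
            (s, majority_ops) for s in group if s.operations != majority_ops
        )
    return reports


snippets = [
    Snippet("copy_a", "copy_buffer", ("check_len", "memcpy")),
    Snippet("copy_b", "copy_buffer", ("check_len", "memcpy")),
    Snippet("copy_c", "copy_buffer", ("memcpy",)),  # missing bounds check
    Snippet("free_a", "release", ("free", "set_null")),
    Snippet("free_b", "release", ("free", "set_null")),
]

for snippet, expected in find_inconsistencies(snippets):
    missing = [op for op in expected if op not in snippet.operations]
    print(f"{snippet.name}: differs from majority, missing {missing}")
```

Because each report pairs the flagged snippet with the majority implementation, a reviewer sees both versions side by side. The same majority rule also means that a functional group in which the buggy variant is the more common one would point at the wrong snippet.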

False positives are a problem. The researchers used filtering to reduce the total reported inconsistencies by a factor of 10, which still left 1,821 identified inconsistencies. Of those, 218 are considered valid cases. The high level of false positives is an issue with all static analyzers, but in the case of FICS it is not a showstopper because verification is fairly simple, says Ahmadi.
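To put those figures in proportion, the short calculation below works out the share of valid reports. It assumes that every report not counted as valid is a false positive, which the figures imply but the researchers' own breakdown may refine.

```python
reported = 1821  # inconsistencies left after filtering (figure from the article)
valid = 218      # reports considered valid cases (figure from the article)

# Assumption: everything not judged valid counts as a false positive.
false_positives = reported - valid
precision = valid / reported

print(f"valid share of reports: {precision:.1%}")
print(f"false positives: {false_positives}")
```

Under that assumption, roughly one report in eight is a valid case, which is why the ease of manually checking each report matters so much.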

"The manual vetting effort is not as heavy as required to validate results from many other static analyzers," he says. "The ease of manual validation of FICS's reports is largely due to the presence of both the consistent and the inconsistent constructs and the highlighted differences."

The technique could be fooled into deciding the wrong code snippet is the correct one if the developer used the incorrect method more often than the correct one. Yet, this error is rare and only occurred in a single instance during the research, when two similar code snippets were incorrect and the single inconsistent code snippet was correct, Ahmadi says.

The research team also included Northeastern University PhD students Reza Mirzazade Farkhani and Ryan Williams, and Long Lu, an associate professor of computer science.

Veteran technology journalist of more than 20 years. Former research engineer. Written for more than two dozen publications, including CNET News.com, Dark Reading, MIT's Technology Review, Popular Science, and Wired News. Five awards for journalism, including Best Deadline ...
 
