Dark Reading is part of the Informa Tech Division of Informa PLC

Data Bias in Machine Learning: Implications for Social Justice

Take historically biased data, then add AI and ML to compound and exacerbate the problem.

Machine learning and artificial intelligence have taken organizations to new heights of innovation, growth, and profits thanks to their ability to analyze data efficiently and with extreme accuracy. However, some algorithms, black-box models in particular, have at times proven to be unfair and opaque, compounding bias and harming minorities.

Black-box models present several key issues, and they work together to bias data further. The most prominent is that the models are fed data that is historically biased to begin with, and fed by humans who are biased by nature. In addition, machine learning is constantly aggregating data, including personal data, while data analysts can see only the inputs and outputs, not the internal workings that determine the results. The process therefore lacks transparency about how the data is being used and why. Algorithms are making analyses and predictions about our work performance, economic situation, health, preferences, and more without providing insight into how they reached their conclusions.

In the infosec realm, this matters because security platforms and services increasingly rely on ML and AI for automation and superior performance. If the underlying software and algorithms for these products and services reflect biases, they'll simply perpetuate the prejudices and errant conclusions associated with race, gender, religion, physical abilities, appearance, and other characteristics. This has implications for both information and physical security, as well as for personal privacy.

One of the most prominent examples of bias arising from these issues is the use of risk scores in the justice system. In law enforcement, risk scores are used to predict the likelihood that a crime will be committed by a group of people, by a person, or in a certain location. When police departments ask "What locations have higher crime rates?" in order to concentrate officers in crime-prone areas, they look at each location's risk score. But dispatching more police officers to a location leads to more arrests there, and more reported arrests of any kind in that area raise its risk score, which sends still more officers to the location. It's a never-ending cycle.
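To make the cycle concrete, here is a minimal sketch of a toy simulation. Everything in it is an assumption for illustration only: the two districts, the equal true crime rates, the slightly uneven starting patrols, and the rule that patrols follow the square of cumulative recorded arrests are hypothetical and do not describe any real risk-scoring product.

```python
def simulate_patrol_feedback(initial_officers, true_crime_rate, rounds, emphasis=2.0):
    """Toy model (hypothetical): recorded arrests scale with patrol presence,
    and the next allocation follows cumulative recorded arrests (the "risk score")."""
    total = sum(initial_officers)
    officers = list(initial_officers)
    recorded = [0.0] * len(officers)
    history = [tuple(officers)]
    for _ in range(rounds):
        # More officers in a district means more of its crime gets recorded.
        for i, n in enumerate(officers):
            recorded[i] += n * true_crime_rate[i]
        # Assumed "hot spot" rule: districts with more recorded arrests
        # receive a more-than-proportional share of the next deployment.
        weights = [r ** emphasis for r in recorded]
        weight_sum = sum(weights)
        officers = [total * w / weight_sum for w in weights]
        history.append(tuple(officers))
    return history


# Both districts have the same true crime rate; only the starting patrols differ.
history = simulate_patrol_feedback([55, 45], [0.1, 0.1], rounds=10)
for round_no, (a, b) in enumerate(history):
    print(f"round {round_no}: district A {a:5.1f} officers, district B {b:5.1f} officers")
```

Under these assumptions, district A's share of officers grows every round even though the underlying crime rates are identical, because the score is learning from where police were sent rather than from where crime occurs. With an even starting split, the allocation stays even.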

A study of risk scores conducted by ProPublica found that Black defendants were 77% more likely to be pegged as "higher risk of committing a future violent crime" and 45% "more likely to be predicted to commit a future crime of any kind." They also found that the risk score formula was "particularly likely to falsely flag Black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants" (emphasis added).
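The "falsely flag" comparison above is a comparison of false positive rates: among people who did not go on to reoffend, what share was labeled high risk in each group? The sketch below shows one way to compute that. The records are synthetic and the group names are placeholders; they are not ProPublica's data or method.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Per group: share of people who did NOT reoffend but were flagged high risk."""
    flagged = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in records:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                flagged[group] += 1
    return {group: flagged[group] / negatives[group] for group in negatives}


# Synthetic records: (group, predicted_high_risk, reoffended). Not real data.
records = (
    [("group_a", True, False)] * 4
    + [("group_a", False, False)] * 6
    + [("group_a", True, True)] * 5
    + [("group_b", True, False)] * 2
    + [("group_b", False, False)] * 8
    + [("group_b", True, True)] * 5
)

rates = false_positive_rates(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: false positive rate {rate:.0%}")
```

In this made-up sample, the score catches the same number of actual reoffenders in both groups, yet it wrongly flags people in one group at twice the rate of the other. An overall accuracy figure would hide that gap, which is why the rate has to be broken out by group.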

Recently, Boston Celtics players published an opinion piece in The Boston Globe calling out the bias implications of facial recognition technology for minority communities. Facial recognition technology, which also uses black-box models, has a history of misidentifying Black people and other people of color. In a test run by the ACLU that compared congressional headshots to mugshots, 40% of those who were misidentified were people of color. Just last year, Robert Julian-Borchak Williams was misidentified as a shoplifting suspect by the Detroit Police Department, which relied on facial recognition technology.

In healthcare, black-box models are typically used to help professionals make better recommendations on care and treatments based on patients' demographics, such as age, gender, and income. This is great, until we realize that the data may favor a single treatment, and one generic treatment will not work for everyone. For example, if my colleague and I had the same diagnosis and were recommended the same treatment, the treatment could work for one of us and not the other because of our genetic makeup, which is not accounted for in the algorithm.

In the end, data in itself is neither good nor bad. But without transparency into how black-box models produce their results, skewed information becomes difficult to reevaluate or fix, because there is no insight into the actual algorithm being used. As data professionals, we are responsible for ensuring that the information we gather and the results we project are fair to the best of our knowledge and, most importantly, do no harm, especially to vulnerable and underprivileged communities. It's time we go back to the basics: relying on interpretable models such as regressions and decision trees, and understanding the "why" of certain data points before analyzing or extracting the data, even if it means, at times, sacrificing accuracy for fairness.

Christelle Kamaliza is a Market Research Specialist at the International Association of Privacy Professionals (IAPP). She is in charge of the market and customer insights and supports the IAPP Research team on data ...
