Tag: Ethics & AI

  • Fairness versus Privacy: sensitive data is needed for bias detection

    Authors: Sebastiaan Berendsen and Emma Beauxis-Aussalet

    AI systems are vulnerable to biases that can lead to unfair and harmful outcomes. Methods to detect such biases in AI systems rely on sensitive data. However, this reliance on sensitive data is problematic due to ethical and legal concerns. Sensitive data is essential for the development and validation of bias detection methods, even when using less privacy-intrusive alternatives. Without real-world sensitive data, research on fairness and bias detection methods only concerns abstract and hypothetical cases. To test their applicability in practice, it is crucial to access real-world data that includes sensitive attributes. Industry practitioners and policymakers are crucial actors in this process. As a society, we need legal and secure ways to use real-world sensitive data for bias detection research.

    In this blog, we discuss what bias detection and sensitive data are, and why sensitive data is required. We also outline alternative approaches that would be less privacy-intrusive. We conclude with ways forward that all require collaboration between researchers and industry practitioners. 

    What is bias detection?

    AI fairness is about enabling AI systems that are free of biases. A key approach to analyzing AI fairness is bias detection. Bias detection attempts to identify structural differences in the results of an AI system for different groups of people. Most methods to detect bias use sensitive data. Sensitive data describes the characteristics of specific socio-demographic groups¹. These characteristics can be inherent (e.g., gender, ethnicity, age) or acquired (e.g., religion, political orientation), and are often protected by anti-discrimination laws and privacy regulations. Even if sensitive information is not used in an AI system, its outcomes can still be biased. We therefore need to explore how we can use sensitive data legally and ethically for bias detection.
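
    To make this concrete, below is a minimal sketch in Python of one common bias detection measure: the demographic parity gap, i.e., the difference in the rates of positive outcomes between groups. It is only an illustration, not a method from this blog or its references; the record fields and group labels are assumptions, and it presupposes that the sensitive attribute is available for every record, which is precisely the access problem discussed next.

      # Minimal, illustrative sketch (not the authors' method): compare the rate of
      # positive AI outcomes across socio-demographic groups. The field names
      # "group" and "outcome" are assumptions made for this example.
      from collections import defaultdict

      def positive_rates(records):
          # Share of positive outcomes per group.
          totals, positives = defaultdict(int), defaultdict(int)
          for r in records:
              totals[r["group"]] += 1
              positives[r["group"]] += r["outcome"]
          return {g: positives[g] / totals[g] for g in totals}

      def demographic_parity_gap(records):
          # Largest difference in positive-outcome rates across groups.
          rates = positive_rates(records)
          return max(rates.values()) - min(rates.values())

      decisions = [
          {"group": "A", "outcome": 1}, {"group": "A", "outcome": 1},
          {"group": "A", "outcome": 0}, {"group": "B", "outcome": 1},
          {"group": "B", "outcome": 0}, {"group": "B", "outcome": 0},
      ]
      print(demographic_parity_gap(decisions))  # ~0.33: group A receives positive outcomes more often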

    In practice, sensitive data is often completely unavailable or of poor quality due to privacy, legal, and ethical concerns. The lack of access to high-quality sensitive data hinders the implementation of bias detection methods in practice. 

    Concerns regarding the use of sensitive data

    The use of certain sensitive data for bias detection might be prohibited by the GDPR². However, the EU AI Act provides an exception to the GDPR that allows the use of special category data for bias detection purposes. Such usage of sensitive data is subject to appropriate safeguards. Yet, the definition of appropriate safeguards remains unclear, and the exception is strictly limited to the high-risk AI systems defined by the EU AI Act.

    Even if the EU AI Act might address some legal concerns, key ethical concerns remain³,⁴. Widespread collection of sensitive data increases the risks of data misuse and abuse, such as citizen surveillance. Furthermore, obtaining accurate, representative sensitive data is a challenge. Inaccurate sensitive data harms the validity of bias detection methods and heightens the risk of misclassifying and misrepresenting individuals and their social groups.

    Alternative approaches

    Two approaches⁵ seem most promising for enabling bias detection while limiting the exposure of sensitive data: the trusted third party approach and the analytical approach. The trusted third party approach consists of letting a neutral party hold the sensitive data and run bias analyses on its premises. Such third parties do not share any sensitive data, but only the results of the bias analysis. These trusted third parties can be governmental organizations, such as national statistics or census bureaus, or non-governmental organizations.

    The analytical approach consists of data analysis methods that do not require direct access to sensitive data. For example, such methods can be based on proxy variables, unsupervised learning models, causal fairness methods, or synthetic data generated with privacy-preserving technologies. Some of these methods could still require some sensitive data, but they remain less privacy-intrusive than other methods.
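
    As a hedged illustration of the analytical approach, the sketch below estimates group-level outcome rates through a proxy variable when individual sensitive attributes are unavailable. The proxy (a "region" field), the group-membership probabilities, and the records are invented assumptions rather than data or methods from the references, and, as noted above, such estimates inherit the accuracy risks of the proxy itself.

      # Hedged sketch of a proxy-based bias estimate: individual sensitive attributes
      # are unknown, but a proxy ("region") gives an estimated probability of group
      # membership, e.g. from public aggregate statistics. All names and numbers
      # below are illustrative assumptions.
      P_GROUP_A = {"north": 0.8, "south": 0.2}  # assumed P(group A | region)

      def proxy_positive_rates(records):
          # Estimate positive-outcome rates for groups A and B via the region proxy.
          weight = {"A": 0.0, "B": 0.0}
          positive = {"A": 0.0, "B": 0.0}
          for r in records:
              p_a = P_GROUP_A[r["region"]]
              for group, p in (("A", p_a), ("B", 1.0 - p_a)):
                  weight[group] += p
                  positive[group] += p * r["outcome"]
          return {g: positive[g] / weight[g] for g in weight}

      outcomes = [
          {"region": "north", "outcome": 1}, {"region": "north", "outcome": 1},
          {"region": "south", "outcome": 0}, {"region": "south", "outcome": 0},
      ]
      print(proxy_positive_rates(outcomes))  # roughly {'A': 0.8, 'B': 0.2}: an estimated gap between groups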

    These alternative approaches do not structurally remove the need to use sensitive data. Moreover, these approaches are currently understudied, and more research is needed to develop and validate them. This research requires controlled access to sensitive data, until such privacy-preserving bias detection approaches are properly validated, and their strengths and weaknesses are well-defined and measurable.

    Ways forward

    The lack of access to realistic data from real-world AI systems is a crucial challenge. The literature on AI fairness mostly relies on datasets with limited practical context¹. Therefore, existing bias detection methods are primarily tested “in-the-lab”. Insights into the validity of bias detection methods in real-world applications are lacking. Yet, such insights are essential to justify the need for collecting sensitive data to address AI bias in practice, and to understand whether methods to address AI fairness are effective in the socio-technical context of AI systems.

    Researchers cannot fix this challenge on their own. Collaboration between researchers, (non-)governmental organizations, and industry practitioners is essential to address the challenges with fairness methods, and to increase their practicality and validity. Research collaboration is also needed to address the legal and ethical concerns, and to specify the necessary safeguards. For example, the GDPR and the EU AI Act contain exceptions for sensitive data processing for scientific purposes, when it adheres to recognised ethical standards for scientific research.

    Closing

    Sensitive data is essential for investigating the technical approaches to ensure AI fairness. However, the availability of accurate sensitive data remains a challenge. Alternative approaches exist to preserve privacy while using sensitive data for bias analysis. Yet these approaches are currently understudied, and more research is needed. For such research to be effective, collaboration is needed between researchers and practitioners from industry or public institutions.

    References

    1. Caton, S. & Haas, C. Fairness in Machine Learning: A Survey. Preprint at http://arxiv.org/abs/2010.04053 (2020).

    2. Van Bekkum, M. & Zuiderveen Borgesius, F. Using sensitive data to prevent discrimination by artificial intelligence: Does the GDPR need a new exception? Computer Law & Security Review 48, 105770 (2023).

    3. Andrus, M. & Villeneuve, S. Demographic-Reliant Algorithmic Fairness: Characterizing the Risks of Demographic Data Collection in the Pursuit of Fairness. in 2022 ACM Conference on Fairness, Accountability, and Transparency 1709–1721 (ACM, Seoul, Republic of Korea, 2022). doi:10.1145/3531146.3533226.

    4. Andrus, M., Spitzer, E., Brown, J. & Xiang, A. What We Can’t Measure, We Can’t Understand: Challenges to Demographic Data Procurement in the Pursuit of Fairness. in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency 249–260 (ACM, Virtual Event, Canada, 2021). doi:10.1145/3442188.3445888.

    5. Veale, M. & Binns, R. Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data & Society 4, 2053951717743530 (2017).

    Photography: Emma Beauxis-Aussalet

  • What Latour can teach us about AI and its moral implications

    Last week, the renowned French philosopher, sociologist, and anthropologist Bruno Latour passed away at the age of 75 (1947-2022). Latour is considered one of the most influential thinkers in modern-day science. His Actor-Network Theory (ANT) and mediation theory are known to provide an alternative perspective on the famous subject-object dichotomy, a dominant paradigm in science originating with Kant.

    In view of the current critical ethical issues with AI systems pervading our societies, reviewing Latour’s ANT provides invaluable insights into the human network that can create or mitigate the threats of AI.

    Read more about it in the blog Mirthe Dankloff (PhD Candidate) wrote for the Civic AI Lab.

  • Civic AI Lab is part of UNESCO’s Global Top 100 AI list of projects!

    We are very proud to announce that this January, the Civic AI Lab [1] was selected for UNESCO’s TOP 100 international list of Artificial Intelligence (henceforth: AI) solutions for sustainable development for the benefit of humanity [2].

    The TOP 100 was created by the International Research Centre on Artificial Intelligence (IRCAI) [3], under the auspices of UNESCO, to celebrate the development of AI-based solutions around the world related to the 17 United Nations Sustainable Development Goals [2]. The Civic AI Lab was recognized in the category ‘early-stage project’, as the reviewers see great potential in the research lab, which just had its one-year anniversary.

    The Civic AI Lab is a research collaboration between the City of Amsterdam, the Dutch Ministry of the Interior and Kingdom Relations, the Vrije Universiteit Amsterdam (VU), and the University of Amsterdam (UvA). The Lab’s mission is to support an engaging society where all citizens have an equal opportunity to participate in and benefit from AI in a fair and transparent manner.

    In this capacity, the Civic AI Lab focuses on the application of AI while respecting human rights such as privacy, non-discrimination, and equal opportunity, in five domain-specific projects (education, health, welfare, mobility, and environment) and two domain-overarching projects: one on the intersection of AI and Law, and one on the intersection of AI and Public Governance.

    Three of our researchers at UCDS are currently affiliated with the Civic AI Lab. Being part of UNESCO’s TOP 100 alongside projects from all continents is a true acknowledgment of the Lab’s work. 

    Keep up the good work Civic AI Lab!

    [1] https://www.civic-ai.nl/

    [2] https://ircai.org/top100/entry/the-civic-ai-lab/

    [3] https://ircai.org/

  • How to account for automated decision systems in the public domain?

    Automated decision-making affects governmental decision-making processes in terms of accountability, explainability, and democratic power. For instance, deciding on acceptable error rates reflects important value judgments that can have far-reaching ethical impacts for citizens.

    Error analysis is an important determinant of design and deployment choices for algorithms. Public authorities therefore need to balance the risks and benefits, and to protect their citizens by making error analysis transparent and understandable.

    Read more about it in the blog Mirthe Dankloff (PhD Candidate) wrote for the Civic AI Lab (02-09-2021): https://www.civic-ai.nl/post/how-to-account-for-automated-decision-systems-in-the-public-domain