A survey on datasets for fairness-aware machine learning

Research output: Contribution to journal > Review article > Research > peer review

Authors

  • Tai Le Quy
  • Arjun Roy
  • Vasileios Iosifidis
  • Wenbin Zhang
  • Eirini Ntoutsi

Research Organisations

External Research Organisations

  • Freie Universität Berlin (FU Berlin)
  • Carnegie Mellon University

Details

Original language: English
Article number: e1452
Number of pages: 59
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume: 12
Issue number: 3
Early online date: 3 Mar 2022
Publication status: Published - 13 May 2022

Abstract

As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.

Keywords

    benchmark datasets, bias, datasets for fairness, discrimination, fairness-aware machine learning

Cite this

A survey on datasets for fairness-aware machine learning. / Le Quy, Tai; Roy, Arjun; Iosifidis, Vasileios et al.
In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 12, No. 3, e1452, 13.05.2022.

Le Quy, T, Roy, A, Iosifidis, V, Zhang, W & Ntoutsi, E 2022, 'A survey on datasets for fairness-aware machine learning', Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 12, no. 3, e1452. https://doi.org/10.1002/widm.1452
Le Quy, T., Roy, A., Iosifidis, V., Zhang, W., & Ntoutsi, E. (2022). A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(3), Article e1452. https://doi.org/10.1002/widm.1452
Le Quy T, Roy A, Iosifidis V, Zhang W, Ntoutsi E. A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2022 May 13;12(3):e1452. Epub 2022 Mar 3. doi: 10.1002/widm.1452
Le Quy, Tai ; Roy, Arjun ; Iosifidis, Vasileios et al. / A survey on datasets for fairness-aware machine learning. In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2022 ; Vol. 12, No. 3.
@article{9639c0eb1d75433991bd2c9350a70f42,
title = "A survey on datasets for fairness-aware machine learning",
abstract = "As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.",
keywords = "benchmark datasets, bias, datasets for fairness, discrimination, fairness-aware machine learning",
author = "{Le Quy}, Tai and Arjun Roy and Vasileios Iosifidis and Wenbin Zhang and Eirini Ntoutsi",
note = "Funding Information: The work of the first author is supported by the Ministry of Science and Culture of Lower Saxony, Germany, within the PhD program “LernMINT: Data-assisted teaching in the MINT subjects.” The work of the second author is supported by the Volkswagen Foundation under the call “Artificial Intelligence and the Society of the Future” (the BIAS project). ",
year = "2022",
month = may,
day = "13",
doi = "10.1002/widm.1452",
language = "English",
volume = "12",
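number = "3",
journal = "Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery",
issn = "1942-4787",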
}

TY - JOUR

T1 - A survey on datasets for fairness-aware machine learning

AU - Le Quy, Tai

AU - Roy, Arjun

AU - Iosifidis, Vasileios

AU - Zhang, Wenbin

AU - Ntoutsi, Eirini

N1 - Funding Information: The work of the first author is supported by the Ministry of Science and Culture of Lower Saxony, Germany, within the PhD program “LernMINT: Data-assisted teaching in the MINT subjects.” The work of the second author is supported by the Volkswagen Foundation under the call “Artificial Intelligence and the Society of the Future” (the BIAS project).

PY - 2022/5/13

Y1 - 2022/5/13

AB - As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.

KW - benchmark datasets

KW - bias

KW - datasets for fairness

KW - discrimination

KW - fairness-aware machine learning

UR - http://www.scopus.com/inward/record.url?scp=85125031530&partnerID=8YFLogxK

U2 - 10.1002/widm.1452

DO - 10.1002/widm.1452

M3 - Review article

AN - SCOPUS:85125031530

VL - 12

JO - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

JF - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

SN - 1942-4787

IS - 3

M1 - e1452

ER -