A survey on datasets for fairness-aware machine learning

Publication: Contribution to journal › Review article › Research › Peer-reviewed

Authors

  • Tai Le Quy
  • Arjun Roy
  • Vasileios Iosifidis
  • Wenbin Zhang
  • Eirini Ntoutsi

Organisational units

External organisations

  • Freie Universität Berlin (FU Berlin)
  • Carnegie Mellon University

Details

Original language: English
Article number: e1452
Number of pages: 59
Journal: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume: 12
Issue number: 3
Early online date: 3 March 2022
Publication status: Published - 13 May 2022

Abstract

As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.

Cite this

A survey on datasets for fairness-aware machine learning. / Le Quy, Tai; Roy, Arjun; Iosifidis, Vasileios et al.
In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, Vol. 12, No. 3, e1452, 13.05.2022.

Le Quy, T, Roy, A, Iosifidis, V, Zhang, W & Ntoutsi, E 2022, 'A survey on datasets for fairness-aware machine learning', Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 12, no. 3, e1452. https://doi.org/10.1002/widm.1452
Le Quy, T., Roy, A., Iosifidis, V., Zhang, W., & Ntoutsi, E. (2022). A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 12(3), Article e1452. https://doi.org/10.1002/widm.1452
Le Quy T, Roy A, Iosifidis V, Zhang W, Ntoutsi E. A survey on datasets for fairness-aware machine learning. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2022 May 13;12(3):e1452. Epub 2022 Mar 3. doi: 10.1002/widm.1452
Le Quy, Tai ; Roy, Arjun ; Iosifidis, Vasileios et al. / A survey on datasets for fairness-aware machine learning. In: Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 2022 ; Vol. 12, No. 3.
@article{9639c0eb1d75433991bd2c9350a70f42,
title = "A survey on datasets for fairness-aware machine learning",
abstract = "As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.",
keywords = "benchmark datasets, bias, datasets for fairness, discrimination, fairness-aware machine learning",
author = "{Le Quy}, Tai and Arjun Roy and Vasileios Iosifidis and Wenbin Zhang and Eirini Ntoutsi",
note = "Funding Information: The work of the first author is supported by the Ministry of Science and Culture of Lower Saxony, Germany, within the PhD program “LernMINT: Data-assisted teaching in the MINT subjects.” The work of the second author is supported by the Volkswagen Foundation under the call “Artificial Intelligence and the Society of the Future” (the BIAS project).",
year = "2022",
month = may,
day = "13",
doi = "10.1002/widm.1452",
language = "English",
volume = "12",
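number = "3",
journal = "Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery",
issn = "1942-4787",
}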

TY  - JOUR
T1  - A survey on datasets for fairness-aware machine learning
AU  - Le Quy, Tai
AU  - Roy, Arjun
AU  - Iosifidis, Vasileios
AU  - Zhang, Wenbin
AU  - Ntoutsi, Eirini
N1  - Funding Information: The work of the first author is supported by the Ministry of Science and Culture of Lower Saxony, Germany, within the PhD program “LernMINT: Data-assisted teaching in the MINT subjects.” The work of the second author is supported by the Volkswagen Foundation under the call “Artificial Intelligence and the Society of the Future” (the BIAS project).
PY  - 2022/5/13
Y1  - 2022/5/13
N2  - As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.
AB  - As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis. This article is categorized under: Commercial, Legal, and Ethical Issues > Fairness in Data Mining; Fundamental Concepts of Data and Knowledge > Data Concepts; Technologies > Data Preprocessing.
KW  - benchmark datasets
KW  - bias
KW  - datasets for fairness
KW  - discrimination
KW  - fairness-aware machine learning
UR  - http://www.scopus.com/inward/record.url?scp=85125031530&partnerID=8YFLogxK
U2  - 10.1002/widm.1452
DO  - 10.1002/widm.1452
M3  - Review article
AN  - SCOPUS:85125031530
VL  - 12
JO  - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
JF  - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
SN  - 1942-4787
IS  - 3
M1  - e1452
ER  -