A survey on bias in visual datasets

Simone Fabbrizzi; Symeon Papadopoulos; Eirini Ntoutsi; Ioannis Kompatsiaris

doi:10.48550/arXiv.2107.07919

Details

Original language	English
Article number	103552
Journal	Computer Vision and Image Understanding
Volume	223
Early online date	5 Sept 2022
Publication status	Published - Oct 2022

Abstract

Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly. Indeed, CV systems highly depend on training datasets and can learn and amplify biases that such datasets may carry. Thus, the problem of understanding and discovering bias in visual datasets is of utmost importance; yet, it has not been studied in a systematic way to date. Hence, this work aims to: (i) describe the different kinds of bias that may manifest in visual datasets; (ii) review the literature on methods for bias discovery and quantification in visual datasets; (iii) discuss existing attempts to collect visual datasets in a bias-aware manner. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to spot different types of bias during visual dataset collection.

ASJC Scopus subject areas

Computer Science (all)
Software
Computer Science (all)
Signal Processing
Computer Science (all)
Computer Vision and Pattern Recognition

Cite this

A survey on bias in visual datasets. / Fabbrizzi, Simone; Papadopoulos, Symeon; Ntoutsi, Eirini et al.
In: Computer Vision and Image Understanding, Vol. 223, 103552, 10.2022.

Research output: Contribution to journal › Article › Research › Peer review

Fabbrizzi, S, Papadopoulos, S, Ntoutsi, E & Kompatsiaris, I 2022, 'A survey on bias in visual datasets', Computer Vision and Image Understanding, vol. 223, 103552. https://doi.org/10.48550/arXiv.2107.07919, https://doi.org/10.1016/j.cviu.2022.103552

Fabbrizzi, S., Papadopoulos, S., Ntoutsi, E., & Kompatsiaris, I. (2022). A survey on bias in visual datasets. Computer Vision and Image Understanding, 223, Article 103552. https://doi.org/10.48550/arXiv.2107.07919, https://doi.org/10.1016/j.cviu.2022.103552

Fabbrizzi S, Papadopoulos S, Ntoutsi E, Kompatsiaris I. A survey on bias in visual datasets. Computer Vision and Image Understanding. 2022 Oct;223:103552. Epub 2022 Sep 5. doi: 10.48550/arXiv.2107.07919, 10.1016/j.cviu.2022.103552

Fabbrizzi, Simone ; Papadopoulos, Symeon ; Ntoutsi, Eirini et al. / A survey on bias in visual datasets. In: Computer Vision and Image Understanding. 2022 ; Vol. 223.

@article{972948a44aaf4b10a086d993c4f98738,

title = "A survey on bias in visual datasets",

abstract = "Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly. Indeed, CV systems highly depend on training datasets and can learn and amplify biases that such datasets may carry. Thus, the problem of understanding and discovering bias in visual datasets is of utmost importance; yet, it has not been studied in a systematic way to date. Hence, this work aims to: (i) describe the different kinds of bias that may manifest in visual datasets; (ii) review the literature on methods for bias discovery and quantification in visual datasets; (iii) discuss existing attempts to collect visual datasets in a bias-aware manner. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to spot different types of bias during visual dataset collection.",

keywords = "AI ethics, Bias, Computer vision, Visual datasets",

author = "Simone Fabbrizzi and Symeon Papadopoulos and Eirini Ntoutsi and Ioannis Kompatsiaris",

note = "Funding Information: We would like to thank Alaa Elobaid, Miriam Fahimi and Giorgos Kordopatis-Zilos for their feedback and fruitful discussions. This work has received funding from the European Union's Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias” and under grant agreement number 951911 for the project “AI4Media - A European Excellence Centre for Media, Society and Democracy”. This work reflects only the authors{\textquoteright} views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.",

year = "2022",

month = oct,

doi = "10.48550/arXiv.2107.07919",

language = "English",

volume = "223",

journal = "Computer Vision and Image Understanding",

issn = "1077-3142",

publisher = "Academic Press Inc.",

}

TY - JOUR

T1 - A survey on bias in visual datasets

AU - Fabbrizzi, Simone

AU - Papadopoulos, Symeon

AU - Ntoutsi, Eirini

AU - Kompatsiaris, Ioannis

N1 - Funding Information: We would like to thank Alaa Elobaid, Miriam Fahimi and Giorgos Kordopatis-Zilos for their feedback and fruitful discussions. This work has received funding from the European Union's Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias” and under grant agreement number 951911 for the project “AI4Media - A European Excellence Centre for Media, Society and Democracy”. This work reflects only the authors’ views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.

PY - 2022/10

Y1 - 2022/10

N2 - Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly. Indeed, CV systems highly depend on training datasets and can learn and amplify biases that such datasets may carry. Thus, the problem of understanding and discovering bias in visual datasets is of utmost importance; yet, it has not been studied in a systematic way to date. Hence, this work aims to: (i) describe the different kinds of bias that may manifest in visual datasets; (ii) review the literature on methods for bias discovery and quantification in visual datasets; (iii) discuss existing attempts to collect visual datasets in a bias-aware manner. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to spot different types of bias during visual dataset collection.

AB - Computer Vision (CV) has achieved remarkable results, outperforming humans in several tasks. Nonetheless, it may result in significant discrimination if not handled properly. Indeed, CV systems highly depend on training datasets and can learn and amplify biases that such datasets may carry. Thus, the problem of understanding and discovering bias in visual datasets is of utmost importance; yet, it has not been studied in a systematic way to date. Hence, this work aims to: (i) describe the different kinds of bias that may manifest in visual datasets; (ii) review the literature on methods for bias discovery and quantification in visual datasets; (iii) discuss existing attempts to collect visual datasets in a bias-aware manner. A key conclusion of our study is that the problem of bias discovery and quantification in visual datasets is still open, and there is room for improvement in terms of both methods and the range of biases that can be addressed. Moreover, there is no such thing as a bias-free dataset, so scientists and practitioners must become aware of the biases in their datasets and make them explicit. To this end, we propose a checklist to spot different types of bias during visual dataset collection.

KW - AI ethics

KW - Bias

KW - Computer vision

KW - Visual datasets

UR - http://www.scopus.com/inward/record.url?scp=85138020479&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2107.07919

DO - 10.48550/arXiv.2107.07919

M3 - Article

AN - SCOPUS:85138020479

VL - 223

JO - Computer Vision and Image Understanding

JF - Computer Vision and Image Understanding

SN - 1077-3142

M1 - 103552

ER -
