Details
| Original language | English |
| --- | --- |
| Title of host publication | EWAF'23 |
| Subtitle of host publication | European Workshop on Algorithmic Fairness |
| Publication status | Published - 16 Jul 2023 |
| Event | 2nd European Workshop on Algorithmic Fairness, EWAF 2023 - Winterthur, Switzerland. Duration: 7 Jun 2023 → 9 Jun 2023 |
Publication series

| Name | CEUR Workshop Proceedings |
| --- | --- |
| Publisher | CEUR Workshop Proceedings |
| Volume | 3442 |
| ISSN (Print) | 1613-0073 |
Abstract
Group imbalance, usually caused by insufficient or unrepresentative data collection procedures, is among the main reasons for the emergence of representation bias in datasets. Representation bias can exist with respect to different groups of one or more protected attributes and might lead to prejudicial and discriminatory outcomes toward certain groups of individuals if a learning model is trained on such biased data. In this paper, we propose MASC, a data augmentation approach based on affinity clustering of existing data in similar datasets. An arbitrary target dataset utilizes protected-group instances of neighboring datasets that lie in the same cluster in order to balance out the cardinality of its non-protected and protected groups. To form clusters in which datasets can share instances for protected-group augmentation, an affinity clustering pipeline is developed based on an affinity matrix. The affinity matrix is formed by computing the discrepancy between the distributions of each pair of datasets and translating these discrepancies into a symmetric pairwise similarity matrix. A non-parametric spectral clustering is then applied to the affinity matrix, and the corresponding datasets are automatically categorized into an optimal number of clusters. We perform a step-by-step experiment as a demo of our method, both to illustrate the proposed data augmentation procedure and to evaluate and discuss its performance. In addition, a comparison to other data augmentation methods before and after augmentation is provided, as well as an analysis of model performance for each competitor compared to our method. In our experiments, bias is measured in a non-binary protected-attribute setup with respect to the distribution of racial groups, for two separate minority groups in comparison with the majority group, before and after debiasing. Empirical results imply that our method of debiasing datasets by augmenting them with real (genuine) data from similar contexts can effectively debias target datasets, comparably to existing data augmentation strategies.
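The pipeline outlined in the abstract (pairwise distribution discrepancy → symmetric affinity matrix → non-parametric spectral clustering) can be illustrated in a few lines of Python. The following is a minimal sketch, not the authors' implementation: it assumes a Gaussian-kernel MMD² as the discrepancy (the keywords name Maximum Mean Discrepancy), exp(-d) as the discrepancy-to-similarity mapping, and the eigengap heuristic for choosing the number of clusters automatically; all function names are illustrative.

```python
# Sketch of the affinity-clustering pipeline described in the abstract.
# Assumptions (not taken from the paper): Gaussian-kernel MMD^2 as the
# pairwise discrepancy, exp(-d) to turn discrepancies into similarities,
# and the eigengap heuristic to pick the number of clusters.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy, RBF kernel."""
    kxx = rbf_kernel(X, X, gamma=gamma).mean()
    kyy = rbf_kernel(Y, Y, gamma=gamma).mean()
    kxy = rbf_kernel(X, Y, gamma=gamma).mean()
    return kxx + kyy - 2.0 * kxy

def affinity_matrix(datasets, gamma=1.0):
    """Symmetric pairwise similarity matrix from distribution discrepancies."""
    n = len(datasets)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = mmd2(datasets[i], datasets[j], gamma=gamma)
            A[i, j] = A[j, i] = np.exp(-d)  # small discrepancy -> high affinity
    np.fill_diagonal(A, 1.0)
    return A

def eigengap_k(A, k_max=8):
    """Pick the cluster count at the largest normalized-Laplacian eigengap."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals = np.sort(np.linalg.eigvalsh(L))[:k_max]
    return int(np.argmax(np.diff(eigvals))) + 1

def cluster_datasets(datasets, gamma=1.0):
    """Cluster a collection of datasets by distributional similarity."""
    A = affinity_matrix(datasets, gamma=gamma)
    k = max(2, eigengap_k(A))
    labels = SpectralClustering(n_clusters=k, affinity="precomputed",
                                random_state=0).fit_predict(A)
    return A, labels
```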
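The augmentation step, balancing the target dataset's protected and non-protected group cardinalities with protected-group rows borrowed from same-cluster datasets, might then look as follows. The array-plus-mask representation and the uniform sampling policy are illustrative assumptions, not details from the paper.

```python
# Sketch of the augmentation step, assuming tabular datasets as NumPy
# arrays plus a boolean mask marking protected-group rows. The uniform
# sampling from same-cluster neighbours is an illustrative policy.
import numpy as np

def augment_protected_group(target_X, target_prot, neighbours, seed=0):
    """Balance protected vs. non-protected cardinality in the target.

    target_X:    (n, d) array, the target dataset.
    target_prot: (n,) boolean mask, True for protected-group rows.
    neighbours:  list of (X, prot_mask) pairs from same-cluster datasets.
    """
    rng = np.random.default_rng(seed)
    deficit = int((~target_prot).sum() - target_prot.sum())
    if deficit <= 0:
        return target_X, target_prot  # already balanced

    # Pool protected-group instances from all same-cluster neighbours.
    pools = [X[mask] for X, mask in neighbours if mask.any()]
    if not pools:
        return target_X, target_prot  # nothing to borrow
    pool = np.vstack(pools)
    take = rng.choice(len(pool), size=min(deficit, len(pool)), replace=False)

    aug_X = np.vstack([target_X, pool[take]])
    aug_prot = np.concatenate([target_prot, np.ones(len(take), dtype=bool)])
    return aug_X, aug_prot
```

Sampling without replacement keeps each borrowed instance genuine rather than duplicated, which matches the abstract's emphasis on augmenting with real data from similar contexts.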
Keywords
- Affinity Clustering, Bias & Fairness, Data augmentation, Data Debiasing, Distribution Shift, Maximum Mean Discrepancy
Cite this
Ghodsi, S., & Ntoutsi, E. (2023). Affinity Clustering Framework for Data Debiasing Using Pairwise Distribution Discrepancy. In EWAF'23: European Workshop on Algorithmic Fairness (CEUR Workshop Proceedings; Vol. 3442).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Affinity Clustering Framework for Data Debiasing Using Pairwise Distribution Discrepancy
AU - Ghodsi, Siamak
AU - Ntoutsi, Eirini
N1 - Funding Information: This work has received funding from the European Union’s Horizon 2020 research and innovation programme under Marie Sklodowska-Curie Actions (grant agreement number 860630) for the project “NoBIAS - Artificial Intelligence without Bias”. This work reflects only the authors’ views and the European Research Executive Agency (REA) is not responsible for any use that may be made of the information it contains.
PY - 2023/7/16
Y1 - 2023/7/16
N2 - Group imbalance, usually caused by insufficient or unrepresentative data collection procedures, is among the main reasons for the emergence of representation bias in datasets. Representation bias can exist with respect to different groups of one or more protected attributes and might lead to prejudicial and discriminatory outcomes toward certain groups of individuals if a learning model is trained on such biased data. In this paper, we propose MASC, a data augmentation approach based on affinity clustering of existing data in similar datasets. An arbitrary target dataset utilizes protected-group instances of neighboring datasets that lie in the same cluster in order to balance out the cardinality of its non-protected and protected groups. To form clusters in which datasets can share instances for protected-group augmentation, an affinity clustering pipeline is developed based on an affinity matrix. The affinity matrix is formed by computing the discrepancy between the distributions of each pair of datasets and translating these discrepancies into a symmetric pairwise similarity matrix. A non-parametric spectral clustering is then applied to the affinity matrix, and the corresponding datasets are automatically categorized into an optimal number of clusters. We perform a step-by-step experiment as a demo of our method, both to illustrate the proposed data augmentation procedure and to evaluate and discuss its performance. In addition, a comparison to other data augmentation methods before and after augmentation is provided, as well as an analysis of model performance for each competitor compared to our method. In our experiments, bias is measured in a non-binary protected-attribute setup with respect to the distribution of racial groups, for two separate minority groups in comparison with the majority group, before and after debiasing. Empirical results imply that our method of debiasing datasets by augmenting them with real (genuine) data from similar contexts can effectively debias target datasets, comparably to existing data augmentation strategies.
AB - Group imbalance, usually caused by insufficient or unrepresentative data collection procedures, is among the main reasons for the emergence of representation bias in datasets. Representation bias can exist with respect to different groups of one or more protected attributes and might lead to prejudicial and discriminatory outcomes toward certain groups of individuals if a learning model is trained on such biased data. In this paper, we propose MASC, a data augmentation approach based on affinity clustering of existing data in similar datasets. An arbitrary target dataset utilizes protected-group instances of neighboring datasets that lie in the same cluster in order to balance out the cardinality of its non-protected and protected groups. To form clusters in which datasets can share instances for protected-group augmentation, an affinity clustering pipeline is developed based on an affinity matrix. The affinity matrix is formed by computing the discrepancy between the distributions of each pair of datasets and translating these discrepancies into a symmetric pairwise similarity matrix. A non-parametric spectral clustering is then applied to the affinity matrix, and the corresponding datasets are automatically categorized into an optimal number of clusters. We perform a step-by-step experiment as a demo of our method, both to illustrate the proposed data augmentation procedure and to evaluate and discuss its performance. In addition, a comparison to other data augmentation methods before and after augmentation is provided, as well as an analysis of model performance for each competitor compared to our method. In our experiments, bias is measured in a non-binary protected-attribute setup with respect to the distribution of racial groups, for two separate minority groups in comparison with the majority group, before and after debiasing. Empirical results imply that our method of debiasing datasets by augmenting them with real (genuine) data from similar contexts can effectively debias target datasets, comparably to existing data augmentation strategies.
KW - Affinity Clustering
KW - Bias & Fairness
KW - Data augmentation
KW - Data Debiasing
KW - Distribution Shift
KW - Maximum Mean Discrepancy
UR - http://www.scopus.com/inward/record.url?scp=85168309574&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85168309574
T3 - CEUR Workshop Proceedings
BT - EWAF'23
T2 - 2nd European Workshop on Algorithmic Fairness, EWAF 2023
Y2 - 7 June 2023 through 9 June 2023
ER -