Details
Original language | English |
---|---|
Title of host publication | Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021) |
Editors | I-Han Hsiao, Shaghayegh Sahebi, Francois Bouchet, Jill-Jenn Vie |
Pages | 407-414 |
Publication status | Published - 2021 |
Event | 14th International Conference on Educational Data Mining 2021 - Paris, France. Duration: 29 Jun 2021 → 2 Jul 2021. Conference number: 14 |
Abstract
Keywords
- cs.LG, cs.DC
Cite this
Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). ed. / I-Han Hsiao; Shaghayegh Sahebi; Francois Bouchet; Jill-Jenn Vie. 2021. p. 407-414.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Fair-Capacitated Clustering
AU - Quy, Tai Le
AU - Roy, Arjun
AU - Friege, Gunnar
AU - Ntoutsi, Eirini
N1 - Conference code: 14
PY - 2021
Y1 - 2021
N2 - Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair representation of the groups in terms of protected attributes, like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain, where studies indicate that students might learn better in diverse student groups and, of course, groups of similar cardinality are more practical, e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem, which partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fairness constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step, while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.
KW - cs.LG
KW - cs.DC
UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85124270100&origin=inward&txGid=d311b1b95938fe8f2d065bd87ba9e712
U2 - 10.48550/arXiv.2104.12116
DO - 10.48550/arXiv.2104.12116
M3 - Conference contribution
SP - 407
EP - 414
BT - Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)
A2 - Hsiao, I-Han
A2 - Sahebi, Shaghayegh
A2 - Bouchet, Francois
A2 - Vie, Jill-Jenn
T2 - 14th International Conference on Educational Data Mining 2021
Y2 - 29 June 2021 through 2 July 2021
ER -