Fair-Capacitated Clustering

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Authors

  • Tai Le Quy
  • Arjun Roy
  • Gunnar Friege
  • Eirini Ntoutsi

External Research Organisations

  • Freie Universität Berlin (FU Berlin)
View graph of relations

Details

Original languageEnglish
Title of host publicationProceedings of The 14th International Conference on Educational Data Mining (EDM 2021)
EditorsI-Han Hsiao, Shaghayegh Sahebi, Francois Bouchet, Jill-Jenn Vie
Pages407-414
Publication statusPublished - 2021
Event14th International Conference on Educational Data Mining 2021 - Paris, France
Duration: 29 Jun 20212 Jul 2021
Conference number: 14

Abstract

Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

Keywords

    cs.LG, cs.DC

Cite this

Fair-Capacitated Clustering. / Quy, Tai Le; Roy, Arjun; Friege, Gunnar et al.
Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). ed. / I-Han Hsiao; Shaghayegh Sahebi; Francois Bouchet; Jill-Jenn Vie . 2021. p. 407-414.

Research output: Chapter in book/report/conference proceedingConference contributionResearchpeer review

Quy, TL, Roy, A, Friege, G & Ntoutsi, E 2021, Fair-Capacitated Clustering. in I-H Hsiao, S Sahebi, F Bouchet & J-J Vie (eds), Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). pp. 407-414, 14th International Conference on Educational Data Mining 2021, Paris, France, 29 Jun 2021. https://doi.org/10.48550/arXiv.2104.12116
Quy, T. L., Roy, A., Friege, G., & Ntoutsi, E. (2021). Fair-Capacitated Clustering. In I.-H. Hsiao, S. Sahebi, F. Bouchet, & J.-J. Vie (Eds.), Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021) (pp. 407-414) https://doi.org/10.48550/arXiv.2104.12116
Quy TL, Roy A, Friege G, Ntoutsi E. Fair-Capacitated Clustering. In Hsiao IH, Sahebi S, Bouchet F, Vie JJ, editors, Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). 2021. p. 407-414 doi: 10.48550/arXiv.2104.12116
Quy, Tai Le ; Roy, Arjun ; Friege, Gunnar et al. / Fair-Capacitated Clustering. Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). editor / I-Han Hsiao ; Shaghayegh Sahebi ; Francois Bouchet ; Jill-Jenn Vie . 2021. pp. 407-414
Download
@inproceedings{2409bf525208449ab36719189e9ff374,
title = "Fair-Capacitated Clustering",
abstract = " Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality. ",
keywords = "cs.LG, cs.DC",
author = "Quy, {Tai Le} and Arjun Roy and Gunnar Friege and Eirini Ntoutsi",
year = "2021",
doi = "10.48550/arXiv.2104.12116",
language = "English",
pages = "407--414",
editor = "I-Han Hsiao and Sahebi, {Shaghayegh } and Francois Bouchet and {Vie }, Jill-Jenn",
booktitle = "Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)",
note = "14th International Conference on Educational Data Mining 2021 ; Conference date: 29-06-2021 Through 02-07-2021",

}

Download

TY - GEN

T1 - Fair-Capacitated Clustering

AU - Quy, Tai Le

AU - Roy, Arjun

AU - Friege, Gunnar

AU - Ntoutsi, Eirini

N1 - Conference code: 14

PY - 2021

Y1 - 2021

N2 - Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

AB - Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

KW - cs.LG

KW - cs.DC

U2 - 10.48550/arXiv.2104.12116

DO - 10.48550/arXiv.2104.12116

M3 - Conference contribution

SP - 407

EP - 414

BT - Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)

A2 - Hsiao, I-Han

A2 - Sahebi, Shaghayegh

A2 - Bouchet, Francois

A2 - Vie , Jill-Jenn

T2 - 14th International Conference on Educational Data Mining 2021

Y2 - 29 June 2021 through 2 July 2021

ER -