Fair-Capacitated Clustering

Tai Le Quy; Arjun Roy; Gunnar Friege; Eirini Ntoutsi

doi:10.48550/arXiv.2104.12116

Details

Original language	English
Title of host publication	Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)
Editors	I-Han Hsiao, Shaghayegh Sahebi, Francois Bouchet, Jill-Jenn Vie
Pages	407-414
Publication status	Published - 2021
Event	14th International Conference on Educational Data Mining 2021 - Paris, France Duration: 29 Jun 2021 → 2 Jul 2021 Conference number: 14

Abstract

Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

Keywords

cs.LG, cs.DC

Cite this

Fair-Capacitated Clustering. / Quy, Tai Le; Roy, Arjun; Friege, Gunnar et al.
Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). ed. / I-Han Hsiao; Shaghayegh Sahebi; Francois Bouchet; Jill-Jenn Vie . 2021. p. 407-414.

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Quy, TL, Roy, A, Friege, G & Ntoutsi, E 2021, Fair-Capacitated Clustering. in I-H Hsiao, S Sahebi, F Bouchet & J-J Vie (eds), Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). pp. 407-414, 14th International Conference on Educational Data Mining 2021, Paris, France, 29 Jun 2021. https://doi.org/10.48550/arXiv.2104.12116

Quy, T. L., Roy, A., Friege, G., & Ntoutsi, E. (2021). Fair-Capacitated Clustering. In I.-H. Hsiao, S. Sahebi, F. Bouchet, & J.-J. Vie (Eds.), Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021) (pp. 407-414) https://doi.org/10.48550/arXiv.2104.12116

Quy TL, Roy A, Friege G, Ntoutsi E. Fair-Capacitated Clustering. In Hsiao IH, Sahebi S, Bouchet F, Vie JJ, editors, Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). 2021. p. 407-414 doi: 10.48550/arXiv.2104.12116

Quy, Tai Le ; Roy, Arjun ; Friege, Gunnar et al. / Fair-Capacitated Clustering. Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021). editor / I-Han Hsiao ; Shaghayegh Sahebi ; Francois Bouchet ; Jill-Jenn Vie . 2021. pp. 407-414

Download

@inproceedings{2409bf525208449ab36719189e9ff374,

title = "Fair-Capacitated Clustering",

abstract = " Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality. ",

keywords = "cs.LG, cs.DC",

author = "Quy, {Tai Le} and Arjun Roy and Gunnar Friege and Eirini Ntoutsi",

year = "2021",

doi = "10.48550/arXiv.2104.12116",

language = "English",

pages = "407--414",

editor = "I-Han Hsiao and Sahebi, {Shaghayegh } and Francois Bouchet and {Vie }, Jill-Jenn",

booktitle = "Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)",

note = "14th International Conference on Educational Data Mining 2021 ; Conference date: 29-06-2021 Through 02-07-2021",

}

Download

TY - GEN

T1 - Fair-Capacitated Clustering

AU - Quy, Tai Le

AU - Roy, Arjun

AU - Friege, Gunnar

AU - Ntoutsi, Eirini

N1 - Conference code: 14

PY - 2021

Y1 - 2021

N2 - Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

AB - Traditionally, clustering algorithms focus on partitioning the data into groups of similar instances. The similarity objective, however, is not sufficient in applications where a fair-representation of the groups in terms of protected attributes like gender or race, is required for each cluster. Moreover, in many applications, to make the clusters useful for the end-user, a balanced cardinality among the clusters is required. Our motivation comes from the education domain where studies indicate that students might learn better in diverse student groups and of course groups of similar cardinality are more practical e.g., for group assignments. To this end, we introduce the fair-capacitated clustering problem that partitions the data into clusters of similar instances while ensuring cluster fairness and balancing cluster cardinalities. We propose a two-step solution to the problem: i) we rely on fairlets to generate minimal sets that satisfy the fair constraint and ii) we propose two approaches, namely hierarchical clustering and partitioning-based clustering, to obtain the fair-capacitated clustering. The hierarchical approach embeds the additional cardinality requirements during the merging step while the partitioning-based one alters the assignment step using a knapsack problem formulation to satisfy the additional requirements. Our experiments on four educational datasets show that our approaches deliver well-balanced clusters in terms of both fairness and cardinality while maintaining a good clustering quality.

KW - cs.LG

KW - cs.DC

UR - https://www.scopus.com/record/display.uri?eid=2-s2.0-85124270100&origin=inward&txGid=d311b1b95938fe8f2d065bd87ba9e712

U2 - 10.48550/arXiv.2104.12116

DO - 10.48550/arXiv.2104.12116

M3 - Conference contribution

SP - 407

EP - 414

BT - Proceedings of The 14th International Conference on Educational Data Mining (EDM 2021)

A2 - Hsiao, I-Han

A2 - Sahebi, Shaghayegh

A2 - Bouchet, Francois

A2 - Vie , Jill-Jenn

T2 - 14th International Conference on Educational Data Mining 2021

Y2 - 29 June 2021 through 2 July 2021

ER -

Research@Leibniz University

Fair-Capacitated Clustering

Authors

Research Organisations

External Research Organisations

Details

Abstract

Keywords

Cite this