Details
| Original language | English |
|---|---|
| Title of host publication | SAC '24 |
| Subtitle of host publication | Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing |
| Pages | 1626-1633 |
| Number of pages | 8 |
| ISBN (electronic) | 9798400702433 |
| Publication status | Published - 21 May 2024 |
| Event | 39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain. Duration: 8 Apr 2024 → 12 Apr 2024 |
Abstract
Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
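As a concrete illustration of the kind of bias measure the abstract describes, the following is a minimal SPARQL sketch that counts how often each relation occurs in a KG, a simple proxy for relation overrepresentation. This query is a hedged example, not taken from the paper; the graph IRI `<http://example.org/train>` is a placeholder assumption.

```sparql
# Hypothetical bias measure: relation frequency as a proxy for
# relation overrepresentation. Relations that dominate the triple
# count can inflate reported LP performance.
# The graph IRI below is a placeholder, not from the paper.
SELECT ?relation (COUNT(*) AS ?tripleCount)
FROM <http://example.org/train>
WHERE {
  ?s ?relation ?o .
}
GROUP BY ?relation
ORDER BY DESC(?tripleCount)
```

Run against the training split of a benchmark such as FB15k-237, a heavily skewed result list would point to the kind of overrepresentation the paper traces.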
Keywords
- bias
- knowledge graphs
- link prediction
ASJC Scopus subject areas
- Computer Science (all)
- Software
Cite this
Russo, M., Sawischa, S. F., & Vidal, M. E. (2024). Tracing the Impact of Bias in Link Prediction. In SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1626-1633). ACM. https://doi.org/10.1145/3605098.3635912
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Tracing the Impact of Bias in Link Prediction
AU - Russo, Mayra
AU - Sawischa, Sammy Fabian
AU - Vidal, Maria Esther
N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/5/21
Y1 - 2024/5/21
N2 - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
AB - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
KW - bias
KW - knowledge graphs
KW - link prediction
UR - http://www.scopus.com/inward/record.url?scp=85197662345&partnerID=8YFLogxK
U2 - 10.1145/3605098.3635912
DO - 10.1145/3605098.3635912
M3 - Conference contribution
AN - SCOPUS:85197662345
SP - 1626
EP - 1633
BT - SAC '24
T2 - 39th Annual ACM Symposium on Applied Computing, SAC 2024
Y2 - 8 April 2024 through 12 April 2024
ER -