Tracing the Impact of Bias in Link Prediction

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Mayra Russo
  • Sammy Fabian Sawischa
  • Maria Esther Vidal

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: SAC '24
Subtitle of host publication: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
Pages: 1626-1633
Number of pages: 8
ISBN (electronic): 9798400702433
Publication status: Published - 21 May 2024
Event: 39th Annual ACM Symposium on Applied Computing, SAC 2024 - Avila, Spain
Duration: 8 Apr 2024 to 12 Apr 2024

Abstract

Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating, and to accurately assess, the performance of these techniques, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and that it advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessment of data sources, to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.

Keywords

    bias, knowledge graphs, link prediction


Cite this

Tracing the Impact of Bias in Link Prediction. / Russo, Mayra; Sawischa, Sammy Fabian; Vidal, Maria Esther.
SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. p. 1626-1633.


Russo, M, Sawischa, SF & Vidal, ME 2024, Tracing the Impact of Bias in Link Prediction. in SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. pp. 1626-1633, 39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain, 8 Apr 2024. https://doi.org/10.1145/3605098.3635912
Russo, M., Sawischa, S. F., & Vidal, M. E. (2024). Tracing the Impact of Bias in Link Prediction. In SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1626-1633). https://doi.org/10.1145/3605098.3635912
Russo M, Sawischa SF, Vidal ME. Tracing the Impact of Bias in Link Prediction. In SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. p. 1626-1633. doi: 10.1145/3605098.3635912
Russo, Mayra ; Sawischa, Sammy Fabian ; Vidal, Maria Esther. / Tracing the Impact of Bias in Link Prediction. SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024. pp. 1626-1633
@inproceedings{42e21807d31a4b97baca10cac52db2bf,
title = "Tracing the Impact of Bias in Link Prediction",
abstract = "Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating, and to accurately assess, the performance of these techniques, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and that it advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessment of data sources, to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.",
keywords = "bias, knowledge graphs, link prediction",
author = "Mayra Russo and Sawischa, {Sammy Fabian} and Vidal, {Maria Esther}",
note = "Publisher Copyright: {\textcopyright} 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.; 39th Annual ACM Symposium on Applied Computing, SAC 2024 ; Conference date: 08-04-2024 Through 12-04-2024",
year = "2024",
month = may,
day = "21",
doi = "10.1145/3605098.3635912",
language = "English",
pages = "1626--1633",
booktitle = "SAC '24",

}


TY - GEN

T1 - Tracing the Impact of Bias in Link Prediction

AU - Russo, Mayra

AU - Sawischa, Sammy Fabian

AU - Vidal, Maria Esther

N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

PY - 2024/5/21

Y1 - 2024/5/21

N2 - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating, and to accurately assess, the performance of these techniques, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and that it advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessment of data sources, to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.

AB - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To avoid overstating, and to accurately assess, the performance of these techniques, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and that it advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessment of data sources, to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret their output.

KW - bias

KW - knowledge graphs

KW - link prediction

UR - http://www.scopus.com/inward/record.url?scp=85197662345&partnerID=8YFLogxK

U2 - 10.1145/3605098.3635912

DO - 10.1145/3605098.3635912

M3 - Conference contribution

AN - SCOPUS:85197662345

SP - 1626

EP - 1633

BT - SAC '24

T2 - 39th Annual ACM Symposium on Applied Computing, SAC 2024

Y2 - 8 April 2024 through 12 April 2024

ER -