Details
| Original language | English |
|---|---|
| Title of host publication | SAC '24 |
| Subtitle of host publication | Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing |
| Pages | 1626-1633 |
| Number of pages | 8 |
| ISBN (electronic) | 9798400702433 |
| Publication status | Published - 21 May 2024 |
| Event | 39th Annual ACM Symposium on Applied Computing, SAC 2024, Avila, Spain. Duration: 8 Apr 2024 → 12 Apr 2024 |
Abstract
Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
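As a concrete illustration of the kind of bias measure the abstract describes, the following is a minimal SPARQL sketch that counts how often each relation occurs in a KG, a simple proxy for relation overrepresentation. This query is a hedged example, not taken from the paper; the graph IRI `<http://example.org/train>` is a placeholder assumption.

```sparql
# Hypothetical bias measure: relation frequency as a proxy for
# relation overrepresentation. Relations that dominate the triple
# count can inflate reported LP performance.
# The graph IRI below is a placeholder, not from the paper.
SELECT ?relation (COUNT(*) AS ?tripleCount)
FROM <http://example.org/train>
WHERE {
  ?s ?relation ?o .
}
GROUP BY ?relation
ORDER BY DESC(?tripleCount)
```

Run against the training split of a benchmark such as FB15k-237, a heavily skewed result list would point to the kind of overrepresentation the paper traces.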
Keywords
- bias
- knowledge graphs
- link prediction
ASJC Scopus subject areas
- Computer Science (all)
- Software
Cite this
Russo, M., Sawischa, S. F., & Vidal, M. E. (2024). Tracing the Impact of Bias in Link Prediction. In SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing (pp. 1626-1633). ACM. https://doi.org/10.1145/3605098.3635912
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Tracing the Impact of Bias in Link Prediction
AU - Russo, Mayra
AU - Sawischa, Sammy Fabian
AU - Vidal, Maria Esther
N1 - Publisher Copyright: © 2024 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
PY - 2024/5/21
Y1 - 2024/5/21
N2 - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
AB - Link prediction (LP) in knowledge graphs (KGs) uses embedding-based approaches and machine learning (ML) models to uncover new facts. To assess the performance of these techniques accurately, and to avoid overstating it, comprehensive and rigorous evaluation is needed. In this work, we propose a framework to systematically trace and analyze bias, specifically test leakage bias and sample selection bias, in training and testing KGs. The goal is to evaluate how bias affects the performance of LP models. We specify a collection of bias measures in SPARQL (the W3C standard query language) to facilitate the analysis of any RDF graph with regard to its structural bias properties. Further, we evaluate our framework over seven state-of-the-art LP datasets (e.g., FB15k-237, WN18RR, and YAGO3-10) and the TransE model. Our findings show that bias, i.e., overrepresentation of entities and relations and pronounced information redundancy, is present across all datasets and advantageously impacts the reported performance of the LP model. With these results, we call for thorough assessments of data sources to discourage the use of biased datasets where appropriate, to improve our understanding of how LP models work, and to better interpret the output these models produce.
KW - bias
KW - knowledge graphs
KW - link prediction
UR - http://www.scopus.com/inward/record.url?scp=85197662345&partnerID=8YFLogxK
U2 - 10.1145/3605098.3635912
DO - 10.1145/3605098.3635912
M3 - Conference contribution
AN - SCOPUS:85197662345
SP - 1626
EP - 1633
BT - SAC '24
T2 - 39th Annual ACM Symposium on Applied Computing, SAC 2024
Y2 - 8 April 2024 through 12 April 2024
ER -