Details
Original language | English |
---|---|
Title of host publication | SAC '19 |
Subtitle of host publication | Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing |
Place of Publication | New York |
Publisher | Association for Computing Machinery (ACM) |
Pages | 1019-1026 |
Number of pages | 8 |
ISBN (print) | 978-1-4503-5933-7 |
Publication status | Published - 8 Apr 2019 |
Event | 34th Annual ACM Symposium on Applied Computing, SAC 2019 - Limassol, Cyprus Duration: 8 Apr 2019 → 12 Apr 2019 |
Abstract
Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. A large number of EL tools are available for different types of documents and domains, yet EL remains a challenging task where the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real applications. A priori approximations of the difficulty of linking a particular entity mention can facilitate flagging of critical cases as part of semi-automated EL systems, while detecting latent factors that affect EL performance, like corpus-specific features, can provide insights on how to improve a system based on the special characteristics of the underlying corpus. In this paper, we first introduce a consensus-based method to generate difficulty labels for entity mentions on arbitrary corpora. The difficulty labels are then exploited as training data for a supervised classification task able to predict the EL difficulty of entity mentions using a variety of features. Experiments over a corpus of news articles show that EL difficulty can be estimated with high accuracy, also revealing latent features that affect EL performance. Finally, evaluation results demonstrate the effectiveness of the proposed method to inform semi-automated EL pipelines.
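The consensus idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's actual labelling rules, so everything here is an assumption made for illustration: the function name `difficulty_label`, the three label names, the agreement thresholds, and the example system outputs are all hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def difficulty_label(links: Sequence[Optional[str]]) -> str:
    """Derive a difficulty label for one mention from several EL systems' outputs.

    `links` holds the entity each system proposed for the same mention;
    None means that system produced no link. The thresholds below are an
    illustrative assumption, not the rules used in the paper.
    """
    if not links:
        raise ValueError("at least one system output is required")
    proposed = [link for link in links if link is not None]
    if not proposed:
        return "hard"  # no system could link the mention
    top_count = Counter(proposed).most_common(1)[0][1]
    if top_count == len(links):
        return "easy"  # every system agrees on the same entity
    if top_count > len(links) / 2:
        return "medium"  # a strict majority agrees
    return "hard"  # no majority: the systems disagree


# Hypothetical outputs of three EL systems for three mentions.
mentions = {
    "Paris": ["Paris", "Paris", "Paris"],
    "Jordan": ["Michael_Jordan", "Jordan", "Michael_Jordan"],
    "Victoria": ["Victoria_(Australia)", "Queen_Victoria", None],
}
labels = {mention: difficulty_label(links) for mention, links in mentions.items()}
print(labels)
```

Labels produced this way need no manual annotation, which is what would let them serve as training data for the supervised classifier the abstract mentions.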
Keywords
- Distant Supervision, Entity Linking, Named Entity Recognition and Disambiguation, Supervised Classification
ASJC Scopus subject areas
- Computer Science(all)
- Software
Cite this
SAC '19: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. New York: Association for Computing Machinery (ACM), 2019. p. 1019-1026.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Same but Different
T2 - 34th Annual ACM Symposium on Applied Computing, SAC 2019
AU - João, Renato Stoffalette
AU - Fafalios, Pavlos
AU - Dietze, Stefan
N1 - Funding Information: This work was partially supported by CNPq (Brazilian National Council for Scientific and Technological Development) under grant GDE No. 203268/2014-8 and the European Commission for the ERC Advanced Grant ALEXANDRIA under grant No. 339233.
PY - 2019/4/8
Y1 - 2019/4/8
KW - Distant Supervision
KW - Entity Linking
KW - Named Entity Recognition and Disambiguation
KW - Supervised Classification
UR - http://www.scopus.com/inward/record.url?scp=85065658346&partnerID=8YFLogxK
U2 - 10.48550/arXiv.1812.10387
DO - 10.48550/arXiv.1812.10387
M3 - Conference contribution
AN - SCOPUS:85065658346
SN - 978-1-4503-5933-7
SP - 1019
EP - 1026
BT - SAC '19
PB - Association for Computing Machinery (ACM)
CY - New York
Y2 - 8 April 2019 through 12 April 2019
ER -