Discovering Entities with Just a Little Help from You

Jaspreet Singh; Johannes Hoffart; Avishek Anand

doi:10.1145/2983323.2983798

Details

Original language	English
Title of host publication	CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
Place of publication	New York
Pages	1331-1340
Number of pages	10
ISBN (electronic)	9781450340731
Publication status	Published - 24 Oct 2016
Event	25th ACM International Conference on Information and Knowledge Management, CIKM 2016 - Indianapolis, United States. Duration: 24 Oct 2016 → 28 Oct 2016

Publication series

Name	International Conference on Information and Knowledge Management, Proceedings
Volume	24-28-October-2016

Abstract

Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only few people are interested in, and also at times lags behind until newly emerging entities are added. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-egg problem. To address this, our approach automatically generates candidate occurrences of entities, prompting the user for feedback to decide if the occurrence refers to the actual entity in question. This feedback gradually improves the knowledge and allows our methods to provide better candidate suggestions to keep the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.
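The abstract does not spell out the candidate-generation algorithm, so the following Python sketch is only a rough illustration of the feedback loop it describes: candidate occurrences are proposed, the user accepts or rejects each one, and accepted contexts enrich the entity profile used for the next round. Everything here is an assumption made for illustration: the function names (`tokens`, `jaccard`, `next_batch`, `discover`), the `judge` callback standing in for the user, the Jaccard word-overlap scores, and the schedule that shifts picks from a diversity ranking toward a relevance ranking as confirmations accumulate. It is a simple mix of two rankings, not the paper's gradient interleaving method.

```python
from typing import Callable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased bag of words; a deliberately crude text representation."""
    return set(text.lower().split())


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def next_batch(pool: List[str], profile: Set[str], shown: List[str],
               relevance_share: float, size: int) -> List[str]:
    """Pick up to `size` candidates by mixing a relevance and a diversity ranking."""
    by_relevance = sorted(pool, key=lambda c: jaccard(tokens(c), profile),
                          reverse=True)

    def novelty(c: str) -> float:
        # Diversity: prefer candidates least similar to anything already shown.
        return 1.0 - max((jaccard(tokens(c), tokens(s)) for s in shown),
                         default=0.0)

    by_diversity = sorted(pool, key=novelty, reverse=True)
    n_rel = int(relevance_share * size)
    batch: List[str] = []
    # Take the top relevance picks first, then fill up from the diversity ranking.
    for cand in by_relevance[:n_rel] + by_diversity:
        if cand not in batch:
            batch.append(cand)
        if len(batch) == size:
            break
    return batch


def discover(pool: List[str], seed: str, judge: Callable[[str], bool],
             rounds: int = 3, size: int = 2) -> List[str]:
    """Run the feedback loop; `judge` plays the role of the user (hypothetical)."""
    profile = tokens(seed)
    remaining = list(pool)
    shown: List[str] = []
    accepted: List[str] = []
    for _ in range(rounds):
        if not remaining:
            break
        # Assumed schedule: rely more on relevance as confirmations accumulate.
        share = len(accepted) / (len(accepted) + 1)
        for cand in next_batch(remaining, profile, shown, share, size):
            remaining.remove(cand)
            shown.append(cand)
            if judge(cand):
                accepted.append(cand)
                profile |= tokens(cand)  # feedback enriches the entity profile
    return accepted
```

Calling `discover(snippets, "jaguar band", judge)` with a list of text snippets and a yes/no callback returns the snippets the callback confirmed. Starting with diversity and moving toward relevance reflects the chicken-and-egg problem named in the abstract: with little known about the entity at the start, relevance scores against the profile are unreliable.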

ASJC Scopus subject areas

Business, Management and Accounting (all)
General Business, Management and Accounting
Decision Sciences (all)
General Decision Sciences

Cite this

Discovering Entities with Just a Little Help from You. / Singh, Jaspreet; Hoffart, Johannes; Anand, Avishek.
CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. New York, 2016. p. 1331-1340 (International Conference on Information and Knowledge Management, Proceedings; Vol. 24-28-October-2016).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research

Singh, J, Hoffart, J & Anand, A 2016, Discovering Entities with Just a Little Help from You. in CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. International Conference on Information and Knowledge Management, Proceedings, vol. 24-28-October-2016, New York, pp. 1331-1340, 25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, United States, 24 Oct 2016. https://doi.org/10.1145/2983323.2983798

Singh, J., Hoffart, J., & Anand, A. (2016). Discovering Entities with Just a Little Help from You. In CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (pp. 1331-1340). (International Conference on Information and Knowledge Management, Proceedings; Vol. 24-28-October-2016). https://doi.org/10.1145/2983323.2983798

Singh J, Hoffart J, Anand A. Discovering Entities with Just a Little Help from You. In CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. New York. 2016. p. 1331-1340. (International Conference on Information and Knowledge Management, Proceedings). doi: 10.1145/2983323.2983798

Singh, Jaspreet ; Hoffart, Johannes ; Anand, Avishek. / Discovering Entities with Just a Little Help from You. CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. New York, 2016. p. 1331-1340 (International Conference on Information and Knowledge Management, Proceedings).

@inproceedings{c97cd8ed7ad6434e9a20c821305ce36d,
title = "Discovering Entities with Just a Little Help from You",
abstract = "Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only few people are interested in, and also at times lags behind until newly emerging entities are added. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-egg problem. To address this, our approach automatically generates candidate occurrences of entities, prompting the user for feedback to decide if the occurrence refers to the actual entity in question. This feedback gradually improves the knowledge and allows our methods to provide better candidate suggestions to keep the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.",
keywords = "cs.IR",
author = "Jaspreet Singh and Johannes Hoffart and Avishek Anand",
year = "2016",
month = oct,
day = "24",
doi = "10.1145/2983323.2983798",
language = "English",
isbn = "978-1-4503-4073-1",
series = "International Conference on Information and Knowledge Management, Proceedings",
pages = "1331--1340",
booktitle = "CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management",
note = "25th ACM International Conference on Information and Knowledge Management, CIKM 2016 ; Conference date: 24-10-2016 Through 28-10-2016",
}

TY - GEN
T1 - Discovering Entities with Just a Little Help from You
AU - Singh, Jaspreet
AU - Hoffart, Johannes
AU - Anand, Avishek
PY - 2016/10/24
Y1 - 2016/10/24
AB - Linking entities like people, organizations, books, music groups and their songs in text to knowledge bases (KBs) is a fundamental task for many downstream search and mining applications. Achieving high disambiguation accuracy crucially depends on a rich and holistic representation of the entities in the KB. For popular entities, such a representation can be easily mined from Wikipedia, and many current entity disambiguation and linking methods make use of this fact. However, Wikipedia does not contain long-tail entities that only few people are interested in, and also at times lags behind until newly emerging entities are added. For such entities, mining a suitable representation in a fully automated fashion is very difficult, resulting in poor linking accuracy. What can automatically be mined, though, is a high-quality representation given the context of a new entity occurring in any text. Due to the lack of knowledge about the entity, no method can retrieve these occurrences automatically with high precision, resulting in a chicken-egg problem. To address this, our approach automatically generates candidate occurrences of entities, prompting the user for feedback to decide if the occurrence refers to the actual entity in question. This feedback gradually improves the knowledge and allows our methods to provide better candidate suggestions to keep the user engaged. We propose novel human-in-the-loop retrieval methods for generating candidates based on gradient interleaving of diversification and textual relevance approaches. We conducted extensive experiments on the FACC dataset, showing that our approaches convincingly outperform carefully selected baselines in both intrinsic and extrinsic measures while keeping users engaged.
KW - cs.IR
UR - http://www.scopus.com/inward/record.url?scp=84996508749&partnerID=8YFLogxK
U2 - 10.1145/2983323.2983798
DO - 10.1145/2983323.2983798
M3 - Conference contribution
SN - 978-1-4503-4073-1
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 1331
EP - 1340
BT - CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
CY - New York
T2 - 25th ACM International Conference on Information and Knowledge Management, CIKM 2016
Y2 - 24 October 2016 through 28 October 2016
ER -
