Coreference Resolution in Research Papers from Multiple Domains

Research output: Chapter in book/report/conference proceeding > Conference contribution > Research > peer review

Authors

  • Arthur Brack
  • Daniel Uwe Müller
  • Anett Hoppe
  • Ralph Ewerth

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 43rd European Conference on IR Research, ECIR 2021, Proceedings
Editors: Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, Fabrizio Sebastiani
Place of Publication: Cham
Pages: 79-97
Number of pages: 19
ISBN (electronic): 978-3-030-72113-8
Publication status: Published - 27 Mar 2021

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12656 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).

Keywords

    Coreference resolution, Information extraction, Knowledge graph population, Scholarly communication

Cite this

Coreference Resolution in Research Papers from Multiple Domains. / Brack, Arthur; Müller, Daniel Uwe; Hoppe, Anett et al.
Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Proceedings. ed. / Djoerd Hiemstra; Marie-Francine Moens; Josiane Mothe; Raffaele Perego; Martin Potthast; Fabrizio Sebastiani. Cham, 2021. p. 79-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12656 LNCS).

Brack, A, Müller, DU, Hoppe, A & Ewerth, R 2021, Coreference Resolution in Research Papers from Multiple Domains. in D Hiemstra, M-F Moens, J Mothe, R Perego, M Potthast & F Sebastiani (eds), Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12656 LNCS, Cham, pp. 79-97. https://doi.org/10.48550/arXiv.2101.00884, https://doi.org/10.1007/978-3-030-72113-8_6
Brack, A., Müller, D. U., Hoppe, A., & Ewerth, R. (2021). Coreference Resolution in Research Papers from Multiple Domains. In D. Hiemstra, M.-F. Moens, J. Mothe, R. Perego, M. Potthast, & F. Sebastiani (Eds.), Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Proceedings (pp. 79-97). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12656 LNCS). https://doi.org/10.48550/arXiv.2101.00884, https://doi.org/10.1007/978-3-030-72113-8_6
Brack A, Müller DU, Hoppe A, Ewerth R. Coreference Resolution in Research Papers from Multiple Domains. In Hiemstra D, Moens MF, Mothe J, Perego R, Potthast M, Sebastiani F, editors, Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Proceedings. Cham. 2021. p. 79-97. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.48550/arXiv.2101.00884, 10.1007/978-3-030-72113-8_6
Brack, Arthur ; Müller, Daniel Uwe ; Hoppe, Anett et al. / Coreference Resolution in Research Papers from Multiple Domains. Advances in Information Retrieval: 43rd European Conference on IR Research, ECIR 2021, Proceedings. editor / Djoerd Hiemstra ; Marie-Francine Moens ; Josiane Mothe ; Raffaele Perego ; Martin Potthast ; Fabrizio Sebastiani. Cham, 2021. pp. 79-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{057d83f7e4bb4cc8a13fff28e7b0b422,
title = "Coreference Resolution in Research Papers from Multiple Domains",
abstract = "Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).",
keywords = "Coreference resolution, Information extraction, Knowledge graph population, Scholarly communication",
author = "Arthur Brack and M{\"u}ller, {Daniel Uwe} and Anett Hoppe and Ralph Ewerth",
year = "2021",
month = mar,
day = "27",
doi = "10.48550/arXiv.2101.00884",
language = "English",
isbn = "978-3-030-72112-1",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "79--97",
editor = "Djoerd Hiemstra and Marie-Francine Moens and Josiane Mothe and Raffaele Perego and Martin Potthast and Fabrizio Sebastiani",
booktitle = "Advances in Information Retrieval",

}

TY - GEN
T1 - Coreference Resolution in Research Papers from Multiple Domains
AU - Brack, Arthur
AU - Müller, Daniel Uwe
AU - Hoppe, Anett
AU - Ewerth, Ralph
PY - 2021/3/27
Y1 - 2021/3/27
AB - Coreference resolution is essential for automatic text understanding to facilitate high-level information retrieval tasks such as text summarisation or question answering. Previous work indicates that the performance of state-of-the-art approaches (e.g. based on BERT) noticeably declines when applied to scientific papers. In this paper, we investigate the task of coreference resolution in research papers and subsequent knowledge graph population. We present the following contributions: (1) We annotate a corpus for coreference resolution that comprises 10 different scientific disciplines from Science, Technology, and Medicine (STM); (2) We propose transfer learning for automatic coreference resolution in research papers; (3) We analyse the impact of coreference resolution on knowledge graph (KG) population; (4) We release a research KG that is automatically populated from 55,485 papers in 10 STM domains. Comprehensive experiments show the usefulness of the proposed approach. Our transfer learning approach considerably outperforms state-of-the-art baselines on our corpus with an F1 score of 61.4 (+11.0), while the evaluation against a gold standard KG shows that coreference resolution improves the quality of the populated KG significantly with an F1 score of 63.5 (+21.8).

KW - Coreference resolution
KW - Information extraction
KW - Knowledge graph population
KW - Scholarly communication
UR - http://www.scopus.com/inward/record.url?scp=85107346762&partnerID=8YFLogxK
U2 - 10.48550/arXiv.2101.00884
DO - 10.48550/arXiv.2101.00884
M3 - Conference contribution
SN - 978-3-030-72112-1
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 79
EP - 97
BT - Advances in Information Retrieval
A2 - Hiemstra, Djoerd
A2 - Moens, Marie-Francine
A2 - Mothe, Josiane
A2 - Perego, Raffaele
A2 - Potthast, Martin
A2 - Sebastiani, Fabrizio
CY - Cham
ER -