Details
Originalsprache | Englisch |
---|---|
Titel des Sammelwerks | Advances in Information Retrieval |
Untertitel | 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I |
Herausgeber/-innen | Joemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva |
Erscheinungsort | Cham |
Seiten | 251-266 |
Seitenumfang | 16 |
ISBN (elektronisch) | 978-3-030-45439-5 |
Publikationsstatus | Veröffentlicht - 8 Apr. 2020 |
Publikationsreihe
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Band | 12035 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (elektronisch) | 1611-3349 |
Abstract
We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
ASJC Scopus Sachgebiete
- Mathematik (insg.)
- Theoretische Informatik
- Informatik (insg.)
- Allgemeine Computerwissenschaft
Zitieren
- Standard
- Harvard
- Apa
- Vancouver
- BibTex
- RIS
Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Hrsg. / Joemon M. Jose; Emine Yilmaz; João Magalhães; Flávio Martins; Pablo Castells; Nicola Ferro; Mário J. Silva. Cham, 2020. S. 251-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Band 12035 LNCS).
Publikation: Beitrag in Buch/Bericht/Sammelwerk/Konferenzband › Aufsatz in Konferenzband › Forschung
}
TY - GEN
T1 - Domain-independent Extraction of Scientific Concepts from Research Articles
AU - Brack, Arthur
AU - D'Souza, Jennifer
AU - Hoppe, Anett
AU - Auer, Sören
AU - Ewerth, Ralph
PY - 2020/4/8
Y1 - 2020/4/8
N2 - We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
AB - We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
KW - cs.IR
KW - cs.DL
KW - Research knowledge graph
KW - Scholarly communication
KW - Scientific articles
KW - Active learning
KW - Information extraction
KW - Sequence labelling
UR - http://www.scopus.com/inward/record.url?scp=85083998712&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-45439-5_17
DO - 10.1007/978-3-030-45439-5_17
M3 - Conference contribution
SN - 9783030454388
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 251
EP - 266
BT - Advances in Information Retrieval
A2 - Jose, Joemon M.
A2 - Yilmaz, Emine
A2 - Magalhães, João
A2 - Martins, Flávio
A2 - Castells, Pablo
A2 - Ferro, Nicola
A2 - Silva, Mário J.
CY - Cham
ER -