Domain-independent Extraction of Scientific Concepts from Research Articles

Publication type: Contribution to book/report/anthology/conference proceedings (conference paper, research)

Authors

  • Arthur Brack
  • Jennifer D'Souza
  • Anett Hoppe
  • Sören Auer
  • Ralph Ewerth

Organisational units

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I
Editors: Joemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva
Place of publication: Cham
Pages: 251-266
Number of pages: 16
ISBN (electronic): 978-3-030-45439-5
Publication status: Published - 8 Apr 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12035 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
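
The abstract reports an F1 score for concept extraction at the phrasal level. As a rough illustration of how such a score can be computed for a sequence-labelling system, here is a minimal sketch of exact-match, phrase-level F1 over BIO tags. The BIO encoding, the exact-match criterion, the placeholder labels `X` and `Y`, and the function names are assumptions made for this example; they are not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, label); hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect labelled phrase spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match F1: a predicted phrase counts only if span and label both match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Placeholder labels X and Y stand in for generic concept types.
gold_tags = ["B-X", "I-X", "O", "B-Y"]
pred_tags = ["B-X", "I-X", "O", "O"]
score = span_f1(gold_tags, pred_tags)
```

In this toy case the prediction recovers one of the two gold phrases, so precision is 1.0 and recall is 0.5. Other matching criteria, such as partial overlap, would give different numbers.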

Cite this

Domain-independent Extraction of Scientific Concepts from Research Articles. / Brack, Arthur; D'Souza, Jennifer; Hoppe, Anett et al.
Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Ed. / Joemon M. Jose; Emine Yilmaz; João Magalhães; Flávio Martins; Pablo Castells; Nicola Ferro; Mário J. Silva. Cham, 2020. pp. 251-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12035 LNCS).

Brack, A, D'Souza, J, Hoppe, A, Auer, S & Ewerth, R 2020, Domain-independent Extraction of Scientific Concepts from Research Articles. in JM Jose, E Yilmaz, J Magalhães, F Martins, P Castells, N Ferro & MJ Silva (eds), Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12035 LNCS, Cham, pp. 251-266. https://doi.org/10.1007/978-3-030-45439-5_17
Brack, A., D'Souza, J., Hoppe, A., Auer, S., & Ewerth, R. (2020). Domain-independent Extraction of Scientific Concepts from Research Articles. In J. M. Jose, E. Yilmaz, J. Magalhães, F. Martins, P. Castells, N. Ferro, & M. J. Silva (Eds.), Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I (pp. 251-266). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12035 LNCS). https://doi.org/10.1007/978-3-030-45439-5_17
Brack A, D'Souza J, Hoppe A, Auer S, Ewerth R. Domain-independent Extraction of Scientific Concepts from Research Articles. In Jose JM, Yilmaz E, Magalhães J, Martins F, Castells P, Ferro N, Silva MJ, editors, Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Cham. 2020. p. 251-266. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-45439-5_17
Brack, Arthur ; D'Souza, Jennifer ; Hoppe, Anett et al. / Domain-independent Extraction of Scientific Concepts from Research Articles. Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Ed. / Joemon M. Jose ; Emine Yilmaz ; João Magalhães ; Flávio Martins ; Pablo Castells ; Nicola Ferro ; Mário J. Silva. Cham, 2020. pp. 251-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{b158f71c279942d4879ecbba95cbd7ef,
title = "Domain-independent Extraction of Scientific Concepts from Research Articles",
abstract = "We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.",
keywords = "cs.IR, cs.DL, Research knowledge graph, Scholarly communication, Scientific articles, Active learning, Information extraction, Sequence labelling",
author = "Arthur Brack and Jennifer D'Souza and Anett Hoppe and S{\"o}ren Auer and Ralph Ewerth",
year = "2020",
month = apr,
day = "8",
doi = "10.1007/978-3-030-45439-5_17",
language = "English",
isbn = "9783030454388",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "251--266",
editor = "Jose, {Joemon M.} and Emine Yilmaz and Jo{\~a}o Magalh{\~a}es and Fl{\'a}vio Martins and Pablo Castells and Nicola Ferro and Silva, {M{\'a}rio J.}",
booktitle = "Advances in Information Retrieval",

}

TY  - GEN
T1  - Domain-independent Extraction of Scientific Concepts from Research Articles
AU  - Brack, Arthur
AU  - D'Souza, Jennifer
AU  - Hoppe, Anett
AU  - Auer, Sören
AU  - Ewerth, Ralph
PY  - 2020/4/8
Y1  - 2020/4/8
N2  - We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
AB  - We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
KW  - cs.IR
KW  - cs.DL
KW  - Research knowledge graph
KW  - Scholarly communication
KW  - Scientific articles
KW  - Active learning
KW  - Information extraction
KW  - Sequence labelling
UR  - http://www.scopus.com/inward/record.url?scp=85083998712&partnerID=8YFLogxK
U2  - 10.1007/978-3-030-45439-5_17
DO  - 10.1007/978-3-030-45439-5_17
M3  - Conference contribution
SN  - 9783030454388
T3  - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP  - 251
EP  - 266
BT  - Advances in Information Retrieval
A2  - Jose, Joemon M.
A2  - Yilmaz, Emine
A2  - Magalhães, João
A2  - Martins, Flávio
A2  - Castells, Pablo
A2  - Ferro, Nicola
A2  - Silva, Mário J.
CY  - Cham
ER  -
