Domain-independent Extraction of Scientific Concepts from Research Articles

Arthur Brack; Jennifer D'Souza; Anett Hoppe; Sören Auer; Ralph Ewerth

doi:10.1007/978-3-030-45439-5_17

Details

Original language	English
Title of host publication	Advances in Information Retrieval
Subtitle	42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I
Editors	Joemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva
Place of publication	Cham
Pages	251-266
Number of pages	16
ISBN (electronic)	978-3-030-45439-5
Publication status	Published - 8 Apr 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12035 LNCS
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
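The abstract reports an F1 score for phrase-level concept extraction but does not say how it is computed. As a minimal sketch, assuming exact-match scoring over labelled spans (the paper's actual evaluation protocol may differ), the metric could be calculated as follows. The `Span` type, the `span_f1` function, and the concept labels in the example are hypothetical placeholders, not taken from the paper.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start_token, end_token, concept_label).
Span = Tuple[int, int, str]


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match F1: a predicted span counts only if its boundaries
    and its label both agree with a gold span."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Illustrative data only; the labels are placeholders.
gold = {(0, 2, "PROCESS"), (5, 6, "MATERIAL"), (9, 11, "DATA")}
predicted = {(0, 2, "PROCESS"), (5, 6, "METHOD"), (9, 11, "DATA")}
score = span_f1(gold, predicted)
```

In this toy example two of the three predicted spans match the gold annotation exactly, so precision and recall are both 2/3. The span with the right boundaries but the wrong label counts as an error under exact matching.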

ASJC Scopus subject areas

Mathematics (all)
Theoretical Computer Science
Computer Science (all)
General Computer Science

Cite this

Domain-independent Extraction of Scientific Concepts from Research Articles. / Brack, Arthur; D'Souza, Jennifer; Hoppe, Anett et al.
Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. ed. / Joemon M. Jose; Emine Yilmaz; João Magalhães; Flávio Martins; Pablo Castells; Nicola Ferro; Mário J. Silva. Cham, 2020. p. 251-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12035 LNCS).

Publication: Chapter in book/report/conference proceeding › Conference contribution › Research

Brack, A, D'Souza, J, Hoppe, A, Auer, S & Ewerth, R 2020, Domain-independent Extraction of Scientific Concepts from Research Articles. in JM Jose, E Yilmaz, J Magalhães, F Martins, P Castells, N Ferro & MJ Silva (eds), Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12035 LNCS, Cham, pp. 251-266. https://doi.org/10.1007/978-3-030-45439-5_17

Brack, A., D'Souza, J., Hoppe, A., Auer, S., & Ewerth, R. (2020). Domain-independent Extraction of Scientific Concepts from Research Articles. In J. M. Jose, E. Yilmaz, J. Magalhães, F. Martins, P. Castells, N. Ferro, & M. J. Silva (Eds.), Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I (pp. 251-266). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12035 LNCS). https://doi.org/10.1007/978-3-030-45439-5_17

Brack A, D'Souza J, Hoppe A, Auer S, Ewerth R. Domain-independent Extraction of Scientific Concepts from Research Articles. In Jose JM, Yilmaz E, Magalhães J, Martins F, Castells P, Ferro N, Silva MJ, editors, Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. Cham. 2020. p. 251-266. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-45439-5_17

Brack, Arthur ; D'Souza, Jennifer ; Hoppe, Anett et al. / Domain-independent Extraction of Scientific Concepts from Research Articles. Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I. ed. / Joemon M. Jose ; Emine Yilmaz ; João Magalhães ; Flávio Martins ; Pablo Castells ; Nicola Ferro ; Mário J. Silva. Cham, 2020. pp. 251-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{b158f71c279942d4879ecbba95cbd7ef,
  title = "Domain-independent Extraction of Scientific Concepts from Research Articles",
  abstract = "We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.",
  keywords = "cs.IR, cs.DL, Research knowledge graph, Scholarly communication, Scientific articles, Active learning, Information extraction, Sequence labelling",
  author = "Arthur Brack and Jennifer D'Souza and Anett Hoppe and S{\"o}ren Auer and Ralph Ewerth",
  year = "2020",
  month = apr,
  day = "8",
  doi = "10.1007/978-3-030-45439-5_17",
  language = "English",
  isbn = "9783030454388",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "251--266",
  editor = "Jose, {Joemon M.} and Emine Yilmaz and Jo{\~a}o Magalh{\~a}es and Fl{\'a}vio Martins and Pablo Castells and Nicola Ferro and Silva, {M{\'a}rio J.}",
  booktitle = "Advances in Information Retrieval",
}

TY - GEN
T1 - Domain-independent Extraction of Scientific Concepts from Research Articles
AU - Brack, Arthur
AU - D'Souza, Jennifer
AU - Hoppe, Anett
AU - Auer, Sören
AU - Ewerth, Ralph
PY - 2020/4/8
Y1 - 2020/4/8
AB - We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task, (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose the active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) a substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, (3) active learning enables us to nearly halve the amount of required training data.
KW - cs.IR
KW - cs.DL
KW - Research knowledge graph
KW - Scholarly communication
KW - Scientific articles
KW - Active learning
KW - Information extraction
KW - Sequence labelling
UR - http://www.scopus.com/inward/record.url?scp=85083998712&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-45439-5_17
DO - 10.1007/978-3-030-45439-5_17
M3 - Conference contribution
SN - 9783030454388
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 251
EP - 266
BT - Advances in Information Retrieval
A2 - Jose, Joemon M.
A2 - Yilmaz, Emine
A2 - Magalhães, João
A2 - Martins, Flávio
A2 - Castells, Pablo
A2 - Ferro, Nicola
A2 - Silva, Mário J.
CY - Cham
ER -