NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature

Jennifer D'Souza; Sören Auer

Details

Original language	English
Title of host publication	1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents
Subtitle of host publication	Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020)
Pages	16-27
Number of pages	12
Publication status	Published - 31 Aug 2020
Event	1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2020 - Virtual, Online, China
Duration	1 Aug 2020 → …

Publication series

Name	CEUR Workshop Proceedings
Publisher	CEUR Workshop Proceedings
Volume	2658
ISSN (Print)	1613-0073

Abstract

We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks: 1) machine translation, 2) named entity recognition, 3) question answering, 4) relation classification, and 5) text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we obtained an annotation methodology and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme, developed from these information units, is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology will engender a wider discussion on the topic toward its further refinement and development.
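
As an illustration of the subject-predicate-object structuring the abstract describes, the sketch below models one article's contribution as a list of triples grouped under information units. The unit names (`ResearchProblem`, `Results`), the predicates, and the example statements are hypothetical placeholders chosen for this sketch; they are not taken from the NLPContributions scheme or the ORKG data model.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


def group_by_unit(statements):
    """Group (information_unit, Triple) pairs into a dict keyed by unit name."""
    grouped = defaultdict(list)
    for unit, triple in statements:
        grouped[unit].append(triple)
    return dict(grouped)


# Hypothetical annotations for one article; all names are illustrative only.
statements = [
    ("ResearchProblem", Triple("Contribution", "has research problem", "named entity recognition")),
    ("Results", Triple("Model", "achieves", "F1 score")),
    ("Results", Triple("F1 score", "on", "benchmark dataset")),
]

contribution = group_by_unit(statements)
for unit, triples in contribution.items():
    for t in triples:
        print(f"[{unit}] ({t.subject}, {t.predicate}, {t.obj})")
```

Keeping each statement as a flat triple and attaching the information unit as a grouping key is one simple design; a graph library or RDF serialization would be a natural alternative for ingestion into a knowledge graph.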

Keywords

Annotation guidelines, Dataset, Digital libraries, Open science graphs, Scholarly knowledge graphs, Semantic publishing

ASJC Scopus subject areas

Computer Science(all)
General Computer Science

Cite this

NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature. / D'Souza, Jennifer; Auer, Sören.
1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents: Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020). 2020. p. 16-27 (CEUR Workshop Proceedings; Vol. 2658).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

D'Souza, J & Auer, S 2020, NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature. in 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents: Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020). CEUR Workshop Proceedings, vol. 2658, pp. 16-27, 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2020, Virtual, Online, China, 1 Aug 2020. <https://arxiv.org/abs/2006.12870>

D'Souza, J., & Auer, S. (2020). NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature. In 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents: Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020) (pp. 16-27). (CEUR Workshop Proceedings; Vol. 2658). https://arxiv.org/abs/2006.12870

D'Souza J, Auer S. NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature. In 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents: Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020). 2020. p. 16-27. (CEUR Workshop Proceedings).

D'Souza, Jennifer ; Auer, Sören. / NLPContributions : An annotation scheme for machine reading of scholarly contributions in natural language processing literature. 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents: Proceedings of the 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents co-located with the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL 2020). 2020. pp. 16-27 (CEUR Workshop Proceedings).

Download

@inproceedings{60e4760aad944bef9102858b93c6a710,

title = "NLPContributions: An annotation scheme for machine reading of scholarly contributions in natural language processing literature",

abstract = "We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly, for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology; and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.",

keywords = "Annotation guidelines, Dataset, Digital libraries, Open science graphs, Scholarly knowledge graphs, Semantic publishing",

author = "Jennifer D'Souza and S{\"o}ren Auer",

year = "2020",

month = aug,

day = "31",

language = "English",

series = "CEUR Workshop Proceedings",

publisher = "CEUR Workshop Proceedings",

pages = "16--27",

booktitle = "1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents",

note = "1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2020 ; Conference date: 01-08-2020",

}

Download

TY - GEN

T1 - NLPContributions

T2 - 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents, EEKE 2020

AU - D'Souza, Jennifer

AU - Auer, Sören

PY - 2020/8/31

Y1 - 2020/8/31

N2 - We describe an annotation initiative to capture the scholarly contributions in natural language processing (NLP) articles, particularly, for the articles that discuss machine learning (ML) approaches for various information extraction tasks. We develop the annotation task based on a pilot annotation exercise on 50 NLP-ML scholarly articles presenting contributions to five information extraction tasks 1. machine translation, 2. named entity recognition, 3. question answering, 4. relation classification, and 5. text classification. In this article, we describe the outcomes of this pilot annotation phase. Through the exercise we have obtained an annotation methodology; and found ten core information units that reflect the contribution of the NLP-ML scholarly investigations. The resulting annotation scheme we developed based on these information units is called NLPContributions. The overarching goal of our endeavor is four-fold: 1) to find a systematic set of patterns of subject-predicate-object statements for the semantic structuring of scholarly contributions that are more or less generically applicable for NLP-ML research articles; 2) to apply the discovered patterns in the creation of a larger annotated dataset for training machine readers [18] of research contributions; 3) to ingest the dataset into the Open Research Knowledge Graph (ORKG) infrastructure as a showcase for creating user-friendly state-of-the-art overviews; 4) to integrate the machine readers into the ORKG to assist users in the manual curation of their respective article contributions. We envision that the NLPContributions methodology engenders a wider discussion on the topic toward its further refinement and development.

KW - Annotation guidelines

KW - Dataset

KW - Digital libraries

KW - Open science graphs

KW - Scholarly knowledge graphs

KW - Semantic publishing

UR - http://www.scopus.com/inward/record.url?scp=85090918844&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85090918844

T3 - CEUR Workshop Proceedings

SP - 16

EP - 27

BT - 1st Workshop on Extraction and Evaluation of Knowledge Entities from Scientific Documents

Y2 - 1 August 2020

ER -

By the same author(s)

DataDesc: A framework for creating and sharing technical metadata for research software interfaces

Organizing Scientific Knowledge from Engineering Sciences Using the Open Research Knowledge Graph: The Tailored Forming Process Chain Use Case

A Neuro-Symbolic Approach for Faceted Search in Digital Libraries

Leveraging GPT Models For Semantic Table Annotation

Managing Comprehensive Research Instrument Descriptions Within a Scholarly Knowledge Graph
