Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization

Ming Jiang; Jennifer D'Souza; Sören Auer; J. Stephen Downie

doi:10.1002/pra2.303

Details

Original language	English
Article number	e303
Number of pages	7
Journal	Proceedings of the Association for Information Science and Technology
Volume	57
Issue number	1
Early online date	22 Oct 2020
Publication status	Published - 2020
Externally published	Yes

Abstract

Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets a high-precision performance objective as opposed to high recall, aiming to reduce the noisy results albeit at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state-of-the-art with an overall 60% F1 score, led by a nearly 15% precision boost in identifying related scientific concepts. On relation classification, we further achieved overall F1 scores ranging from 34.1% to 51.2%, depending on the experimental dataset.
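To make the precision-versus-recall trade-off described in the abstract concrete, the following is a minimal Python sketch of how extracted term pairs could be scored against a gold standard with precision, recall, and F1. It is not the authors' evaluation code: representing a relation as a (head, tail) term pair, exact-match scoring, the function name `precision_recall_f1`, and all of the toy term pairs are illustrative assumptions.

```python
from typing import Set, Tuple

# Assumption: a relation is represented as a (head_term, tail_term) pair.
Pair = Tuple[str, str]


def precision_recall_f1(predicted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float, float]:
    """Score predicted term pairs against gold pairs using exact matching."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard pairs of related scientific terms.
gold = {("CRF", "NER"), ("BERT", "relation extraction"),
        ("LSTM", "parsing"), ("SVM", "classification")}

# Hypothetical high-recall extractor: 3 gold pairs plus 5 noisy pairs.
noisy = {("CRF", "NER"), ("BERT", "relation extraction"), ("LSTM", "parsing"),
         ("CRF", "parsing"), ("BERT", "NER"), ("SVM", "NER"),
         ("LSTM", "NER"), ("SVM", "parsing")}

# Hypothetical high-precision extractor: fewer pairs, all of them correct.
conservative = {("CRF", "NER"), ("BERT", "relation extraction")}

for name, predicted in [("noisy", noisy), ("conservative", conservative)]:
    p, r, f1 = precision_recall_f1(predicted, gold)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f1:.3f}")
# noisy: P=0.375 R=0.750 F1=0.500
# conservative: P=1.000 R=0.500 F1=0.667
```

With these hypothetical data, the conservative extractor returns fewer relations yet scores higher on both precision and F1, which mirrors the kind of trade-off a precision-oriented system deliberately accepts.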

ASJC Scopus subject areas

Computer Science (all)
General Computer Science
Social Sciences (all)
Library and Information Sciences

Cite

Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization. / Jiang, Ming; D'Souza, Jennifer; Auer, Sören et al.
In: Proceedings of the Association for Information Science and Technology, Vol. 57, No. 1, e303, 2020.

Publication: Contribution to journal › Article › Research › Peer-reviewed

Jiang, M, D'Souza, J, Auer, S & Downie, JS 2020, 'Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization', Proceedings of the Association for Information Science and Technology, vol. 57, no. 1, e303. https://doi.org/10.1002/pra2.303

Jiang, M., D'Souza, J., Auer, S., & Downie, J. S. (2020). Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization. Proceedings of the Association for Information Science and Technology, 57(1), Article e303. https://doi.org/10.1002/pra2.303

Jiang M, D'Souza J, Auer S, Downie JS. Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization. Proceedings of the Association for Information Science and Technology. 2020;57(1):e303. Epub 2020 Oct 22. doi: 10.1002/pra2.303

Jiang, Ming ; D'Souza, Jennifer ; Auer, Sören et al. / Targeting precision : A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization. In: Proceedings of the Association for Information Science and Technology. 2020 ; Vol. 57, No. 1.


@article{d43fc880298d428eb4b27bde8587fc79,

title = "Targeting precision: A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization",

abstract = "Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets a high-precision performance objective as opposed to high recall, aiming to reduce the noisy results albeit at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state-of-the-art with an overall 60% F1 score led by the nearly 15% precision boost in identifying related scientific concepts. We further achieved an overall F1 in the range 34.1% to 51.2%, on relation classification, per experimental dataset.",

keywords = "information extraction, knowledge graphs, relation extraction, scholarly knowledge organization, scholarly text mining",

author = "Ming Jiang and Jennifer D'Souza and S{\"o}ren Auer and Downie, {J. Stephen}",

year = "2020",

doi = "10.1002/pra2.303",

language = "English",

volume = "57",

number = "1",

journal = "Proceedings of the Association for Information Science and Technology",

}


TY - JOUR

T1 - Targeting precision

T2 - A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization

AU - Jiang, Ming

AU - D'Souza, Jennifer

AU - Auer, Sören

AU - Downie, J. Stephen

PY - 2020

Y1 - 2020

N2 - Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets a high-precision performance objective as opposed to high recall, aiming to reduce the noisy results albeit at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state-of-the-art with an overall 60% F1 score led by the nearly 15% precision boost in identifying related scientific concepts. We further achieved an overall F1 in the range 34.1% to 51.2%, on relation classification, per experimental dataset.

AB - Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets a high-precision performance objective as opposed to high recall, aiming to reduce the noisy results albeit at the cost of extracting fewer relations when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state-of-the-art with an overall 60% F1 score led by the nearly 15% precision boost in identifying related scientific concepts. We further achieved an overall F1 in the range 34.1% to 51.2%, on relation classification, per experimental dataset.

KW - information extraction

KW - knowledge graphs

KW - relation extraction

KW - scholarly knowledge organization

KW - scholarly text mining

UR - http://www.scopus.com/inward/record.url?scp=85098432341&partnerID=8YFLogxK

U2 - 10.1002/pra2.303

DO - 10.1002/pra2.303

M3 - Article

AN - SCOPUS:85098432341

VL - 57

JO - Proceedings of the Association for Information Science and Technology

JF - Proceedings of the Association for Information Science and Technology

IS - 1

M1 - e303

ER -


By the same authors

Managing Comprehensive Research Instrument Descriptions Within a Scholarly Knowledge Graph

DataDesc: A framework for creating and sharing technical metadata for research software interfaces

Federated Querying of Scholarly Communication Infrastructures

A Reputation System for Scientific Contributions Based on a Token Economy

SWARM-SLR: Streamlined Workflow Automation for Machine-Actionable Systematic Literature Reviews