Details
Field | Value |
---|---|
Original language | English |
Article number | e303 |
Number of pages | 7 |
Journal | Proceedings of the Association for Information Science and Technology |
Volume | 57 |
Issue number | 1 |
Early online date | 22 Oct 2020 |
Publication status | Published - 2020 |
Externally published | Yes |
Abstract
Knowledge graphs have been successfully built from unstructured texts in general domains such as newswire by leveraging distant supervision relation signals from linked data repositories such as DBpedia. In contrast, the lack of a comprehensive ontology of scholarly relations makes it difficult to similarly adopt distant supervision to create knowledge graphs over scholarly articles. In light of this difficulty, we propose a hybrid approach to extract scientific concept relations from scholarly publications by: (a) utilizing syntactic rules as a form of distant supervision to link related scientific term pairs; and (b) training a classifier to further identify the relation type per pair. Our system targets high precision rather than high recall, aiming to reduce noisy results, albeit at the cost of extracting fewer relations, when building scholarly knowledge graphs over massive-scale publications. Results on two benchmark datasets show that our hybrid system surpasses the state of the art with an overall 60% F1 score, driven by a nearly 15% precision boost in identifying related scientific concepts. On relation classification, we further achieved an overall F1 between 34.1% and 51.2%, depending on the experimental dataset.
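The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the trigger phrases, the relation labels, and the function names `link_pairs`, `classify`, and `extract` are all hypothetical stand-ins, and a simple lookup replaces the trained classifier. It only shows how a strict linking rule in stage (a) favors precision over recall before stage (b) assigns a relation type.

```python
from itertools import combinations

# Hypothetical trigger phrases standing in for syntactic rules (assumption,
# not taken from the paper). Each maps to an illustrative relation label.
TRIGGERS = {
    "is used for": "USAGE",
    "is part of": "PART_WHOLE",
    "is compared with": "COMPARE",
}


def link_pairs(sentence, terms):
    """Stage (a): link two terms only if the text between them is exactly
    one trigger phrase. The strict match drops many true pairs (lower
    recall) but keeps few spurious ones (higher precision)."""
    found = sorted((sentence.find(t), t) for t in terms if t in sentence)
    pairs = []
    for (i, a), (j, b) in combinations(found, 2):
        between = sentence[i + len(a):j].strip()
        if between in TRIGGERS:
            pairs.append((a, b, between))
    return pairs


def classify(context):
    """Stage (b): stand-in for a trained classifier; here a plain lookup."""
    return TRIGGERS.get(context, "UNKNOWN")


def extract(sentence, terms):
    """Run both stages and return (term, relation, term) triples."""
    return [(a, classify(ctx), b) for a, b, ctx in link_pairs(sentence, terms)]


if __name__ == "__main__":
    sentence = ("A conditional random field is used for sequence labeling, "
                "and parsing is discussed later.")
    terms = ["conditional random field", "sequence labeling", "parsing"]
    print(extract(sentence, terms))
```

In this sketch the pair ("conditional random field", "parsing") is rejected because the text between the two terms is longer than a single trigger phrase, which is the kind of conservative filtering a precision-oriented pipeline would prefer.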
Keywords
- information extraction, knowledge graphs, relation extraction, scholarly knowledge organization, scholarly text mining
ASJC Scopus subject areas
- Computer Science(all)
- Social Sciences(all)
- Library and Information Sciences
In: Proceedings of the Association for Information Science and Technology, Vol. 57, No. 1, e303, 2020.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Targeting precision
T2 - A hybrid scientific relation extraction pipeline for improved scholarly knowledge organization
AU - Jiang, Ming
AU - D'Souza, Jennifer
AU - Auer, Sören
AU - Downie, J. Stephen
PY - 2020
Y1 - 2020
KW - information extraction
KW - knowledge graphs
KW - relation extraction
KW - scholarly knowledge organization
KW - scholarly text mining
UR - http://www.scopus.com/inward/record.url?scp=85098432341&partnerID=8YFLogxK
U2 - 10.1002/pra2.303
DO - 10.1002/pra2.303
M3 - Article
AN - SCOPUS:85098432341
VL - 57
JO - Proceedings of the Association for Information Science and Technology
JF - Proceedings of the Association for Information Science and Technology
IS - 1
M1 - e303
ER -