Details
Original language | English |
---|---|
Title of host publication | Digital Libraries at Times of Massive Societal Transition |
Subtitle of host publication | 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings |
Editors | Emi Ishita, Natalie Lee Pang, Lihong Zhou |
Place of Publication | Cham |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 3-19 |
Number of pages | 17 |
ISBN (electronic) | 978-3-030-64452-9 |
ISBN (print) | 978-3-030-64451-2 |
Publication status | Published - 2020 |
Event | 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020 - Kyoto, Japan; Duration: 30 Nov 2020 → 1 Dec 2020 |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12504 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Abstract
With the rapid growth of research publications, a vast amount of scholarly knowledge needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight BERT-based classification models by focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps BERT-based classification models identify the type of scientific relations. Although the strategy of predicting a single relation at a time generally achieves higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter strategy performs more consistently across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.
Keywords
- Digital library, Information extraction, Knowledge graphs, Neural machine learning, Scholarly text mining, Semantic relation classification
ASJC Scopus subject areas
- Mathematics(all)
- Theoretical Computer Science
- Computer Science(all)
- General Computer Science
Digital Libraries at Times of Massive Societal Transition : 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, Proceedings. ed. / Emi Ishita; Natalie Lee Pang; Lihong Zhou. Cham: Springer Science and Business Media Deutschland GmbH, 2020. p. 3-19 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12504 LNCS).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Improving Scholarly Knowledge Representation
T2 - 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020
AU - Jiang, Ming
AU - D’Souza, Jennifer
AU - Auer, Sören
AU - Downie, J. Stephen
PY - 2020
Y1 - 2020
AB - With the rapid growth of research publications, a vast amount of scholarly knowledge needs to be organized in digital libraries. To deal with this challenge, techniques relying on knowledge-graph structures are being advocated. Within such graph-based pipelines, inferring relation types between related scientific concepts is a crucial step. Recently, advanced techniques relying on language models pre-trained on large corpora have been widely explored for automatic relation classification. Despite the remarkable contributions that have been made, many of these methods were evaluated under different scenarios, which limits their comparability. To address this shortcoming, we present a thorough empirical evaluation of eight BERT-based classification models by focusing on two key factors: 1) BERT model variants, and 2) classification strategies. Experiments on three corpora show that a domain-specific pre-training corpus helps BERT-based classification models identify the type of scientific relations. Although the strategy of predicting a single relation at a time generally achieves higher classification accuracy than the strategy of identifying multiple relation types simultaneously, the latter strategy performs more consistently across corpora with either a large or a small number of annotations. Our study aims to offer recommendations to the stakeholders of digital libraries for selecting the appropriate technique to build knowledge-graph-based systems for enhanced scholarly information organization.
KW - Digital library
KW - Information extraction
KW - Knowledge graphs
KW - Neural machine learning
KW - Scholarly text mining
KW - Semantic relation classification
UR - http://www.scopus.com/inward/record.url?scp=85097538751&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-64452-9_1
DO - 10.1007/978-3-030-64452-9_1
M3 - Conference contribution
AN - SCOPUS:85097538751
SN - 9783030644512
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 19
BT - Digital Libraries at Times of Massive Societal Transition
A2 - Ishita, Emi
A2 - Pang, Natalie Lee
A2 - Zhou, Lihong
PB - Springer Science and Business Media Deutschland GmbH
CY - Cham
Y2 - 30 November 2020 through 1 December 2020
ER -