Automated Mining of Leaderboards for Empirical AI Research

Salomon Kabongo; Jennifer D’Souza; Sören Auer

doi:10.1007/978-3-030-91669-5_35

Details

Original language	English
Title of host publication	Towards Open and Trustworthy Digital Societies
Subtitle of host publication	23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings
Editors	Hao-Ren Ke, Chei Sian Lee, Kazunari Sugiyama
Publisher	Springer Nature Switzerland AG
Pages	453-470
Number of pages	18
ISBN (electronic)	978-3-030-91669-5
ISBN (print)	9783030916688
Publication status	Published - 2021
Event	23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 - Virtual, Online; Duration: 1 Dec 2021 → 3 Dec 2021

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13133
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

With the rapid growth of research publications, empowering scientists to keep an oversight over scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview on the state-of-the-art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts like PapersWithCode among others are devoted to the construction of leaderboards predominantly for various subdomains in Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven to be directly useful for scientists to keep track of research progress – their construction could be greatly expedited with automated text mining. This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in the next-generation digital libraries as knowledge graphs.

Keywords

Information extraction, Knowledge graphs, Neural machine learning, Scholarly text mining, Table mining

ASJC Scopus subject areas

Mathematics(all)
Theoretical Computer Science
Computer Science(all)
General Computer Science

Cite this

Automated Mining of Leaderboards for Empirical AI Research. / Kabongo, Salomon; D’Souza, Jennifer; Auer, Sören.
Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. ed. / Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama. Springer Nature Switzerland AG, 2021. p. 453-470 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13133).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Kabongo, S, D’Souza, J & Auer, S 2021, Automated Mining of Leaderboards for Empirical AI Research. in H-R Ke, CS Lee & K Sugiyama (eds), Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13133, Springer Nature Switzerland AG, pp. 453-470, 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual, Online, 1 Dec 2021. https://doi.org/10.1007/978-3-030-91669-5_35

Kabongo, S., D’Souza, J., & Auer, S. (2021). Automated Mining of Leaderboards for Empirical AI Research. In H.-R. Ke, C. S. Lee, & K. Sugiyama (Eds.), Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings (pp. 453-470). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13133). Springer Nature Switzerland AG. https://doi.org/10.1007/978-3-030-91669-5_35

Kabongo S, D’Souza J, Auer S. Automated Mining of Leaderboards for Empirical AI Research. In Ke HR, Lee CS, Sugiyama K, editors, Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. Springer Nature Switzerland AG. 2021. p. 453-470. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2021 Nov 30. doi: 10.1007/978-3-030-91669-5_35

Kabongo, Salomon ; D’Souza, Jennifer ; Auer, Sören. / Automated Mining of Leaderboards for Empirical AI Research. Towards Open and Trustworthy Digital Societies: 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Proceedings. editor / Hao-Ren Ke ; Chei Sian Lee ; Kazunari Sugiyama. Springer Nature Switzerland AG, 2021. pp. 453-470 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{065f5bac741f4214a36a6fc4397a0884,

title = "Automated Mining of Leaderboards for Empirical AI Research",

abstract = "With the rapid growth of research publications, empowering scientists to keep an oversight over scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview on the state-of-the-art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts like PapersWithCode among others are devoted to the construction of leaderboards predominantly for various subdomains in Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven to be directly useful for scientists to keep track of research progress – their construction could be greatly expedited with automated text mining. This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in the next-generation digital libraries as knowledge graphs.",

keywords = "Information extraction, Knowledge graphs, Neural machine learning, Scholarly text mining, Table mining",

author = "Salomon Kabongo and Jennifer D{\textquoteright}Souza and S{\"o}ren Auer",

note = "Funding Information: This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).; 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021 ; Conference date: 01-12-2021 Through 03-12-2021",

year = "2021",

doi = "10.1007/978-3-030-91669-5_35",

language = "English",

isbn = "9783030916688",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Nature Switzerland AG",

pages = "453--470",

editor = "Hao-Ren Ke and Lee, {Chei Sian} and Kazunari Sugiyama",

booktitle = "Towards Open and Trustworthy Digital Societies",

address = "Switzerland",

}

RIS

TY - GEN

T1 - Automated Mining of Leaderboards for Empirical AI Research

AU - Kabongo, Salomon

AU - D’Souza, Jennifer

AU - Auer, Sören

N1 - Funding Information: This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).

PY - 2021

Y1 - 2021

N2 - With the rapid growth of research publications, empowering scientists to keep an oversight over scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview on the state-of-the-art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts like PapersWithCode among others are devoted to the construction of leaderboards predominantly for various subdomains in Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven to be directly useful for scientists to keep track of research progress – their construction could be greatly expedited with automated text mining. This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. Bert, SciBert, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in the next-generation digital libraries as knowledge graphs.

KW - Information extraction

KW - Knowledge graphs

KW - Neural machine learning

KW - Scholarly text mining

KW - Table mining

UR - http://www.scopus.com/inward/record.url?scp=85121928250&partnerID=8YFLogxK

U2 - 10.1007/978-3-030-91669-5_35

DO - 10.1007/978-3-030-91669-5_35

M3 - Conference contribution

AN - SCOPUS:85121928250

SN - 9783030916688

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 453

EP - 470

BT - Towards Open and Trustworthy Digital Societies

A2 - Ke, Hao-Ren

A2 - Lee, Chei Sian

A2 - Sugiyama, Kazunari

PB - Springer Nature Switzerland AG

T2 - 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021

Y2 - 1 December 2021 through 3 December 2021

ER -

By the same author(s)

DataDesc: A framework for creating and sharing technical metadata for research software interfaces

Organizing Scientific Knowledge from Engineering Sciences Using the Open Research Knowledge Graph: The Tailored Forming Process Chain Use Case

A Neuro-Symbolic Approach for Faceted Search in Digital Libraries

Leveraging GPT Models For Semantic Table Annotation

Managing Comprehensive Research Instrument Descriptions Within a Scholarly Knowledge Graph
