Scholarly Knowledge Extraction from Published Software Packages

Publication: Contribution to book/report/anthology/conference proceedings; Conference contribution; Research; Peer-reviewed

Authors

  • Muhammad Haris
  • Markus Stocker
  • Sören Auer

External organisations

  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: From Born-Physical to Born-Virtual
Subtitle: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings
Editors: Yuen-Hsien Tseng, Marie Katsurai, Hoa N. Nguyen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 301-310
Number of pages: 10
ISBN (Print): 9783031217555
Publication status: Published - 7 Dec 2022
Event: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 - Hanoi, Vietnam
Duration: 30 Nov 2022 to 2 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13636 LNCS
ISSN (Print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Many scientific software packages are published in repositories such as Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for the automated extraction of machine-actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular, scripts in languages such as Python). The approach proceeds in five steps. First, we extract metadata (software description, programming languages, related references) from software packages using the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we build and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search for the extracted information in the full text of the related articles, constraining it to scholarly knowledge, i.e., information published in the scholarly literature. Finally, we publish the extracted machine-actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).
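The five steps above are described only at a high level. The following Python sketch illustrates steps 2 to 4 under stated assumptions; it is not the authors' implementation. It does not call SOMEF, the GitHub API, or the ORKG, and uses only the standard `ast` and `re` modules. The DOI regular expression, the convention that data is loaded through `<module>.read_*(...)` calls, the string-literal matching used for step 4, and the names `find_dois`, `DataProcedure`, `DataFlowVisitor`, and `constrain_to_article` are all hypothetical.

```python
import ast
import re
from dataclasses import dataclass, field

# Step 2 (hypothetical): a loose DOI pattern; real metadata may need stricter parsing.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"'<>]+")


def find_dois(metadata_text: str) -> list[str]:
    """Collect DOI-like strings from README or metadata text."""
    # Strip punctuation that commonly trails a DOI in running text.
    return sorted({m.rstrip(".,;)") for m in DOI_PATTERN.findall(metadata_text)})


@dataclass
class DataProcedure:
    """A method call observed on a variable that was loaded from a data file."""
    variable: str
    method: str
    literals: list[str] = field(default_factory=list)


class DataFlowVisitor(ast.NodeVisitor):
    """Step 3 (sketch): remember variables assigned from `<module>.read_*(...)`
    calls (hypothetical convention) and record every method called directly
    on such a variable."""

    def __init__(self) -> None:
        self.data_vars: dict[str, str] = {}  # variable name -> loader name
        self.procedures: list[DataProcedure] = []

    def visit_Assign(self, node: ast.Assign) -> None:
        value = node.value
        if (isinstance(value, ast.Call) and isinstance(value.func, ast.Attribute)
                and value.func.attr.startswith("read_")):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    self.data_vars[target.id] = value.func.attr
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        if (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)
                and func.value.id in self.data_vars):
            # Gather string constants anywhere in the arguments, e.g. column names.
            arguments = [*node.args, *(kw.value for kw in node.keywords)]
            literals = [n.value for arg in arguments for n in ast.walk(arg)
                        if isinstance(n, ast.Constant) and isinstance(n.value, str)]
            self.procedures.append(DataProcedure(func.value.id, func.attr, literals))
        self.generic_visit(node)


def constrain_to_article(procedures: list[DataProcedure],
                         article_text: str) -> list[DataProcedure]:
    """Step 4 (sketch): keep procedures with at least one non-empty string
    literal that also occurs in the article's full text."""
    text = article_text.lower()
    return [p for p in procedures
            if any(lit and lit.lower() in text for lit in p.literals)]


if __name__ == "__main__":
    script = '''
import pandas as pd
df = pd.read_csv("yield.csv")
df = df.dropna(subset=["rainfall"])
summary = df.groupby("region").mean()
df.to_csv("clean.csv")
'''
    visitor = DataFlowVisitor()
    visitor.visit(ast.parse(script))  # the script is parsed, never executed
    article = "We dropped rows with missing rainfall and averaged yields per region."
    for proc in constrain_to_article(visitor.procedures, article):
        print(proc.variable, proc.method, proc.literals)
    print(find_dois("Paper: https://doi.org/10.1007/978-3-031-21756-2_24."))
```

In the demo, the `dropna` and `groupby` calls are kept because "rainfall" and "region" occur in the sample article text, while `to_csv("clean.csv")` is dropped. Matching string literals against the article is a deliberately crude stand-in for step 4; a fuller pipeline would also need to retrieve the full text for each DOI found in step 2 and handle tokenization and synonyms.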

Cite this

Scholarly Knowledge Extraction from Published Software Packages. / Haris, Muhammad; Stocker, Markus; Auer, Sören.
From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. eds. / Yuen-Hsien Tseng; Marie Katsurai; Hoa N. Nguyen. Springer Science and Business Media Deutschland GmbH, 2022. pp. 301-310 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13636 LNCS).

Haris, M, Stocker, M & Auer, S 2022, Scholarly Knowledge Extraction from Published Software Packages. in Y-H Tseng, M Katsurai & HN Nguyen (eds), From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13636 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 301-310, 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022, Hanoi, Vietnam, 30 Nov 2022. https://doi.org/10.48550/arXiv.2212.07921, https://doi.org/10.1007/978-3-031-21756-2_24
Haris, M., Stocker, M., & Auer, S. (2022). Scholarly Knowledge Extraction from Published Software Packages. In Y.-H. Tseng, M. Katsurai, & H. N. Nguyen (Eds.), From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings (pp. 301-310). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13636 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.48550/arXiv.2212.07921, https://doi.org/10.1007/978-3-031-21756-2_24
Haris M, Stocker M, Auer S. Scholarly Knowledge Extraction from Published Software Packages. In: Tseng YH, Katsurai M, Nguyen HN, editors, From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 301-310. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.48550/arXiv.2212.07921, 10.1007/978-3-031-21756-2_24
@inproceedings{57a41e05f40e4dc79985ef44f8e06708,
title = "Scholarly Knowledge Extraction from Published Software Packages",
abstract = "A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).",
keywords = "Abstract syntax tree, Analyzing software packages, Code analysis, Machine actionability, Open research knowledge graph, Scholarly communication",
author = "Muhammad Haris and Markus Stocker and S{\"o}ren Auer",
note = "Funding Information: Acknowledgment. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and TIB–Leibniz Information Centre for Science and Technology. ; 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 ; Conference date: 30-11-2022 Through 02-12-2022",
year = "2022",
month = dec,
day = "7",
doi = "10.1007/978-3-031-21756-2_24",
language = "English",
isbn = "9783031217555",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "301--310",
editor = "Yuen-Hsien Tseng and Marie Katsurai and Nguyen, {Hoa N.}",
booktitle = "From Born-Physical to Born-Virtual",
address = "Germany",

}

TY - GEN

T1 - Scholarly Knowledge Extraction from Published Software Packages

AU - Haris, Muhammad

AU - Stocker, Markus

AU - Auer, Sören

N1 - Funding Information: Acknowledgment. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and TIB–Leibniz Information Centre for Science and Technology.

PY - 2022/12/7

Y1 - 2022/12/7

AB - A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).

KW - Abstract syntax tree

KW - Analyzing software packages

KW - Code analysis

KW - Machine actionability

KW - Open research knowledge graph

KW - Scholarly communication

UR - http://www.scopus.com/inward/record.url?scp=85145010085&partnerID=8YFLogxK

U2 - https://doi.org/10.48550/arXiv.2212.07921

DO - 10.1007/978-3-031-21756-2_24

M3 - Conference contribution

AN - SCOPUS:85145010085

SN - 9783031217555

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 301

EP - 310

BT - From Born-Physical to Born-Virtual

A2 - Tseng, Yuen-Hsien

A2 - Katsurai, Marie

A2 - Nguyen, Hoa N.

PB - Springer Science and Business Media Deutschland GmbH

T2 - 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022

Y2 - 30 November 2022 through 2 December 2022

ER -