Scholarly Knowledge Extraction from Published Software Packages

Research output: Chapter in book/report/conference proceeding > Conference contribution > Research > peer review

Authors

  • Muhammad Haris
  • Markus Stocker
  • Sören Auer

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: From Born-Physical to Born-Virtual
Subtitle of host publication: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings
Editors: Yuen-Hsien Tseng, Marie Katsurai, Hoa N. Nguyen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 301-310
Number of pages: 10
ISBN (print): 9783031217555
Publication status: Published - 7 Dec 2022
Event: 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 - Hanoi, Viet Nam
Duration: 30 Nov 2022 to 2 Dec 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13636 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).
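Steps two to four of the approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (find_dois, extract_calls, mentioned_in_article), the sample README, the sample script and the sample article text are all hypothetical, and the substring matching is a deliberately naive stand-in for the full-text search described above. SOMEF, the GitHub API and the ORKG are not called; only Python's standard re and ast modules are used.

```python
import ast
import re
from typing import Dict, List, Optional

# Hypothetical inputs standing in for extracted metadata, a packaged script,
# and the full text of the related article.
SAMPLE_README = "Code for our paper, https://doi.org/10.1007/978-3-031-21756-2_24."
SAMPLE_SCRIPT = '''
import pandas as pd
from scipy import stats

df = pd.read_csv("measurements.csv")
result = stats.ttest_ind(df["a"], df["b"], equal_var=False)
print(result)
'''
SAMPLE_ARTICLE = (
    "We loaded the measurements and applied an independent t-test "
    "(scipy ttest_ind) with unequal variances."
)


def find_dois(text: str) -> List[str]:
    """Step two (sketch): find DOI-like strings in metadata text."""
    matches = re.findall(r'\b10\.\d{4,9}/[^\s"<>]+', text)
    # Strip sentence punctuation that the pattern may have swallowed.
    return [m.rstrip(".,;)") for m in matches]


def dotted_name(node: ast.AST) -> Optional[str]:
    """Rebuild a name such as 'pd.read_csv' from Name/Attribute nodes."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None  # e.g. a call on the result of another call


def extract_calls(source: str) -> List[Dict]:
    """Step three (sketch): parse a script and record its function calls."""
    records = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name is None:
                continue
            keywords = sorted(kw.arg for kw in node.keywords if kw.arg)
            records.append({"call": name, "line": node.lineno, "keywords": keywords})
    return sorted(records, key=lambda r: r["line"])


def mentioned_in_article(records: List[Dict], article_text: str) -> List[Dict]:
    """Step four (sketch): keep only calls whose name appears in the article."""
    text = article_text.lower()
    kept = []
    for record in records:
        term = record["call"].split(".")[-1].lower()
        if term in text or term.replace("_", " ") in text:
            kept.append(record)
    return kept


if __name__ == "__main__":
    print(find_dois(SAMPLE_README))
    calls = extract_calls(SAMPLE_SCRIPT)
    for record in mentioned_in_article(calls, SAMPLE_ARTICLE):
        print(record)
```

Static analysis fits this setting because ast.parse only reads the source text: the script is never executed and its imports (here pandas and scipy) need not be installed. A realistic matcher would need something more robust than substring search, which for instance would match "print" inside "preprint".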

Keywords

    Abstract syntax tree, Analyzing software packages, Code analysis, Machine actionability, Open research knowledge graph, Scholarly communication

Cite this

Scholarly Knowledge Extraction from Published Software Packages. / Haris, Muhammad; Stocker, Markus; Auer, Sören.
From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. ed. / Yuen-Hsien Tseng; Marie Katsurai; Hoa N. Nguyen. Springer Science and Business Media Deutschland GmbH, 2022. p. 301-310 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13636 LNCS).

Haris, M, Stocker, M & Auer, S 2022, Scholarly Knowledge Extraction from Published Software Packages. in Y-H Tseng, M Katsurai & HN Nguyen (eds), From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13636 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 301-310, 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022, Hanoi, Viet Nam, 30 Nov 2022. https://doi.org/10.48550/arXiv.2212.07921, https://doi.org/10.1007/978-3-031-21756-2_24
Haris, M., Stocker, M., & Auer, S. (2022). Scholarly Knowledge Extraction from Published Software Packages. In Y.-H. Tseng, M. Katsurai, & H. N. Nguyen (Eds.), From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings (pp. 301-310). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13636 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.48550/arXiv.2212.07921, https://doi.org/10.1007/978-3-031-21756-2_24
Haris M, Stocker M, Auer S. Scholarly Knowledge Extraction from Published Software Packages. In Tseng YH, Katsurai M, Nguyen HN, editors, From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2022. p. 301-310. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.48550/arXiv.2212.07921, 10.1007/978-3-031-21756-2_24
Haris, Muhammad ; Stocker, Markus ; Auer, Sören. / Scholarly Knowledge Extraction from Published Software Packages. From Born-Physical to Born-Virtual: Augmenting Intelligence in Digital Libraries - 24th International Conference on Asian Digital Libraries, ICADL 2022, Proceedings. editor / Yuen-Hsien Tseng ; Marie Katsurai ; Hoa N. Nguyen. Springer Science and Business Media Deutschland GmbH, 2022. pp. 301-310 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{57a41e05f40e4dc79985ef44f8e06708,
title = "Scholarly Knowledge Extraction from Published Software Packages",
abstract = "A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).",
keywords = "Abstract syntax tree, Analyzing software packages, Code analysis, Machine actionability, Open research knowledge graph, Scholarly communication",
author = "Muhammad Haris and Markus Stocker and S{\"o}ren Auer",
note = "Funding Information: Acknowledgment. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and TIB–Leibniz Information Centre for Science and Technology. ; 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022 ; Conference date: 30-11-2022 Through 02-12-2022",
year = "2022",
month = dec,
day = "7",
doi = "10.48550/arXiv.2212.07921",
language = "English",
isbn = "9783031217555",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "301--310",
editor = "Yuen-Hsien Tseng and Marie Katsurai and Nguyen, {Hoa N.}",
booktitle = "From Born-Physical to Born-Virtual",
address = "Germany",

}

TY - GEN

T1 - Scholarly Knowledge Extraction from Published Software Packages

AU - Haris, Muhammad

AU - Stocker, Markus

AU - Auer, Sören

N1 - Funding Information: Acknowledgment. This work was co-funded by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536) and TIB–Leibniz Information Centre for Science and Technology.

PY - 2022/12/7

Y1 - 2022/12/7

N2 - A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).

AB - A plethora of scientific software packages are published in repositories, e.g., Zenodo and figshare. These software packages are crucial for the reproducibility of published research. As an additional route to scholarly knowledge graph construction, we propose an approach for automated extraction of machine actionable (structured) scholarly knowledge from published software packages by static analysis of their (meta)data and contents (in particular scripts in languages such as Python). The approach can be summarized as follows. First, we extract metadata information (software description, programming languages, related references) from software packages by leveraging the Software Metadata Extraction Framework (SOMEF) and the GitHub API. Second, we analyze the extracted metadata to find the research articles associated with the corresponding software repository. Third, for software contained in published packages, we create and analyze the Abstract Syntax Tree (AST) representation to extract information about the procedures performed on data. Fourth, we search the extracted information in the full text of related articles to constrain the extracted information to scholarly knowledge, i.e. information published in the scholarly literature. Finally, we publish the extracted machine actionable scholarly knowledge in the Open Research Knowledge Graph (ORKG).

KW - Abstract syntax tree

KW - Analyzing software packages

KW - Code analysis

KW - Machine actionability

KW - Open research knowledge graph

KW - Scholarly communication

UR - http://www.scopus.com/inward/record.url?scp=85145010085&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2212.07921

DO - 10.48550/arXiv.2212.07921

M3 - Conference contribution

AN - SCOPUS:85145010085

SN - 9783031217555

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 301

EP - 310

BT - From Born-Physical to Born-Virtual

A2 - Tseng, Yuen-Hsien

A2 - Katsurai, Marie

A2 - Nguyen, Hoa N.

PB - Springer Science and Business Media Deutschland GmbH

T2 - 24th International Conference on Asia-Pacific Digital Libraries, ICADL 2022

Y2 - 30 November 2022 through 2 December 2022

ER -
