SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Enrique Iglesias
  • Samaneh Jozashoori
  • David Chaves-Fraga
  • Diego Collarana
  • Maria Esther Vidal

Research Organisations

External Research Organisations

  • German National Library of Science and Technology (TIB)
  • Technical University of Madrid (UPM)
  • University of Bonn

Details

Original language: English
Title of host publication: CIKM 2020
Subtitle of host publication: Proceedings of the 29th ACM International Conference on Information and Knowledge Management
Publisher: Association for Computing Machinery (ACM)
Pages: 3039-3046
Number of pages: 8
ISBN (electronic): 9781450368599
Publication status: Published - 19 Oct 2020
Event: 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 - online, Virtual, Online, Ireland
Duration: 19 Oct 2020 to 23 Oct 2020

Publication series

Name: International Conference on Information and Knowledge Management, Proceedings

Abstract

In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, data complexity issues like large volume, high duplicate rate, and heterogeneity usually characterize these data sources, so data management tools are required that can address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale up to complex scenarios where data is not only broad but also has a high duplication rate. We empirically evaluate the SDM-RDFizer performance against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.

Keywords

    knowledge graph, rdf, rml

Cite this

SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. / Iglesias, Enrique; Jozashoori, Samaneh; Chaves-Fraga, David et al.
CIKM 2020: Proceedings of the 29th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery (ACM), 2020. p. 3039-3046 (International Conference on Information and Knowledge Management, Proceedings).

Iglesias, E, Jozashoori, S, Chaves-Fraga, D, Collarana, D & Vidal, ME 2020, SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. in CIKM 2020: Proceedings of the 29th ACM International Conference on Information and Knowledge Management. International Conference on Information and Knowledge Management, Proceedings, Association for Computing Machinery (ACM), pp. 3039-3046, 29th ACM International Conference on Information and Knowledge Management, CIKM 2020, Virtual, Online, Ireland, 19 Oct 2020. https://doi.org/10.48550/arXiv.2008.07176, https://doi.org/10.1145/3340531.3412881
Iglesias, E., Jozashoori, S., Chaves-Fraga, D., Collarana, D., & Vidal, M. E. (2020). SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In CIKM 2020: Proceedings of the 29th ACM International Conference on Information and Knowledge Management (pp. 3039-3046). (International Conference on Information and Knowledge Management, Proceedings). Association for Computing Machinery (ACM). https://doi.org/10.48550/arXiv.2008.07176, https://doi.org/10.1145/3340531.3412881
Iglesias E, Jozashoori S, Chaves-Fraga D, Collarana D, Vidal ME. SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs. In CIKM 2020: Proceedings of the 29th ACM International Conference on Information and Knowledge Management. Association for Computing Machinery (ACM). 2020. p. 3039-3046. (International Conference on Information and Knowledge Management, Proceedings). doi: 10.48550/arXiv.2008.07176, 10.1145/3340531.3412881
@inproceedings{eb9e45b6b9c94d1fa3478f451934f80c,
title = "SDM-RDFizer: An RML Interpreter for the Efficient Creation of RDF Knowledge Graphs",
abstract = "In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, data complexity issues like large volume, high duplicate rate, and heterogeneity usually characterize these data sources, so data management tools are required that can address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale up to complex scenarios where data is not only broad but also has a high duplication rate. We empirically evaluate the SDM-RDFizer performance against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.",
keywords = "knowledge graph, rdf, rml",
author = "Enrique Iglesias and Samaneh Jozashoori and David Chaves-Fraga and Diego Collarana and Vidal, {Maria Esther}",
note = "Funding Information: This work has been partially supported by the EU H2020 RIA funded project iASiS with grant agreement No 727658, by Ministerio de Econom{\'i}a, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES - UPM Spanish national project (TIN2016-78011-C4-4-R) and by the FPI grant (BES-2017-082511).; 29th ACM International Conference on Information and Knowledge Management, CIKM 2020 ; Conference date: 19-10-2020 Through 23-10-2020",
year = "2020",
month = oct,
day = "19",
doi = "10.48550/arXiv.2008.07176",
language = "English",
series = "International Conference on Information and Knowledge Management, Proceedings",
publisher = "Association for Computing Machinery (ACM)",
pages = "3039--3046",
booktitle = "CIKM 2020",
address = "United States",

}

TY - GEN

T1 - SDM-RDFizer

T2 - 29th ACM International Conference on Information and Knowledge Management, CIKM 2020

AU - Iglesias, Enrique

AU - Jozashoori, Samaneh

AU - Chaves-Fraga, David

AU - Collarana, Diego

AU - Vidal, Maria Esther

N1 - Funding Information: This work has been partially supported by the EU H2020 RIA funded project iASiS with grant agreement No 727658, by Ministerio de Economía, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES - UPM Spanish national project (TIN2016-78011-C4-4-R) and by the FPI grant (BES-2017-082511).

PY - 2020/10/19

Y1 - 2020/10/19

AB - In recent years, the amount of data has increased exponentially, and knowledge graphs have gained attention as data structures to integrate data and knowledge harvested from myriad data sources. However, data complexity issues like large volume, high duplicate rate, and heterogeneity usually characterize these data sources, so data management tools are required that can address the negative impact of these issues on the knowledge graph creation process. In this paper, we propose the SDM-RDFizer, an interpreter of the RDF Mapping Language (RML), to transform raw data in various formats into an RDF knowledge graph. SDM-RDFizer implements novel algorithms to execute the logical operators between mappings in RML, thus allowing it to scale up to complex scenarios where data is not only broad but also has a high duplication rate. We empirically evaluate the SDM-RDFizer performance against diverse testbeds with diverse configurations of data volume, duplicates, and heterogeneity. The observed results indicate that SDM-RDFizer is two orders of magnitude faster than the state of the art, meaning that SDM-RDFizer is an interoperable and scalable solution for knowledge graph creation. SDM-RDFizer is publicly available as a resource through a GitHub repository and a DOI.

KW - knowledge graph

KW - rdf

KW - rml

UR - http://www.scopus.com/inward/record.url?scp=85095542622&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2008.07176

DO - 10.48550/arXiv.2008.07176

M3 - Conference contribution

AN - SCOPUS:85095542622

T3 - International Conference on Information and Knowledge Management, Proceedings

SP - 3039

EP - 3046

BT - CIKM 2020

PB - Association for Computing Machinery (ACM)

Y2 - 19 October 2020 through 23 October 2020

ER -