Enhancing virtual ontology based access over tabular data with Morph-CSV

David Chaves-Fraga; Edna Ruckhaus; Freddy Priyatna; Maria Esther Vidal; Oscar Corcho

doi:10.48550/arXiv.2001.09052

Details

Original language	English
Pages (from-to)	869-902
Number of pages	34
Journal	Semantic web
Volume	12
Issue number	6
Publication status	Published - 2021

Abstract

Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with the GTFS-Madrid benchmark; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while it is able to produce all the query answers.
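
The abstract's point about missing constraints can be illustrated with a small sketch. This is not Morph-CSV itself and does not reproduce its operators. It only shows, under stated assumptions, how loading CSV data as untyped tables can lose answers, and how declaring datatypes, keys, and an index before querying avoids that. The CSV contents, table names, column names, and the helper functions `rows`, `load_naive`, `load_constrained`, and `delayed_trips` are all hypothetical, and SQLite stands in for the RDBMS.

```python
import csv
import io
import sqlite3

# Hypothetical CSV content, invented for this illustration only.
STOPS_CSV = "stop_id,stop_name\n1,Sol\n2,Atocha\n"
VISITS_CSV = "trip_id,stop_id,delay_min\nT1,1,5\nT2,2,12\nT3,2,7\n"

# The kind of SQL a SPARQL-to-SQL engine might generate (assumed shape).
QUERY = (
    "SELECT v.trip_id FROM visits v JOIN stops s ON v.stop_id = s.stop_id "
    "WHERE v.delay_min > 6 ORDER BY v.trip_id"
)


def rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_naive(conn):
    """Load each CSV as one untyped table: all TEXT, no keys, no indexes."""
    conn.execute("CREATE TABLE stops (stop_id TEXT, stop_name TEXT)")
    conn.execute("CREATE TABLE visits (trip_id TEXT, stop_id TEXT, delay_min TEXT)")
    conn.executemany("INSERT INTO stops VALUES (:stop_id, :stop_name)", rows(STOPS_CSV))
    conn.executemany(
        "INSERT INTO visits VALUES (:trip_id, :stop_id, :delay_min)", rows(VISITS_CSV)
    )


def load_constrained(conn):
    """Load the same data with datatypes, keys, a foreign key, and an index."""
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE stops (stop_id INTEGER PRIMARY KEY, stop_name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE visits (trip_id TEXT PRIMARY KEY, "
        "stop_id INTEGER NOT NULL REFERENCES stops(stop_id), "
        "delay_min INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_visits_stop ON visits(stop_id)")
    conn.executemany(
        "INSERT INTO stops VALUES (?, ?)",
        [(int(r["stop_id"]), r["stop_name"]) for r in rows(STOPS_CSV)],
    )
    conn.executemany(
        "INSERT INTO visits VALUES (?, ?, ?)",
        [(r["trip_id"], int(r["stop_id"]), int(r["delay_min"])) for r in rows(VISITS_CSV)],
    )


def delayed_trips(loader):
    """Run QUERY against a fresh in-memory database populated by `loader`."""
    conn = sqlite3.connect(":memory:")
    loader(conn)
    result = [r[0] for r in conn.execute(QUERY)]
    conn.close()
    return result


if __name__ == "__main__":
    print(delayed_trips(load_naive))        # ['T3']
    print(delayed_trips(load_constrained))  # ['T2', 'T3']
```

In the untyped load, SQLite compares `delay_min` as text, so the value "12" sorts before "6" and trip T2 is missed. With an INTEGER column the comparison is numeric and both delayed trips are returned. The primary keys and the index are included only to show where such constraints would be declared; on three rows they have no measurable effect on performance.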

ASJC Scopus subject areas

Computer Science (all)
Information Systems
Computer Science (all)
Computer Science Applications
Computer Science (all)
Computer Networks and Communications

Cite this

Enhancing virtual ontology based access over tabular data with Morph-CSV. / Chaves-Fraga, David; Ruckhaus, Edna; Priyatna, Freddy et al.
In: Semantic web, Vol. 12, No. 6, 2021, p. 869-902.

Publication: Contribution to journal › Article › Research › Peer review

Chaves-Fraga, D, Ruckhaus, E, Priyatna, F, Vidal, ME & Corcho, O 2021, 'Enhancing virtual ontology based access over tabular data with Morph-CSV', Semantic web, vol. 12, no. 6, pp. 869-902. https://doi.org/10.48550/arXiv.2001.09052, https://doi.org/10.3233/SW-210432

Chaves-Fraga, D., Ruckhaus, E., Priyatna, F., Vidal, M. E., & Corcho, O. (2021). Enhancing virtual ontology based access over tabular data with Morph-CSV. Semantic web, 12(6), 869-902. https://doi.org/10.48550/arXiv.2001.09052, https://doi.org/10.3233/SW-210432

Chaves-Fraga D, Ruckhaus E, Priyatna F, Vidal ME, Corcho O. Enhancing virtual ontology based access over tabular data with Morph-CSV. Semantic web. 2021;12(6):869-902. doi: 10.48550/arXiv.2001.09052, 10.3233/SW-210432

Chaves-Fraga, David ; Ruckhaus, Edna ; Priyatna, Freddy et al. / Enhancing virtual ontology based access over tabular data with Morph-CSV. In: Semantic web. 2021 ; Vol. 12, No. 6. pp. 869-902.

@article{d9d0f57384e94d5e93d028c93e6557b8,

title = "Enhancing virtual ontology based access over tabular data with Morph-CSV",

abstract = "Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with the GTFS-Madrid benchmark; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while it is able to produce all the query answers.",

keywords = "constraints, Knowledge graphs, mapping languages, tabular data",

author = "David Chaves-Fraga and Edna Ruckhaus and Freddy Priyatna and Vidal, {Maria Esther} and Oscar Corcho",

note = "Funding Information: We are very thankful to Anastasia Dimou, Ben de Meester and Pieter Heyvaert (the RML team), who helped us in the initial discussions about the main contributions of our approach and in the creation of (YARR)RML mappings. We are also very thankful to the developers of Morph-CSV: Jhon Toledo and Luis Pozo-Gilo. The work presented in this paper is supported by the Spanish Ministerio de Econom{\'i}a, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES – UPM Spanish national project (TIN2016-78011-C4-4-R) and by an FPI grant (BES-2017-082511).",

year = "2021",

doi = "10.48550/arXiv.2001.09052",

language = "English",

volume = "12",

pages = "869--902",

journal = "Semantic web",

issn = "1570-0844",

publisher = "IOS Press",

number = "6",

}

TY - JOUR

T1 - Enhancing virtual ontology based access over tabular data with Morph-CSV

AU - Chaves-Fraga, David

AU - Ruckhaus, Edna

AU - Priyatna, Freddy

AU - Vidal, Maria Esther

AU - Corcho, Oscar

N1 - Funding Information: We are very thankful to Anastasia Dimou, Ben de Meester and Pieter Heyvaert (the RML team), who helped us in the initial discussions about the main contributions of our approach and in the creation of (YARR)RML mappings. We are also very thankful to the developers of Morph-CSV: Jhon Toledo and Luis Pozo-Gilo. The work presented in this paper is supported by the Spanish Ministerio de Economía, Industria y Competitividad and EU FEDER funds under the DATOS 4.0: RETOS Y SOLUCIONES – UPM Spanish national project (TIN2016-78011-C4-4-R) and by an FPI grant (BES-2017-082511).

PY - 2021

Y1 - 2021

AB - Ontology-Based Data Access (OBDA) has traditionally focused on providing a unified view of heterogeneous datasets (e.g., relational databases, CSV and JSON files), either by materializing integrated data into RDF or by performing on-the-fly querying via SPARQL query translation. In the specific case of tabular datasets represented as several CSV or Excel files, query translation approaches have been applied by considering each source as a single table that can be loaded into a relational database management system (RDBMS). Nevertheless, constraints over these tables are not represented (e.g., referential integrity among sources, datatypes, or data integrity); thus, neither consistency among attributes nor indexes over tables are enforced. As a consequence, efficiency of the SPARQL-to-SQL translation process may be affected, as well as the completeness of the answers produced during the evaluation of the generated SQL query. Our work is focused on applying implicit constraints on the OBDA query translation process over tabular data. We propose Morph-CSV, a framework for querying tabular data that exploits information from typical OBDA inputs (e.g., mappings, queries) to enforce constraints that can be used together with any SPARQL-to-SQL OBDA engine. Morph-CSV relies on both a constraint component and a set of constraint operators. For a given set of constraints, the operators are applied to each type of constraint with the aim of enhancing query completeness and performance. We evaluate Morph-CSV in several domains: e-commerce with the BSBM benchmark; transportation with the GTFS-Madrid benchmark; and biology with a use case extracted from the Bio2RDF project. We compare and report the performance of two SPARQL-to-SQL OBDA engines, without and with the incorporation of Morph-CSV. The observed results suggest that Morph-CSV is able to speed up the total query execution time by up to two orders of magnitude, while it is able to produce all the query answers.

KW - constraints

KW - Knowledge graphs

KW - mapping languages

KW - tabular data

UR - http://www.scopus.com/inward/record.url?scp=85117904971&partnerID=8YFLogxK

U2 - 10.48550/arXiv.2001.09052

DO - 10.48550/arXiv.2001.09052

M3 - Article

AN - SCOPUS:85117904971

VL - 12

SP - 869

EP - 902

JO - Semantic web

JF - Semantic web

SN - 1570-0844

IS - 6

ER -
