Optimizing federated queries based on the physical design of a data lake

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer-review

Authors

  • Philipp D. Rohde
  • Maria Esther Vidal

External Research Organisations

  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: EDBT/ICDT 2020 Workshops
Subtitle of host publication: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference
Publication status: Published - 2020
Externally published: Yes
Event: Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020 - Copenhagen, Denmark
Duration: 30 Mar 2020 to 2 Apr 2020

Publication series

Name: CEUR Workshop Proceedings
Publisher: CEUR Workshop Proceedings
Number: 2578
ISSN (Print): 1613-0073

Abstract

The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.

Cite this

Optimizing federated queries based on the physical design of a data lake. / Rohde, Philipp D.; Vidal, Maria Esther.
EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference. 2020. (CEUR Workshop Proceedings; No. 2578).

Rohde, PD & Vidal, ME 2020, Optimizing federated queries based on the physical design of a data lake. in EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference. CEUR Workshop Proceedings, no. 2578, Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020, Copenhagen, Denmark, 30 Mar 2020. <https://ceur-ws.org/Vol-2578/SEAData6.pdf>
Rohde, P. D., & Vidal, M. E. (2020). Optimizing federated queries based on the physical design of a data lake. In EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference (CEUR Workshop Proceedings; No. 2578). https://ceur-ws.org/Vol-2578/SEAData6.pdf
Rohde PD, Vidal ME. Optimizing federated queries based on the physical design of a data lake. In EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference. 2020. (CEUR Workshop Proceedings; 2578).
Rohde, Philipp D. ; Vidal, Maria Esther. / Optimizing federated queries based on the physical design of a data lake. EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference. 2020. (CEUR Workshop Proceedings; 2578).
@inproceedings{b1f3aa8a068e4633af417212cf90bd06,
title = "Optimizing federated queries based on the physical design of a data lake",
abstract = "The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.",
author = "Rohde, {Philipp D.} and Vidal, {Maria Esther}",
note = "Funding information: This work has been partially supported by the EU H2020 RIA funded projects QualiChain (No 822404) and iASiS (No 727658).; Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020 ; Conference date: 30-03-2020 Through 02-04-2020",
year = "2020",
language = "English",
series = "CEUR Workshop Proceedings",
publisher = "CEUR Workshop Proceedings",
number = "2578",
booktitle = "EDBT/ICDT 2020 Workshops",
issn = "1613-0073",
url = "https://ceur-ws.org/Vol-2578/SEAData6.pdf",
}

TY - GEN
T1 - Optimizing federated queries based on the physical design of a data lake
AU - Rohde, Philipp D.
AU - Vidal, Maria Esther
N1 - Funding information: This work has been partially supported by the EU H2020 RIA funded projects QualiChain (No 822404) and iASiS (No 727658).
PY - 2020
Y1 - 2020
N2 - The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
AB - The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations are Semantic Data Lakes where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
UR - http://www.scopus.com/inward/record.url?scp=85082741950&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85082741950
T3 - CEUR Workshop Proceedings
BT - EDBT/ICDT 2020 Workshops
T2 - Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020
Y2 - 30 March 2020 through 2 April 2020
ER -