Details
Original language | English |
---|---|
Title of host publication | EDBT/ICDT 2020 Workshops |
Subtitle of host publication | Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference |
Publication status | Published - 2020 |
Externally published | Yes |
Event | Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020 - Copenhagen, Denmark; Duration: 30 Mar 2020 → 2 Apr 2020 |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Publisher | CEUR Workshop Proceedings |
Number | 2578 |
ISSN (Print) | 1613-0073 |
Abstract
The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
ASJC Scopus subject areas
- Computer Science(all)
- General Computer Science
Cite this
EDBT/ICDT 2020 Workshops: Proceedings of the Workshops of the EDBT/ICDT 2020 Joint Conference. 2020. (CEUR Workshop Proceedings; No. 2578).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Optimizing federated queries based on the physical design of a data lake
AU - Rohde, Philipp D.
AU - Vidal, Maria Esther
N1 - Funding information: This work has been partially supported by the EU H2020 RIA funded projects QualiChain (No 822404) and iASiS (No 727658).
PY - 2020
Y1 - 2020
AB - The optimization of query execution plans is known to be crucial for reducing the query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models like RDF and relational databases evolved. One type of these federations is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting the knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
UR - http://www.scopus.com/inward/record.url?scp=85082741950&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85082741950
T3 - CEUR Workshop Proceedings
BT - EDBT/ICDT 2020 Workshops
T2 - Workshops of the 23rd International Conference on Extending Database Technology/23rd International Conference on Database Theory, EDBT-ICDT-WS 2020
Y2 - 30 March 2020 through 2 April 2020
ER -