Querying data lakes using Spark and Presto

Publication: Chapter in book/report/conference proceeding; Conference contribution; Research; Peer-reviewed

Authors

  • Mohamed Nadjib Mami
  • Damien Graux
  • Simon Scerri
  • Hajira Jabeen
  • Sören Auer

Organisational units

External organisations

  • Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme (IAIS)
  • Rheinische Friedrich-Wilhelms-Universität Bonn
  • Technische Informationsbibliothek (TIB) Leibniz-Informationszentrum Technik und Naturwissenschaften und Universitätsbibliothek

Details

Original language: English
Title of host publication: The Web Conference 2019
Subtitle: Proceedings of the World Wide Web Conference, WWW 2019
Editors: Ling Liu, Ryen White
Place of publication: New York
Pages: 3574-3578
Number of pages: 5
ISBN (electronic): 9781450366748
Publication status: Published - May 2019
Event: 2019 World Wide Web Conference, WWW 2019 - San Francisco, USA
Duration: 13 May 2019 to 17 May 2019

Abstract

Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.

ASJC Scopus subject areas

Cite this

Querying data lakes using Spark and Presto. / Mami, Mohamed Nadjib; Graux, Damien; Scerri, Simon et al.
The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. ed. / Ling Liu; Ryen White. New York, 2019. p. 3574-3578.

Mami, MN, Graux, D, Scerri, S, Jabeen, H & Auer, S 2019, Querying data lakes using Spark and Presto. in L Liu & R White (eds), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York, pp. 3574-3578, 2019 World Wide Web Conference, WWW 2019, San Francisco, USA, 13 May 2019. https://doi.org/10.1145/3308558.3314132
Mami, M. N., Graux, D., Scerri, S., Jabeen, H., & Auer, S. (2019). Querying data lakes using Spark and Presto. In L. Liu, & R. White (Eds.), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019 (pp. 3574-3578). https://doi.org/10.1145/3308558.3314132
Mami MN, Graux D, Scerri S, Jabeen H, Auer S. Querying data lakes using Spark and Presto. In Liu L, White R, editors, The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York. 2019. p. 3574-3578 doi: 10.1145/3308558.3314132
Mami, Mohamed Nadjib ; Graux, Damien ; Scerri, Simon et al. / Querying data lakes using Spark and Presto. The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. ed. / Ling Liu ; Ryen White. New York, 2019. pp. 3574-3578
@inproceedings{e77ed988ef434b7c8ad4404d19866763,
title = "Querying data lakes using {Spark} and {Presto}",
abstract = "Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.",
keywords = "Data Lake, Heterogeneous Databases, NoSQL, Query, SPARQL, SQL",
author = "Mami, {Mohamed Nadjib} and Damien Graux and Simon Scerri and Hajira Jabeen and S{\"o}ren Auer",
note = "Funding information: This research was partially supported by the European Union{\textquoteright}s H2020 research and innovation programme BETTER under the Grant Agreement number 776280.; 2019 World Wide Web Conference, WWW 2019 ; Conference date: 13-05-2019 Through 17-05-2019",
year = "2019",
month = may,
doi = "10.1145/3308558.3314132",
language = "English",
pages = "3574--3578",
editor = "Ling Liu and Ryen White",
booktitle = "The Web Conference 2019",
}

TY  - GEN
T1  - Querying data lakes using Spark and Presto
AU  - Mami, Mohamed Nadjib
AU  - Graux, Damien
AU  - Scerri, Simon
AU  - Jabeen, Hajira
AU  - Auer, Sören
N1  - Funding information: This research was partially supported by the European Union’s H2020 research and innovation programme BETTER under the Grant Agreement number 776280.
PY  - 2019/5
Y1  - 2019/5
N2  - Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
AB  - Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
KW  - Data Lake
KW  - Heterogeneous Databases
KW  - NoSQL
KW  - Query
KW  - SPARQL
KW  - SQL
U2  - 10.1145/3308558.3314132
DO  - 10.1145/3308558.3314132
M3  - Conference contribution
AN  - SCOPUS:85066892349
SP  - 3574
EP  - 3578
BT  - The Web Conference 2019
A2  - Liu, Ling
A2  - White, Ryen
CY  - New York
T2  - 2019 World Wide Web Conference, WWW 2019
Y2  - 13 May 2019 through 17 May 2019
ER  -
