Querying data lakes using Spark and Presto

Research output: Chapter in book/report/conference proceeding; Conference contribution; Research; peer review

Authors

  • Mohamed Nadjib Mami
  • Damien Graux
  • Simon Scerri
  • Hajira Jabeen
  • Sören Auer

Research Organisations

External Research Organisations

  • Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
  • University of Bonn
  • German National Library of Science and Technology (TIB)

Details

Original language: English
Title of host publication: The Web Conference 2019
Subtitle of host publication: Proceedings of the World Wide Web Conference, WWW 2019
Editors: Ling Liu, Ryen White
Place of publication: New York
Pages: 3574-3578
Number of pages: 5
ISBN (electronic): 9781450366748
Publication status: Published - May 2019
Event: 2019 World Wide Web Conference, WWW 2019 - San Francisco, United States
Duration: 13 May 2019 to 17 May 2019

Abstract

Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.

Keywords

    Data Lake, Heterogeneous Databases, NoSQL, Query, SPARQL, SQL

Cite this

Querying data lakes using Spark and Presto. / Mami, Mohamed Nadjib; Graux, Damien; Scerri, Simon et al.
The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. ed. / Ling Liu; Ryen White. New York, 2019. p. 3574-3578.

Mami, MN, Graux, D, Scerri, S, Jabeen, H & Auer, S 2019, Querying data lakes using Spark and Presto. in L Liu & R White (eds), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York, pp. 3574-3578, 2019 World Wide Web Conference, WWW 2019, San Francisco, United States, 13 May 2019. https://doi.org/10.1145/3308558.3314132
Mami, M. N., Graux, D., Scerri, S., Jabeen, H., & Auer, S. (2019). Querying data lakes using Spark and Presto. In L. Liu, & R. White (Eds.), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019 (pp. 3574-3578). https://doi.org/10.1145/3308558.3314132
Mami MN, Graux D, Scerri S, Jabeen H, Auer S. Querying data lakes using Spark and Presto. In Liu L, White R, editors, The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York. 2019. p. 3574-3578 doi: 10.1145/3308558.3314132
Mami, Mohamed Nadjib ; Graux, Damien ; Scerri, Simon et al. / Querying data lakes using Spark and Presto. The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. editor / Ling Liu ; Ryen White. New York, 2019. pp. 3574-3578
@inproceedings{e77ed988ef434b7c8ad4404d19866763,
title = "Querying data lakes using {Spark} and {Presto}",
abstract = "Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.",
keywords = "Data Lake, Heterogeneous Databases, NoSQL, Query, SPARQL, SQL",
author = "Mami, {Mohamed Nadjib} and Damien Graux and Simon Scerri and Hajira Jabeen and S{\"o}ren Auer",
note = "Funding information: This research was partially supported by the European Union{\textquoteright}s H2020 research and innovation programme BETTER under the Grant Agreement number 776280.; 2019 World Wide Web Conference, WWW 2019 ; Conference date: 13-05-2019 Through 17-05-2019",
year = "2019",
month = may,
doi = "10.1145/3308558.3314132",
language = "English",
pages = "3574--3578",
editor = "Ling Liu and Ryen White",
booktitle = "The Web Conference 2019",
}

TY  - GEN
T1  - Querying data lakes using Spark and Presto
AU  - Mami, Mohamed Nadjib
AU  - Graux, Damien
AU  - Scerri, Simon
AU  - Jabeen, Hajira
AU  - Auer, Sören
N1  - Funding information: This research was partially supported by the European Union’s H2020 research and innovation programme BETTER under the Grant Agreement number 776280.
PY  - 2019/5
Y1  - 2019/5
N2  - Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
AB  - Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
KW  - Data Lake
KW  - Heterogeneous Databases
KW  - NoSQL
KW  - Query
KW  - SPARQL
KW  - SQL
U2  - 10.1145/3308558.3314132
DO  - 10.1145/3308558.3314132
M3  - Conference contribution
AN  - SCOPUS:85066892349
SP  - 3574
EP  - 3578
BT  - The Web Conference 2019
A2  - Liu, Ling
A2  - White, Ryen
CY  - New York
T2  - 2019 World Wide Web Conference, WWW 2019
Y2  - 13 May 2019 through 17 May 2019
ER  -