Details
Original language | English |
---|---|
Title of host publication | The Web Conference 2019 |
Subtitle of host publication | Proceedings of the World Wide Web Conference, WWW 2019 |
Editors | Ling Liu, Ryen White |
Place of Publication | New York |
Pages | 3574-3578 |
Number of pages | 5 |
ISBN (electronic) | 9781450366748 |
Publication status | Published - May 2019 |
Event | 2019 World Wide Web Conference, WWW 2019 - San Francisco, United States (13 May 2019 → 17 May 2019) |
Abstract
Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
Keywords
- Data Lake, Heterogeneous Databases, NoSQL, Query, SPARQL, SQL
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
- Software
Cite this
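Querying data lakes using Spark and Presto. / Mami, Mohamed Nadjib; Graux, Damien; Scerri, Simon; Jabeen, Hajira; Auer, Sören. The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. ed. / Ling Liu; Ryen White. New York, 2019. p. 3574-3578.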
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Querying data lakes using Spark and Presto
AU - Mami, Mohamed Nadjib
AU - Graux, Damien
AU - Scerri, Simon
AU - Jabeen, Hajira
AU - Auer, Sören
N1 - Funding information: This research was partially supported by the European Union’s H2020 research and innovation programme BETTER under the Grant Agreement number 776280.
PY - 2019/5
Y1 - 2019/5
AB - Squerall is a tool that allows the querying of heterogeneous, large-scale data sources by leveraging state-of-the-art Big Data processing engines: Spark and Presto. Queries are posed on-demand against a Data Lake, i.e., directly on the original data sources without requiring prior data transformation. We showcase Squerall's ability to query five different data sources, including inter alia the popular Cassandra and MongoDB. In particular, we demonstrate how it can jointly query heterogeneous data sources, and how interested developers can easily extend it to support additional data sources. Graphical user interfaces (GUIs) are offered to support users in (1) building intra-source queries, and (2) creating required input files.
KW - Data Lake
KW - Heterogeneous Databases
KW - NoSQL
KW - Query
KW - SPARQL
KW - SQL
U2 - 10.1145/3308558.3314132
DO - 10.1145/3308558.3314132
M3 - Conference contribution
AN - SCOPUS:85066892349
SP - 3574
EP - 3578
BT - The Web Conference 2019
A2 - Liu, Ling
A2 - White, Ryen
CY - New York
T2 - 2019 World Wide Web Conference, WWW 2019
Y2 - 13 May 2019 through 17 May 2019
ER -