Uniform Access to Multiform Data Lakes using Semantic Technologies

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Mohamed Nadjib Mami
  • Damien Graux
  • Simon Scerri
  • Hajira Jabeen
  • Soren Auer
  • Jens Lehmann

External Research Organisations

  • University of Bonn
  • Trinity College Dublin
  • German National Library of Science and Technology (TIB)
  • Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)

Details

Original language: English
Title of host publication: 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings
Subtitle of host publication: 21st International Conference on Information Integration and Web-Based Applications & Services
Editors: Maria Indrawan-Santiago, Eric Pardede, Ivan Luiz Salvadori, Matthias Steinbauer, Ismail Khalil, Gabriele Anderst-Kotsis
Publisher: Association for Computing Machinery (ACM)
ISBN (electronic): 9781450371797
Publication status: Published - 2 Dec 2019
Externally published: Yes
Event: 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Munich, Germany
Duration: 2 Dec 2019 - 4 Dec 2019

Publication series

Name: ACM International Conference Proceeding Series

Abstract

Increasing data volumes have extensively increased application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats and storage frameworks, resulting in the need to effectively access and process distributed heterogeneous data at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution enabling uniform access to large, heterogeneous data sources, without enforcing centralization; thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.

Keywords

    Big Data, Data Variety, NoSQL, Semantic Data Lake, SPARQL


Cite this

Uniform Access to Multiform Data Lakes using Semantic Technologies. / Mami, Mohamed Nadjib; Graux, Damien; Scerri, Simon et al.
21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings: 21st International Conference on Information Integration and Web-Based Applications & Services. ed. / Maria Indrawan-Santiago; Eric Pardede; Ivan Luiz Salvadori; Matthias Steinbauer; Ismail Khalil; Gabriele Anderst-Kotsis. Association for Computing Machinery (ACM), 2019. (ACM International Conference Proceeding Series).


@inproceedings{b549a52b41674cefbd9ffaab195a1977,
title = "Uniform Access to Multiform Data Lakes using Semantic Technologies",
abstract = "Increasing data volumes have extensively increased application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats and storage frameworks, resulting in the need to effectively access and process distributed heterogeneous data at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution enabling uniform access to large, heterogeneous data sources, without enforcing centralization; thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.",
keywords = "Big Data, Data Variety, NoSQL, Semantic Data Lake, SPARQL",
author = "Mami, {Mohamed Nadjib} and Damien Graux and Simon Scerri and Hajira Jabeen and Soren Auer and Jens Lehmann",
note = "Funding Information: This work is partly supported by the EU H2020 projects BETTER (GA 776280) and QualiChain (GA 822404); and by the ADAPT Centre for Digital Content Technology funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.; 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 ; Conference date: 02-12-2019 Through 04-12-2019",
year = "2019",
month = dec,
day = "2",
doi = "10.1145/3366030.3366054",
language = "English",
series = "ACM International Conference Proceeding Series",
publisher = "Association for Computing Machinery (ACM)",
editor = "Maria Indrawan-Santiago and Eric Pardede and Salvadori, {Ivan Luiz} and Matthias Steinbauer and Ismail Khalil and Gabriele Anderst-Kotsis",
booktitle = "21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings",
address = "United States",

}


TY - GEN

T1 - Uniform Access to Multiform Data Lakes using Semantic Technologies

AU - Mami, Mohamed Nadjib

AU - Graux, Damien

AU - Scerri, Simon

AU - Jabeen, Hajira

AU - Auer, Soren

AU - Lehmann, Jens

N1 - Funding Information: This work is partly supported by the EU H2020 projects BETTER (GA 776280) and QualiChain (GA 822404); and by the ADAPT Centre for Digital Content Technology funded under the SFI Research Centres Programme (Grant 13/RC/2106) and co-funded under the European Regional Development Fund.

PY - 2019/12/2

Y1 - 2019/12/2

N2 - Increasing data volumes have extensively increased application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats and storage frameworks, resulting in the need to effectively access and process distributed heterogeneous data at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution enabling uniform access to large, heterogeneous data sources, without enforcing centralization; thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.

AB - Increasing data volumes have extensively increased application possibilities. However, accessing this data in an ad hoc manner remains an unsolved problem due to the diversity of data management approaches, formats and storage frameworks, resulting in the need to effectively access and process distributed heterogeneous data at scale. For years, Semantic Web techniques have addressed data integration challenges with practical knowledge representation models and ontology-based mappings. Leveraging these techniques, we provide a solution enabling uniform access to large, heterogeneous data sources, without enforcing centralization; thus realizing the vision of a Semantic Data Lake. In this paper, we define the core concepts underlying this vision and the architectural requirements that systems implementing it need to fulfill. Squerall, an example of such a system, is an extensible framework built on top of state-of-the-art Big Data technologies. We focus on Squerall's distributed query execution techniques and strategies, empirically evaluating its performance throughout its various sub-phases.

KW - Big Data

KW - Data Variety

KW - NoSQL

KW - Semantic Data Lake

KW - SPARQL

UR - http://www.scopus.com/inward/record.url?scp=85117539584&partnerID=8YFLogxK

U2 - 10.1145/3366030.3366054

DO - 10.1145/3366030.3366054

M3 - Conference contribution

T3 - ACM International Conference Proceeding Series

BT - 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019 - Proceedings

A2 - Indrawan-Santiago, Maria

A2 - Pardede, Eric

A2 - Salvadori, Ivan Luiz

A2 - Steinbauer, Matthias

A2 - Khalil, Ismail

A2 - Anderst-Kotsis, Gabriele

PB - Association for Computing Machinery (ACM)

T2 - 21st International Conference on Information Integration and Web-Based Applications and Services, iiWAS 2019

Y2 - 2 December 2019 through 4 December 2019

ER -
