Details
Original language | English |
---|---|
Qualification | Doctor of Engineering |
Awarding Institution | |
Supervised by |
|
Date of Award | 13 Apr 2023 |
Place of Publication | Hannover |
Publication status | Published - 2023 |
Abstract
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
Hannover, 2023. 159 p.
Research output: Thesis › Doctoral thesis
}
TY - BOOK
T1 - Semantic data integration and knowledge graph creation at scale
AU - Jozashoori, Samaneh
N1 - Doctoral thesis
PY - 2023
Y1 - 2023
N2 - Contrary to data, knowledge is often abstract. Concrete knowledge can be achieved through the inclusion of semantics in the data models, highlighting the role of data integration. The massive growing number of data, in recent years, has promoted the demand for scaling up data management techniques; materializing data integration, a.k.a., knowledge graph creation falls in that category. In this thesis, we investigate efficient methods and techniques for materializing data integration. We formalize the process of materializing data integration. We formally define the characteristics of a materialized data integration system that merge the data operators and sources. Owing to this formalism, both layers of data integration, including data and schema-level integration, are formalized in the context of mapping assertions. We explore optimization opportunities for improving the materialization of data integration systems. We recognize three angles including intra/inter-mapping assertions from which the materialization can be improved. Accordingly, we propose source-based, mapping-based, and inter-mapping assertion groups of optimization techniques. We utilize our proposed techniques in three real-world projects. We illustrate how applying these optimization techniques contribute to meeting the objectives of the mentioned projects. Furthermore, we study the parameters impacting the performance of materialization of data integration. Relying on reported parameters and the presumably impacting parameters, we build four groups of testbeds. We empirically study the performances of these different testbeds in the presence and absence of our proposed techniques, in terms of execution time. We observe that the savings can be up to 75%. Lastly, we contribute to facilitating the process of declarative data integration system definition. We propose two data operation function signatures in Function Ontology (FnO). The first set of functions is designed to perform the task of entity alignment by resorting to an entity and relation linking tool. The second library consists of domain-specific functions to align genomic entities by harmonizing their representations. Finally, we introduce a tool equipped with a user interface to facilitate the process of defining declarative mapping rules by allowing users to explore the data sources and unified schema while defining their correspondences.
AB - Contrary to data, knowledge is often abstract. Concrete knowledge can be achieved through the inclusion of semantics in the data models, highlighting the role of data integration. The massive growing number of data, in recent years, has promoted the demand for scaling up data management techniques; materializing data integration, a.k.a., knowledge graph creation falls in that category. In this thesis, we investigate efficient methods and techniques for materializing data integration. We formalize the process of materializing data integration. We formally define the characteristics of a materialized data integration system that merge the data operators and sources. Owing to this formalism, both layers of data integration, including data and schema-level integration, are formalized in the context of mapping assertions. We explore optimization opportunities for improving the materialization of data integration systems. We recognize three angles including intra/inter-mapping assertions from which the materialization can be improved. Accordingly, we propose source-based, mapping-based, and inter-mapping assertion groups of optimization techniques. We utilize our proposed techniques in three real-world projects. We illustrate how applying these optimization techniques contribute to meeting the objectives of the mentioned projects. Furthermore, we study the parameters impacting the performance of materialization of data integration. Relying on reported parameters and the presumably impacting parameters, we build four groups of testbeds. We empirically study the performances of these different testbeds in the presence and absence of our proposed techniques, in terms of execution time. We observe that the savings can be up to 75%. Lastly, we contribute to facilitating the process of declarative data integration system definition. We propose two data operation function signatures in Function Ontology (FnO). The first set of functions is designed to perform the task of entity alignment by resorting to an entity and relation linking tool. The second library consists of domain-specific functions to align genomic entities by harmonizing their representations. Finally, we introduce a tool equipped with a user interface to facilitate the process of defining declarative mapping rules by allowing users to explore the data sources and unified schema while defining their correspondences.
U2 - 10.15488/13537
DO - 10.15488/13537
M3 - Doctoral thesis
CY - Hannover
ER -