Context-based entity matching for big data

Mayesha Tasnim; Diego Collarana; Damien Graux; Maria Esther Vidal

doi:10.1007/978-3-030-53199-7_8

Details

Original language	English
Title of host publication	Knowledge Graphs and Big Data Processing
Place of Publication	Cham
Chapter	8
Pages	122-146
Number of pages	25
ISBN (electronic)	978-3-030-53199-7
Publication status	Published - 16 Jul 2020

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	12072 LNCS
ISSN (Print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets: a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
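
The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy entities, the Jaccard similarity, and the function names (`jaccard`, `in_context`, `match`) are all assumptions made for illustration. In particular, a plain agreement check on the context properties stands in for Formal Concept Analysis, and a brute-force search over permutations stands in for a 1-1 perfect matching algorithm, which only works for small inputs of equal size.

```python
from itertools import permutations

# Hypothetical toy entities, each a property -> value dictionary.
left = {
    "l1": {"name": "Ada Lovelace", "field": "mathematics", "born": "1815"},
    "l2": {"name": "Alan Turing", "field": "computer science", "born": "1912"},
}
right = {
    "r1": {"name": "A. Turing", "field": "computer science", "born": "1912"},
    "r2": {"name": "Ada Lovelace", "field": "mathematics", "born": "1815"},
}


def jaccard(a, b):
    """Jaccard similarity over the sets of (property, value) pairs."""
    sa, sb = set(a.items()), set(b.items())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def in_context(a, b, context):
    """Assumed stand-in for the FCA step: both entities must carry
    every context property and agree on its value."""
    return all(p in a and a.get(p) == b.get(p) for p in context)


def match(left, right, context):
    """Return the 1-1 assignment with the highest total score.

    Pairs that fail the context check score 0 and are dropped from
    the result. Brute force, so only suitable for tiny inputs.
    """
    lk, rk = sorted(left), sorted(right)

    def score(l, r):
        a, b = left[l], right[r]
        return jaccard(a, b) if in_context(a, b, context) else 0.0

    best = max(
        permutations(rk),
        key=lambda perm: sum(score(l, r) for l, r in zip(lk, perm)),
    )
    return {l: r for l, r in zip(lk, best) if score(l, r) > 0.0}


print(match(left, right, ["field"]))  # both pairs agree on "field"
print(match(left, right, ["name"]))   # only one pair agrees on "name"
```

With the context `["field"]` both pairs are matched, while with `["name"]` only the pair with identical names survives, which mirrors the abstract's point that a match might not be valid in every context.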

ASJC Scopus subject areas

Mathematics(all)
Theoretical Computer Science
Computer Science(all)
General Computer Science

Cite this

Context-based entity matching for big data. / Tasnim, Mayesha; Collarana, Diego; Graux, Damien et al.
Knowledge Graphs and Big Data Processing. Cham, 2020. p. 122-146 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12072 LNCS).

Research output: Chapter in book/report/conference proceeding › Contribution to book/anthology › Research › peer review

Tasnim, M, Collarana, D, Graux, D & Vidal, ME 2020, Context-based entity matching for big data. in Knowledge Graphs and Big Data Processing. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 12072 LNCS, Cham, pp. 122-146. https://doi.org/10.1007/978-3-030-53199-7_8

Tasnim, M., Collarana, D., Graux, D., & Vidal, M. E. (2020). Context-based entity matching for big data. In Knowledge Graphs and Big Data Processing (pp. 122-146). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 12072 LNCS). https://doi.org/10.1007/978-3-030-53199-7_8

Tasnim M, Collarana D, Graux D, Vidal ME. Context-based entity matching for big data. In Knowledge Graphs and Big Data Processing. Cham. 2020. p. 122-146. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-030-53199-7_8

Tasnim, Mayesha ; Collarana, Diego ; Graux, Damien et al. / Context-based entity matching for big data. Knowledge Graphs and Big Data Processing. Cham, 2020. pp. 122-146 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

Download

@inbook{42df704872534d45a88340360936a2d8,
  title = "Context-based entity matching for big data",
  abstract = "In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets: a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.",
  author = "Mayesha Tasnim and Diego Collarana and Damien Graux and Vidal, {Maria Esther}",
  year = "2020",
  month = jul,
  day = "16",
  doi = "10.1007/978-3-030-53199-7_8",
  language = "English",
  isbn = "978-3-030-53198-0",
  series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
  pages = "122--146",
  booktitle = "Knowledge Graphs and Big Data Processing",
}

Download

TY - CHAP
T1 - Context-based entity matching for big data
AU - Tasnim, Mayesha
AU - Collarana, Diego
AU - Graux, Damien
AU - Vidal, Maria Esther
PY - 2020/7/16
Y1 - 2020/7/16
N2 - In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets: a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
AB - In the Big Data era, where variety is the most dominant dimension, the RDF data model enables the creation and integration of actionable knowledge from heterogeneous data sources. However, the RDF data model allows entities to be described under various contexts, e.g., a person can be described in a demographic context as well as in a professional context. Context-aware description poses challenges during entity matching of RDF datasets: a match might not be valid in every context. To perform contextually relevant entity matching, the specific context under which a data-driven task, e.g., data integration, is performed must be taken into account. However, existing approaches only consider inter-schema and property mappings across different data sources and prevent users from selecting contexts and conditions during a data integration process. We devise COMET, an entity matching technique that relies on both the knowledge stated in RDF vocabularies and a context-based similarity metric to map contextually equivalent RDF graphs. COMET follows a two-fold approach to solve the problem of entity matching in RDF graphs in a context-aware manner. In the first step, COMET computes the similarity measures across RDF entities and resorts to the Formal Concept Analysis algorithm to map contextually equivalent RDF entities. In the second step, COMET combines the results of the first step and executes a 1-1 perfect matching algorithm for matching RDF entities based on the combined scores. We empirically evaluate the performance of COMET on a testbed from DBpedia. The experimental results suggest that COMET accurately matches equivalent RDF graphs in a context-dependent manner.
UR - http://www.scopus.com/inward/record.url?scp=85089507687&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-53199-7_8
DO - 10.1007/978-3-030-53199-7_8
M3 - Contribution to book/anthology
AN - SCOPUS:85089507687
SN - 978-3-030-53198-0
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 122
EP - 146
BT - Knowledge Graphs and Big Data Processing
CY - Cham
ER -
