Blocking music metadata from heterogenous data sources

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Oliver Pabst
  • Udo W. Lipeck

Details

Original language: English
Title of host publication: Grundlagen von Datenbanken
Subtitle of host publication: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken
Pages: 53-58
Number of pages: 6
Publication status: Published - 29 Jun 2018
Event: 30th GI-Workshop on the Foundations of Databases, GvDB 2018 - Wuppertal, Germany
Duration: 22 May 2018 to 25 May 2018

Publication series

Name: CEUR Workshop Proceedings
Volume: 2126
ISSN (Print): 1613-0073

Abstract

Entity resolution or object matching describes the assignment of different objects to each other that describe the same object of the real world. It is used in a variety of technical systems, e.g. systems that fuse different data sources. Blocking is used in this context as an approach to reduce the total amount of comparisons by grouping similar objects in the same cluster and dissimilar objects in different clusters. As a result only the objects of the same clusters have to be compared to each other. To deal with noise, for instance spelling errors, that can result from different heterogeneous data sources, various blocking approaches exist that may add or remove redundancy to the data. In this paper we propose a system that utilizes a derivative of the standard blocking technique to compute correspondences between objects as starting points for a graph matching process. The blocking technique, which usually relies on identity of blocking keys derived from attributes, is modified to cope with heterogenous source data with few attributes suitable for matching. A common criticism of standard blocking is low efficiency, since the block sizes are unbalanced with regard to the number of contained entities. We take precautions to keep the efficiency high by reducing the size and amount of large partitions. Copyright is held by the author/owner(s).
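
As a rough illustration of the standard blocking idea the abstract starts from, the sketch below groups records by an identical blocking key and compares only records that share a block. It is not the authors' modified technique. The record layout, the `blocking_key` function (a three-letter prefix of the normalized title), and the sample titles are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record, prefix_len=3):
    """Hypothetical key: prefix of the lower-cased title, non-alphanumerics dropped."""
    normalized = "".join(ch for ch in record["title"].lower() if ch.isalnum())
    return normalized[:prefix_len]


def build_blocks(records, key_func=blocking_key):
    """Group record ids by blocking key (identity of keys, as in standard blocking)."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key_func(rec)].append(rec["id"])
    return dict(blocks)


def candidate_pairs(blocks):
    """Yield each pair of ids that share a block; only these pairs get compared."""
    for ids in blocks.values():
        yield from combinations(sorted(ids), 2)


# Invented sample records, for illustration only.
records = [
    {"id": 1, "title": "Yesterday"},
    {"id": 2, "title": "yesterday (Remastered)"},
    {"id": 3, "title": "Yellow Submarine"},
    {"id": 4, "title": "Hey Jude"},
    {"id": 5, "title": "Hey  Jude"},
]

blocks = build_blocks(records)
pairs = sorted(candidate_pairs(blocks))
all_pairs = len(records) * (len(records) - 1) // 2
print(pairs)  # [(1, 2), (4, 5)]
print(f"{len(pairs)} of {all_pairs} possible comparisons")  # 2 of 10 possible comparisons
```

With this assumed key, a misspelling near the start of a title (for example "Yseterday") yields the key "yse" and lands in a different block than "Yesterday", which is the kind of noise sensitivity that motivates variants of standard blocking.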

Keywords

    Blocking, Entity resolution, Matching

Cite this

Blocking music metadata from heterogenous data sources. / Pabst, Oliver; Lipeck, Udo W.
Grundlagen von Datenbanken: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken. 2018. p. 53-58 (CEUR Workshop Proceedings; Vol. 2126).

Pabst, O & Lipeck, UW 2018, Blocking music metadata from heterogenous data sources. in Grundlagen von Datenbanken: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken. CEUR Workshop Proceedings, vol. 2126, pp. 53-58, 30th GI-Workshop on the Foundations of Databases, GvDB 2018, Wuppertal, Germany, 22 May 2018. <https://ceur-ws.org/Vol-2126/>
Pabst, O., & Lipeck, U. W. (2018). Blocking music metadata from heterogenous data sources. In Grundlagen von Datenbanken: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken (pp. 53-58). (CEUR Workshop Proceedings; Vol. 2126). https://ceur-ws.org/Vol-2126/
Pabst O, Lipeck UW. Blocking music metadata from heterogenous data sources. In Grundlagen von Datenbanken: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken. 2018. p. 53-58. (CEUR Workshop Proceedings).
Pabst, Oliver ; Lipeck, Udo W. / Blocking music metadata from heterogenous data sources. Grundlagen von Datenbanken: Proceedings of the 30th GI-Workshop Grundlagen von Datenbanken. 2018. pp. 53-58 (CEUR Workshop Proceedings).
@inproceedings{4ef83fa7af9f49c3b6018dd700fbd012,
title = "Blocking music metadata from heterogenous data sources",
abstract = "Entity resolution or object matching describes the assignment of different objects to each other that describe the same object of the real world. It is used in a variety of technical systems, e.g. systems that fuse different data sources. Blocking is used in this context as an approach to reduce the total amount of comparisons by grouping similar objects in the same cluster and dissimilar objects in different clusters. As a result only the objects of the same clusters have to be compared to each other. To deal with noise, for instance spelling errors, that can result from different heterogeneous data sources, various blocking approaches exist that may add or remove redundancy to the data. In this paper we propose a system that utilizes a derivative of the standard blocking technique to compute correspondences between objects as starting points for a graph matching process. The blocking technique, which usually relies on identity of blocking keys derived from attributes, is modified to cope with heterogenous source data with few attributes suitable for matching. A common criticism of standard blocking is low efficiency, since the block sizes are unbalanced with regard to the number of contained entities. We take precautions to keep the efficiency high by reducing the size and amount of large partitions. Copyright is held by the author/owner(s).",
keywords = "Blocking, Entity resolution, Matching",
author = "Pabst, Oliver and Lipeck, {Udo W.}",
note = "Publisher Copyright: {\textcopyright} 2018 CEUR-WS. All rights reserved.; 30th GI-Workshop on the Foundations of Databases, GvDB 2018 ; Conference date: 22-05-2018 Through 25-05-2018",
year = "2018",
month = jun,
day = "29",
language = "English",
series = "CEUR Workshop Proceedings",
pages = "53--58",
booktitle = "Grundlagen von Datenbanken",
volume = "2126",
}

TY - GEN

T1 - Blocking music metadata from heterogenous data sources

AU - Pabst, Oliver

AU - Lipeck, Udo W.

N1 - Publisher Copyright: © 2018 CEUR-WS. All rights reserved.

PY - 2018/6/29

Y1 - 2018/6/29

AB - Entity resolution or object matching describes the assignment of different objects to each other that describe the same object of the real world. It is used in a variety of technical systems, e.g. systems that fuse different data sources. Blocking is used in this context as an approach to reduce the total amount of comparisons by grouping similar objects in the same cluster and dissimilar objects in different clusters. As a result only the objects of the same clusters have to be compared to each other. To deal with noise, for instance spelling errors, that can result from different heterogeneous data sources, various blocking approaches exist that may add or remove redundancy to the data. In this paper we propose a system that utilizes a derivative of the standard blocking technique to compute correspondences between objects as starting points for a graph matching process. The blocking technique, which usually relies on identity of blocking keys derived from attributes, is modified to cope with heterogenous source data with few attributes suitable for matching. A common criticism of standard blocking is low efficiency, since the block sizes are unbalanced with regard to the number of contained entities. We take precautions to keep the efficiency high by reducing the size and amount of large partitions. Copyright is held by the author/owner(s).

KW - Blocking

KW - Entity resolution

KW - Matching

UR - http://www.scopus.com/inward/record.url?scp=85049799546&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85049799546

T3 - CEUR Workshop Proceedings

SP - 53

EP - 58

BT - Grundlagen von Datenbanken

T2 - 30th GI-Workshop on the Foundations of Databases, GvDB 2018

Y2 - 22 May 2018 through 25 May 2018

ER -