Efficient Entity Resolution Methods for Heterogeneous Information Spaces

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Details

Original language: English
Title of host publication: ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops
Pages: 304-307
Number of pages: 4
Publication status: Published - 10 Jun 2011
Event: 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011 - Hannover, Germany
Duration: 11 Apr 2011 - 16 Apr 2011

Publication series

Name: Proceedings - International Conference on Data Engineering
ISSN (Print): 1084-4627

Abstract

The Web of Data encompasses a voluminous, yet constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging them is the detection (and merging) of information that describes the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to the noise, the loose schema binding, and the unprecedented heterogeneity inherent in it. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced quite promising results.
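
The abstract does not spell out the blocking mechanism, so the following is only a minimal sketch of what an attribute-agnostic block builder followed by a simple block processing step might look like. It assumes that every token appearing in any attribute value defines a block, and that block processing merely avoids repeating a comparison across overlapping blocks. The function names `build_blocks` and `candidate_pairs` and the toy profiles are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
import re


def build_blocks(profiles):
    """Attribute-agnostic blocking (assumed variant): one block per token.

    `profiles` maps an entity id to a dict of attribute name -> value.
    Attribute names are ignored on purpose; only the values are tokenized.
    """
    blocks = defaultdict(set)
    for entity_id, attributes in profiles.items():
        for value in attributes.values():
            for token in re.findall(r"\w+", str(value).lower()):
                blocks[token].add(entity_id)
    # A block holding a single entity yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Block processing (assumed variant): keep each pair only once."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical toy profiles with deliberately different schemas.
profiles = {
    "a1": {"name": "Ada Lovelace", "city": "London"},
    "b7": {"fullName": "Lovelace, A.", "location": "London, UK"},
    "c3": {"label": "Alan Turing"},
    "d9": {"title": "Data Engineering"},
}

blocks = build_blocks(profiles)
pairs = candidate_pairs(blocks)

# The quadratic baseline compares every profile with every other one.
brute_force = len(profiles) * (len(profiles) - 1) // 2

print(sorted(blocks))                 # tokens shared by at least two profiles
print(len(pairs), "of", brute_force)  # comparisons kept vs. exhaustive
```

In this toy input, "a1" and "b7" share tokens even though their attribute names differ, which is the point of ignoring the schema. They also co-occur in two blocks, and collecting pairs in a set is the simplest way to avoid comparing them twice.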

Cite this

Efficient Entity Resolution Methods for Heterogeneous Information Spaces. / Papadakis, George; Nejdl, Wolfgang.
ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops. 2011. p. 304-307 5767671 (Proceedings - International Conference on Data Engineering).

Papadakis, G & Nejdl, W 2011, Efficient Entity Resolution Methods for Heterogeneous Information Spaces. in ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops., 5767671, Proceedings - International Conference on Data Engineering, pp. 304-307, 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011, Hannover, Germany, 11 Apr 2011. https://doi.org/10.1109/ICDEW.2011.5767671
Papadakis, G., & Nejdl, W. (2011). Efficient Entity Resolution Methods for Heterogeneous Information Spaces. In ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops (pp. 304-307). Article 5767671 (Proceedings - International Conference on Data Engineering). https://doi.org/10.1109/ICDEW.2011.5767671
Papadakis G, Nejdl W. Efficient Entity Resolution Methods for Heterogeneous Information Spaces. In ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops. 2011. p. 304-307. 5767671. (Proceedings - International Conference on Data Engineering). doi: 10.1109/ICDEW.2011.5767671
Papadakis, George ; Nejdl, Wolfgang. / Efficient Entity Resolution Methods for Heterogeneous Information Spaces. ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops. 2011. pp. 304-307 (Proceedings - International Conference on Data Engineering).
@inproceedings{bc0e6bade362404db4ffee73dc2dc3cd,
title = "Efficient Entity Resolution Methods for Heterogeneous Information Spaces",
abstract = "The Web of Data encompasses a voluminous, yet constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging on them is the detection (and merge) of information that describe the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to the noise, the loose schema binding as well as the unprecedented heterogeneity inherent in it. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced quite promising results.",
author = "George Papadakis and Wolfgang Nejdl",
year = "2011",
month = jun,
day = "10",
doi = "10.1109/ICDEW.2011.5767671",
language = "English",
isbn = "9781424491940",
series = "Proceedings - International Conference on Data Engineering",
pages = "304--307",
booktitle = "ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops",
note = "2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011 ; Conference date: 11-04-2011 Through 16-04-2011",

}

TY - GEN

T1 - Efficient Entity Resolution Methods for Heterogeneous Information Spaces

AU - Papadakis, George

AU - Nejdl, Wolfgang

PY - 2011/6/10

Y1 - 2011/6/10

AB - The Web of Data encompasses a voluminous, yet constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging on them is the detection (and merge) of information that describe the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data, due to the noise, the loose schema binding as well as the unprecedented heterogeneity inherent in it. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles in order to build blocks effectively. The resulting set of blocks is processed efficiently by intelligent techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, allowing for high flexibility of the overall approach. Initial experimental studies on large, real-world data sets have produced quite promising results.

UR - http://www.scopus.com/inward/record.url?scp=79958070483&partnerID=8YFLogxK

U2 - 10.1109/ICDEW.2011.5767671

DO - 10.1109/ICDEW.2011.5767671

M3 - Conference contribution

AN - SCOPUS:79958070483

SN - 9781424491940

T3 - Proceedings - International Conference on Data Engineering

SP - 304

EP - 307

BT - ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops

T2 - 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011

Y2 - 11 April 2011 through 16 April 2011

ER -