Details
Original language | English |
---|---|
Title of host publication | ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops |
Pages | 304-307 |
Number of pages | 4 |
Publication status | Published - 10 Jun 2011 |
Event | 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011 - Hannover, Germany; Duration: 11 Apr 2011 → 16 Apr 2011 |
Publication series
Name | Proceedings - International Conference on Data Engineering |
---|---|
ISSN (Print) | 1084-4627 |
Abstract
The Web of Data encompasses a voluminous and constantly expanding collection of structured and semi-structured data sets. An important prerequisite for leveraging them is the detection (and merging) of information that describes the same real-world entities, a task known as Entity Resolution. To enhance the efficiency of this quadratic task, blocking techniques are typically employed. They are, however, inapplicable to the Web of Data because of its noise, loose schema binding, and unprecedented heterogeneity. In the context of my thesis, I focus on developing novel blocking methods that scale up Entity Resolution within such large, noisy, and heterogeneous information spaces. At their core lies an attribute-agnostic mechanism that relies exclusively on the values of entity profiles to build blocks effectively. The resulting set of blocks is processed efficiently by techniques that minimize the required number of comparisons. Any combination of block building and block processing methods is possible, which makes the overall approach highly flexible. Initial experimental studies on large, real-world data sets have produced promising results.
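The abstract does not spell out the blocking mechanism, so the following is only a minimal sketch of one simple attribute-agnostic scheme, under the assumption that a block is formed for every token appearing in any attribute value. The function names `build_blocks` and `candidate_pairs` and the toy profiles are hypothetical and not taken from the paper; deduplicating pairs across blocks stands in here for a block processing step.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_blocks(profiles):
    """Group profile ids by the tokens found in their values (assumed scheme)."""
    blocks = defaultdict(set)
    for pid, profile in profiles.items():
        # Attribute names are ignored: only the values contribute tokens.
        for value in profile.values():
            for token in re.findall(r"[a-z0-9]+", str(value).lower()):
                blocks[token].add(pid)
    # A block holding a single profile yields no comparison, so drop it.
    return {token: ids for token, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect each co-occurring pair once, however many blocks it shares."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Toy profiles with different schemas (invented data, for illustration only).
profiles = {
    "p1": {"name": "Alice Smith", "city": "Berlin"},
    "p2": {"label": "A. Smith", "location": "Berlin, Germany"},
    "p3": {"title": "Data Engineering"},
    "p4": {"fullName": "Bob Jones"},
}

blocks = build_blocks(profiles)
pairs = candidate_pairs(blocks)
brute_force = len(profiles) * (len(profiles) - 1) // 2

print(sorted(blocks))        # tokens shared by at least two profiles
print(pairs, brute_force)    # candidate pairs vs. all-pairs comparisons
```

On this toy input, only `p1` and `p2` share tokens, so a single candidate comparison remains instead of the six that an exhaustive pairwise check of four profiles would require.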
ASJC Scopus subject areas
- Computer Science(all)
  - Software
  - Signal Processing
  - Information Systems
Cite this
ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops. 2011. p. 304-307 5767671 (Proceedings - International Conference on Data Engineering).
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Efficient Entity Resolution Methods for Heterogeneous Information Spaces
AU - Papadakis, George
AU - Nejdl, Wolfgang
PY - 2011/6/10
Y1 - 2011/6/10
UR - http://www.scopus.com/inward/record.url?scp=79958070483&partnerID=8YFLogxK
U2 - 10.1109/ICDEW.2011.5767671
DO - 10.1109/ICDEW.2011.5767671
M3 - Conference contribution
AN - SCOPUS:79958070483
SN - 9781424491940
T3 - Proceedings - International Conference on Data Engineering
SP - 304
EP - 307
BT - ICDE Workshops 2011 - 2011 IEEE 27th International Conference on Data Engineering Workshops
T2 - 2011 IEEE 27th International Conference on Data Engineering Workshops, ICDE 2011
Y2 - 11 April 2011 through 16 April 2011
ER -