Details
Original language | English |
---|---|
Pages (from-to) | 159-180 |
Number of pages | 22 |
Journal | Semantic web |
Volume | 10 |
Issue number | 1 |
Publication status | Published - 28 Dec 2018 |
Abstract
Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complimentary data source, embedded entity markup based on Microdata, RDFa, and Microformats have become prevalent on the Web and constitute an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and, explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.
Keywords
- data fusion, entity resolution, Knowledge base augmentation, microdata, structured data, Web markup
ASJC Scopus subject areas
- Computer Science(all)
- Information Systems
- Computer Science(all)
- Computer Science Applications
- Computer Science(all)
- Computer Networks and Communications
Cite this
- Standard
- Harvard
- Apa
- Vancouver
- BibTeX
- RIS
In: Semantic web, Vol. 10, No. 1, 28.12.2018, p. 159-180.
Research output: Contribution to journal › Review article › Research › peer review
}
TY - JOUR
T1 - KnowMore
T2 - Knowledge base augmentation with structured web markup
AU - Yu, Ran
AU - Gadiraju, Ujwal
AU - Fetahu, Besnik
AU - Lehmberg, Oliver
AU - Ritze, Dominique
AU - DIetze, Stefan
N1 - Publisher Copyright: © 2019 - IOS Press and the authors. All rights reserved.
PY - 2018/12/28
Y1 - 2018/12/28
N2 - Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complimentary data source, embedded entity markup based on Microdata, RDFa, and Microformats have become prevalent on the Web and constitute an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and, explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.
AB - Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complimentary data source, embedded entity markup based on Microdata, RDFa, and Microformats have become prevalent on the Web and constitute an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and, explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.
KW - data fusion
KW - entity resolution
KW - Knowledge base augmentation
KW - microdata
KW - structured data
KW - Web markup
UR - http://www.scopus.com/inward/record.url?scp=85059621895&partnerID=8YFLogxK
U2 - 10.3233/SW-180304
DO - 10.3233/SW-180304
M3 - Review article
AN - SCOPUS:85059621895
VL - 10
SP - 159
EP - 180
JO - Semantic web
JF - Semantic web
SN - 1570-0844
IS - 1
ER -