KnowMore: Knowledge base augmentation with structured web markup

Ran Yu; Ujwal Gadiraju; Besnik Fetahu; Oliver Lehmberg; Dominique Ritze; Stefan Dietze

doi:10.3233/SW-180304

Details

Original language	English
Pages (from - to)	159-180
Number of pages	22
Journal	Semantic web
Volume	10
Issue number	1
Publication status	Published - 28 Dec 2018

Abstract

Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complementary data source, embedded entity markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.
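
To make the fusion problem described in the abstract concrete, the following is a minimal sketch, not the authors' method. It swaps the supervised classifier for a simple stand-in: redundant literals are normalized, and for each entity and property the value backed by the most distinct sources is kept. The function names, the `min_sources` threshold, and the example facts and source names are all hypothetical.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so near-identical literals compare equal."""
    return " ".join(value.lower().split())


def fuse_facts(facts, min_sources=2):
    """Fuse (entity, property, value, source) tuples by source voting.

    Illustrative stand-in for a learned fusion step: for each
    (entity, property) pair, keep the value supported by the most
    distinct sources, provided at least `min_sources` of them agree.
    """
    support = defaultdict(lambda: defaultdict(set))
    surface = {}
    for entity, prop, value, source in facts:
        key = normalize(value)
        support[(entity, prop)][key].add(source)
        # Remember the first-seen spelling of each normalized value.
        surface.setdefault((entity, prop, key), value)

    fused = {}
    for (entity, prop), candidates in support.items():
        # Highest support first; ties broken alphabetically for determinism.
        best_key, sources = min(
            candidates.items(), key=lambda kv: (-len(kv[1]), kv[0])
        )
        if len(sources) >= min_sources:
            fused[(entity, prop)] = surface[(entity, prop, best_key)]
    return fused


# Hypothetical facts as they might be extracted from markup on three sites.
facts = [
    ("book:1", "author", "Jane Doe", "shop-a.example"),
    ("book:1", "author", "jane  doe", "shop-b.example"),
    ("book:1", "author", "J. Smith", "shop-c.example"),
    ("book:1", "isbn", "123", "shop-a.example"),
]
fused = fuse_facts(facts)
print(fused)
```

In this toy input the two spellings of the author collapse into one candidate with two supporting sources, while the single-source `isbn` fact is dropped by the threshold. A supervised approach like the one the abstract describes would replace the vote with a classifier over quality and relevance features.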

ASJC Scopus subject areas

Computer Science (all)
Information Systems
Computer Science (all)
Computer Science Applications
Computer Science (all)
Computer Networks and Communications

Cite this

KnowMore: Knowledge base augmentation with structured web markup. / Yu, Ran; Gadiraju, Ujwal; Fetahu, Besnik et al.
In: Semantic web, Vol. 10, No. 1, 28.12.2018, p. 159-180.

Research output: Contribution to journal › Review article › Research › Peer-reviewed

Yu, R, Gadiraju, U, Fetahu, B, Lehmberg, O, Ritze, D & Dietze, S 2018, 'KnowMore: Knowledge base augmentation with structured web markup', Semantic web, vol. 10, no. 1, pp. 159-180. https://doi.org/10.3233/SW-180304

Yu, R., Gadiraju, U., Fetahu, B., Lehmberg, O., Ritze, D., & Dietze, S. (2018). KnowMore: Knowledge base augmentation with structured web markup. Semantic web, 10(1), 159-180. https://doi.org/10.3233/SW-180304

Yu R, Gadiraju U, Fetahu B, Lehmberg O, Ritze D, Dietze S. KnowMore: Knowledge base augmentation with structured web markup. Semantic web. 2018 Dec 28;10(1):159-180. doi: 10.3233/SW-180304

Yu, Ran ; Gadiraju, Ujwal ; Fetahu, Besnik et al. / KnowMore : Knowledge base augmentation with structured web markup. In: Semantic web. 2018 ; Vol. 10, No. 1. pp. 159-180.

Download

@article{41414f09f1e3455eb134ed6d15c8b115,

title = "KnowMore: Knowledge base augmentation with structured web markup",

abstract = "Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complementary data source, embedded entity markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.",

keywords = "data fusion, entity resolution, Knowledge base augmentation, microdata, structured data, Web markup",

author = "Ran Yu and Ujwal Gadiraju and Besnik Fetahu and Oliver Lehmberg and Dominique Ritze and Stefan Dietze",

year = "2018",

month = dec,

day = "28",

doi = "10.3233/SW-180304",

language = "English",

volume = "10",

pages = "159--180",

journal = "Semantic web",

issn = "1570-0844",

publisher = "IOS Press",

number = "1",

}

Download

TY - JOUR

T1 - KnowMore

T2 - Knowledge base augmentation with structured web markup

AU - Yu, Ran

AU - Gadiraju, Ujwal

AU - Fetahu, Besnik

AU - Lehmberg, Oliver

AU - Ritze, Dominique

AU - Dietze, Stefan

PY - 2018/12/28

Y1 - 2018/12/28

N2 - Knowledge bases are in widespread use for aiding tasks such as information extraction and information retrieval, for example in Web search. However, knowledge bases are known to be inherently incomplete, where in particular tail entities and properties are under-represented. As a complementary data source, embedded entity markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data with significant potential to aid the task of knowledge base augmentation (KBA). RDF statements extracted from markup are fundamentally different from traditional knowledge graphs: entity descriptions are flat, facts are highly redundant and of varied quality, and explicit links are missing despite a vast amount of coreferences. Therefore, data fusion is required in order to facilitate the use of markup data for KBA. We present a novel data fusion approach which addresses these issues through a combination of entity matching and fusion techniques geared towards the specific challenges associated with Web markup. To ensure precise and non-redundant results, we follow a supervised learning approach based on a set of features considering aspects such as quality and relevance of entities, facts and their sources. We perform a thorough evaluation on a subset of the Web Data Commons dataset and show significant potential for augmenting existing knowledge bases. A comparison with existing data fusion baselines demonstrates superior performance of our approach when applied to Web markup data.

KW - data fusion

KW - entity resolution

KW - Knowledge base augmentation

KW - microdata

KW - structured data

KW - Web markup

UR - http://www.scopus.com/inward/record.url?scp=85059621895&partnerID=8YFLogxK

U2 - 10.3233/SW-180304

DO - 10.3233/SW-180304

M3 - Review article

AN - SCOPUS:85059621895

VL - 10

SP - 159

EP - 180

JO - Semantic web

JF - Semantic web

SN - 1570-0844

IS - 1

ER -
