MultiWiki: Interlingual text passage alignment in Wikipedia

Publication: Contribution to journal, Article, Research, Peer-reviewed

Authors

Simon Gottschalk, Elena Demidova

Details

Original language: English
Article number: 6
Journal: ACM Transactions on the Web
Volume: 11
Issue number: 1
Publication status: Published - Apr 2017

Abstract

In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the tradeoff between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the example of the German, Russian, and English Wikipedia and collect a user-annotated benchmark. Then we propose MultiWiki, a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. The MultiWiki demonstration is publicly available and currently supports four language pairs.
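The abstract names two ingredients, a similarity measure and a greedy algorithm, without giving details. The following is a minimal sketch of how such a combination might look, not the authors' method. It assumes each passage has already been reduced to a set of language-independent features (here, made-up entity identifiers), uses plain Jaccard overlap as a stand-in for the semantic similarity measure, and picks pairs one-to-one in order of decreasing score. The names `jaccard`, `greedy_align`, and the `threshold` value are hypothetical.

```python
from typing import List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two feature sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_align(
    left: List[Set[str]],
    right: List[Set[str]],
    threshold: float = 0.3,  # assumed cut-off, not taken from the article
) -> List[Tuple[int, int, float]]:
    """Greedily pair passages one-to-one, best-scoring pair first."""
    # Score every cross-language pair and keep those above the threshold.
    candidates = [
        (jaccard(a, b), i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
    ]
    candidates = [c for c in candidates if c[0] >= threshold]
    # Highest similarity first; ties are broken by passage position.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))

    used_left, used_right = set(), set()
    alignment = []
    for score, i, j in candidates:
        if i in used_left or j in used_right:
            continue  # one side of this pair is already aligned
        used_left.add(i)
        used_right.add(j)
        alignment.append((i, j, score))
    return sorted(alignment)


# Hypothetical passages, each reduced to language-independent entity IDs.
german = [{"Q64", "Q183"}, {"Q7186", "Q11660"}, {"Q42"}]
english = [{"Q7186", "Q11660", "Q5"}, {"Q64", "Q183"}, {"Q999"}]

pairs = greedy_align(german, english)
```

In this toy input, the third passage on each side stays unaligned because it shares no features with any candidate. Raising `threshold` trades coverage for precision, which loosely mirrors the granularity and precision tradeoff the abstract describes.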

Cite this

MultiWiki: Interlingual text passage alignment in Wikipedia. / Gottschalk, Simon; Demidova, Elena.
In: ACM Transactions on the Web, Vol. 11, No. 1, 6, 04.2017.

@article{100bc854fec9446e953c45a0e22cff89,
title = "MultiWiki: Interlingual text passage alignment in Wikipedia",
abstract = "In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the tradeoff between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the example of the German, Russian, and English Wikipedia and collect a user-annotated benchmark. Then we propose MultiWiki, a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. The MultiWiki demonstration is publicly available and currently supports four language pairs.",
keywords = "Interlingual text alignment, Wikipedia",
author = "Simon Gottschalk and Elena Demidova",
note = "Publisher Copyright: {\textcopyright} 2017 ACM 1559-1131/2017/04-ART6 \$15.00.",
year = "2017",
month = apr,
doi = "10.1145/3004296",
language = "English",
volume = "11",
journal = "ACM Transactions on the Web",
issn = "1559-1131",
publisher = "Association for Computing Machinery (ACM)",
number = "1",
}

TY  - JOUR
T1  - MultiWiki
T2  - Interlingual text passage alignment in Wikipedia
AU  - Gottschalk, Simon
AU  - Demidova, Elena
N1  - Publisher Copyright: © 2017 ACM 1559-1131/2017/04-ART6 $15.00.
PY  - 2017/4
Y1  - 2017/4
N2  - In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the tradeoff between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the example of the German, Russian, and English Wikipedia and collect a user-annotated benchmark. Then we propose MultiWiki, a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. The MultiWiki demonstration is publicly available and currently supports four language pairs.
AB  - In this article, we address the problem of text passage alignment across interlingual article pairs in Wikipedia. We develop methods that enable the identification and interlinking of text passages written in different languages and containing overlapping information. Interlingual text passage alignment can enable Wikipedia editors and readers to better understand language-specific context of entities, provide valuable insights into cultural differences, and build a basis for qualitative analysis of the articles. An important challenge in this context is the tradeoff between the granularity of the extracted text passages and the precision of the alignment. Whereas short text passages can result in more precise alignment, longer text passages can facilitate a better overview of the differences in an article pair. To better understand these aspects from the user perspective, we conduct a user study using the example of the German, Russian, and English Wikipedia and collect a user-annotated benchmark. Then we propose MultiWiki, a method that adopts an integrated approach to the text passage alignment using semantic similarity measures and greedy algorithms and achieves precise results with respect to the user-defined alignment. The MultiWiki demonstration is publicly available and currently supports four language pairs.
KW  - Interlingual text alignment
KW  - Wikipedia
UR  - http://www.scopus.com/inward/record.url?scp=85017197222&partnerID=8YFLogxK
U2  - 10.1145/3004296
DO  - 10.1145/3004296
M3  - Article
AN  - SCOPUS:85017197222
VL  - 11
JO  - ACM Transactions on the Web
JF  - ACM Transactions on the Web
SN  - 1559-1131
IS  - 1
M1  - 6
ER  -