Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

  • Miriam Redi
  • Jonathan Morgan
  • Besnik Fetahu
  • Dario Taraborelli


External organisations

  • Wikimedia Foundation

Details

Original language: English
Title of host publication: The Web Conference 2019
Subtitle: Proceedings of the World Wide Web Conference, WWW 2019
Editors: Ling Liu, Ryen White
Place of publication: New York
Pages: 1567-1578
Number of pages: 12
ISBN (electronic): 9781450366748
Publication status: Published - 13 May 2019
Event: 2019 World Wide Web Conference, WWW 2019 - San Francisco, United States
Duration: 13 May 2019 - 17 May 2019

Abstract

Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine if a statement requires a citation, and to predict the citation reason. We evaluate the accuracy of such models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.
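
The models themselves are not included in this record. As a rough, hypothetical sketch only (not the authors' neural architecture, and using invented example sentences), the detection step can be framed as binary sentence classification; the Python snippet below uses a simple TF-IDF plus logistic-regression baseline to illustrate the task setup. Predicting the citation reason would be an analogous multi-class classifier over the taxonomy categories.

# Illustrative baseline only: binary "citation needed" classification over a few
# hypothetical labelled sentences. The paper's actual models are neural networks
# trained on a large crowdsourced dataset of annotated Wikipedia sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: 1 = citation needed, 0 = no citation needed.
sentences = [
    "The city has a population of 1.2 million according to the 2011 census.",
    "Critics described the album as a commercial failure.",
    "Water is a chemical compound of hydrogen and oxygen.",
    "The senator was accused of accepting illegal donations.",
]
labels = [1, 1, 0, 1]

# Bag-of-words features feeding a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(sentences, labels)

# Flag whether a new statement should carry an inline citation.
print(model.predict(["The film grossed over 500 million dollars worldwide."]))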


Cite

Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. / Redi, Miriam; Morgan, Jonathan; Fetahu, Besnik et al.
The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. Ed. / Ling Liu; Ryen White. New York, 2019. pp. 1567-1578.


Redi, M, Morgan, J, Fetahu, B & Taraborelli, D 2019, Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. in L Liu & R White (eds), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York, pp. 1567-1578, 2019 World Wide Web Conference, WWW 2019, San Francisco, United States, 13 May 2019. https://doi.org/10.1145/3308558.3313618
Redi, M., Morgan, J., Fetahu, B., & Taraborelli, D. (2019). Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. In L. Liu, & R. White (Eds.), The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019 (pp. 1567-1578). https://doi.org/10.1145/3308558.3313618
Redi M, Morgan J, Fetahu B, Taraborelli D. Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. In Liu L, White R, editors, The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. New York. 2019. p. 1567-1578. doi: 10.1145/3308558.3313618
Redi, Miriam ; Morgan, Jonathan ; Fetahu, Besnik et al. / Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability. The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. Ed. / Ling Liu ; Ryen White. New York, 2019. pp. 1567-1578
@inproceedings{e865de039f9f400d8377219b746dc472,
title = "Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability",
abstract = "Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine if a statement requires a citation, and to predict the citation reason. We evaluate the accuracy of such models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.",
keywords = "Citations, Crowdsourcing, Neural Networks, Wikipedia",
author = "Miriam Redi and Jonathan Morgan and Besnik Fetahu and Dario Taraborelli",
note = "Funding information: We would like to thank the community members of the English, French and Italian Wikipedia for helping with data labeling and for their precious suggestions, and Bahodir Mansurov and Aaron Halfaker from the Wikimedia Foundation, for their help building the WikiLabels task. This work is partly funded by the ERC Advanced Grant ALEXANDRIA (grant no. 339233), and BMBF Simple-ML project (grant no. 01IS18054A).; 2019 World Wide Web Conference, WWW 2019 ; Conference date: 13-05-2019 Through 17-05-2019",
year = "2019",
month = may,
day = "13",
doi = "10.1145/3308558.3313618",
language = "English",
pages = "1567--1578",
editor = "Ling Liu and Ryen White",
booktitle = "The Web Conference 2019",

}


TY - GEN

T1 - Citation needed: A taxonomy and algorithmic assessment of Wikipedia's verifiability

T2 - 2019 World Wide Web Conference, WWW 2019

AU - Redi, Miriam

AU - Morgan, Jonathan

AU - Fetahu, Besnik

AU - Taraborelli, Dario

N1 - Funding information: We would like to thank the community members of the English, French and Italian Wikipedia for helping with data labeling and for their precious suggestions, and Bahodir Mansurov and Aaron Halfaker from the Wikimedia Foundation, for their help building the WikiLabels task. This work is partly funded by the ERC Advanced Grant ALEXANDRIA (grant no. 339233), and BMBF Simple-ML project (grant no. 01IS18054A).

PY - 2019/5/13

Y1 - 2019/5/13

N2 - Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine if a statement requires a citation, and to predict the citation reason. We evaluate the accuracy of such models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.

AB - Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine if a statement requires a citation, and to predict the citation reason. We evaluate the accuracy of such models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.

KW - Citations

KW - Crowdsourcing

KW - Neural Networks

KW - Wikipedia

UR - http://www.scopus.com/inward/record.url?scp=85066892225&partnerID=8YFLogxK

U2 - 10.1145/3308558.3313618

DO - 10.1145/3308558.3313618

M3 - Conference contribution

AN - SCOPUS:85066892225

SP - 1567

EP - 1578

BT - The Web Conference 2019

A2 - Liu, Ling

A2 - White, Ryen

CY - New York

Y2 - 13 May 2019 through 17 May 2019

ER -