Details
Field | Value |
---|---|
Original language | English |
Title of host publication | The Web Conference 2019 |
Subtitle of host publication | Proceedings of the World Wide Web Conference, WWW 2019 |
Editors | Ling Liu, Ryen White |
Place of Publication | New York |
Pages | 1567-1578 |
Number of pages | 12 |
ISBN (electronic) | 9781450366748 |
Publication status | Published - 13 May 2019 |
Event | 2019 World Wide Web Conference, WWW 2019, San Francisco, United States; Duration: 13 May 2019 to 17 May 2019 |
Abstract
Wikipedia is playing an increasingly central role on the web, and the policies its contributors follow when sourcing and fact-checking content affect millions of readers. Among these core guiding principles, verifiability policies have a particularly important role. Verifiability requires that information included in a Wikipedia article be corroborated against reliable secondary sources. Because of the manual labor needed to curate Wikipedia at scale, however, its contents do not always evenly comply with these policies. Citations (i.e., references to external sources) may not conform to verifiability requirements or may be missing altogether, potentially weakening the reliability of specific topic areas of the free encyclopedia. In this paper, we aim to provide an empirical characterization of the reasons why and how Wikipedia cites external sources to comply with its own verifiability guidelines. First, we construct a taxonomy of reasons why inline citations are required, by collecting labeled data from editors of multiple Wikipedia language editions. We then crowdsource a large-scale dataset of Wikipedia sentences annotated with categories derived from this taxonomy. Finally, we design algorithmic models to determine whether a statement requires a citation, and to predict the citation reason. We evaluate the accuracy of such models across different classes of Wikipedia articles of varying quality, and on external datasets of claims annotated for fact-checking purposes.
Keywords
- Citations, Crowdsourcing, Neural Networks, Wikipedia
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
- Software
Cite this
Citation needed. / Redi, Miriam; Morgan, Jonathan; Fetahu, Besnik; Taraborelli, Dario. The Web Conference 2019: Proceedings of the World Wide Web Conference, WWW 2019. ed. / Ling Liu; Ryen White. New York, 2019. p. 1567-1578.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Citation needed
T2 - 2019 World Wide Web Conference, WWW 2019
AU - Redi, Miriam
AU - Morgan, Jonathan
AU - Fetahu, Besnik
AU - Taraborelli, Dario
N1 - Funding information: We would like to thank the community members of the English, French and Italian Wikipedia for helping with data labeling and for their precious suggestions, and Bahodir Mansurov and Aaron Halfaker from the Wikimedia Foundation, for their help building the WikiLabels task. This work is partly funded by the ERC Advanced Grant ALEXANDRIA (grant no. 339233), and BMBF Simple-ML project (grant no. 01IS18054A).
PY - 2019/5/13
Y1 - 2019/5/13
KW - Citations
KW - Crowdsourcing
KW - Neural Networks
KW - Wikipedia
UR - http://www.scopus.com/inward/record.url?scp=85066892225&partnerID=8YFLogxK
U2 - 10.1145/3308558.3313618
DO - 10.1145/3308558.3313618
M3 - Conference contribution
AN - SCOPUS:85066892225
SP - 1567
EP - 1578
BT - The Web Conference 2019
A2 - Liu, Ling
A2 - White, Ryen
CY - New York
Y2 - 13 May 2019 through 17 May 2019
ER -