Detecting Biased Statements in Wikipedia

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Christoph Hube
  • Besnik Fetahu

Details

Original language: English
Title of host publication: WWW '18: Companion Proceedings of the The Web Conference 2018
Pages: 1779-1786
Number of pages: 8
ISBN (electronic): 9781450356404
Publication status: Published - 23 Apr 2018
Event: 27th International World Wide Web, WWW 2018 - Lyon, France
Duration: 23 Apr 2018 to 27 Apr 2018

Abstract

Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and Wikipedia's operating principle, which relies on voluntary editors, quality assurance and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles that are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors; the real number may be higher, considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Detecting language bias is a hard problem, as it is a subjective task and the linguistic cues are usually subtle and can be determined only through their context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words and other syntactic and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, and that we outperform them with a relative improvement of over 20%.

Keywords

    language bias, NPOV, wikipedia quality

Cite this

Detecting Biased Statements in Wikipedia. / Hube, Christoph; Fetahu, Besnik.
WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. p. 1779-1786.

Hube, C & Fetahu, B 2018, Detecting Biased Statements in Wikipedia. in WWW '18: Companion Proceedings of the The Web Conference 2018. pp. 1779-1786, 27th International World Wide Web, WWW 2018, Lyon, France, 23 Apr 2018. https://doi.org/10.1145/3184558.3191640
Hube, C., & Fetahu, B. (2018). Detecting Biased Statements in Wikipedia. In WWW '18: Companion Proceedings of the The Web Conference 2018 (pp. 1779-1786) https://doi.org/10.1145/3184558.3191640
Hube C, Fetahu B. Detecting Biased Statements in Wikipedia. In WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. p. 1779-1786 doi: 10.1145/3184558.3191640
Hube, Christoph ; Fetahu, Besnik. / Detecting Biased Statements in Wikipedia. WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. pp. 1779-1786
@inproceedings{5502efbf214f42c89401caf43463f61a,
title = "Detecting Biased Statements in Wikipedia",
abstract = "Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.",
keywords = "language bias, NPOV, wikipedia quality",
author = "Christoph Hube and Besnik Fetahu",
year = "2018",
month = apr,
day = "23",
doi = "10.1145/3184558.3191640",
language = "English",
pages = "1779--1786",
booktitle = "WWW '18: Companion Proceedings of the The Web Conference 2018",
note = "27th International World Wide Web, WWW 2018; Conference date: 23-04-2018 through 27-04-2018"
}

TY - GEN
T1 - Detecting Biased Statements in Wikipedia
AU - Hube, Christoph
AU - Fetahu, Besnik
PY - 2018/4/23
Y1 - 2018/4/23

N2 - Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.

AB - Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.

KW - language bias
KW - NPOV
KW - wikipedia quality
UR - http://www.scopus.com/inward/record.url?scp=85058932118&partnerID=8YFLogxK
U2 - 10.1145/3184558.3191640
DO - 10.1145/3184558.3191640
M3 - Conference contribution
AN - SCOPUS:85058932118
SP - 1779
EP - 1786
BT - WWW '18: Companion Proceedings of the The Web Conference 2018
T2 - 27th International World Wide Web, WWW 2018
Y2 - 23 April 2018 through 27 April 2018
ER -