Detecting Biased Statements in Wikipedia

Publication: Contribution to book/report/anthology/conference proceedings › Paper in conference proceedings › Research › Peer-reviewed

Authors

  • Christoph Hube
  • Besnik Fetahu


Details

Original language: English
Title of host publication: WWW '18: Companion Proceedings of the The Web Conference 2018
Pages: 1779-1786
Number of pages: 8
ISBN (electronic): 9781450356404
Publication status: Published - 23 Apr 2018
Event: 27th International World Wide Web Conference, WWW 2018 - Lyon, France
Duration: 23 Apr 2018 - 27 Apr 2018

Abstract

Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of Wikipedia's main principles; it ensures that all possible points of view on controversial information are represented proportionally. Furthermore, the language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and Wikipedia's volunteer-based editing model, quality assurance and guidelines cannot always be enforced. Currently, more than 40,000 articles are flagged with NPOV or similar quality tags. These represent only the portion of articles for which such quality issues are explicitly flagged by Wikipedia editors; the real number may be higher, considering that only a small percentage of articles are categorized by Wikipedia as good or featured. In this work, we focus on language bias at the sentence level in Wikipedia. Detecting language bias is a hard problem: the task is subjective, and the linguistic cues are usually subtle and can be determined only in context. We propose a supervised classification approach that relies on an automatically created lexicon of bias words, together with other syntactic and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competing approaches that determine bias words are not suitable for detecting biased statements; we outperform them with a relative improvement of over 20%.
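The approach described in the abstract can be pictured as a lexicon-driven feature extractor feeding a classifier. A minimal sketch, assuming a tiny hand-picked bias lexicon and a simple threshold rule (the paper derives its lexicon automatically and trains a supervised model with richer syntactic and semantic features; the words and rule below are purely illustrative):

```python
from collections import Counter

# Illustrative stand-in for the automatically derived bias-word lexicon
# described in the abstract; these words are hypothetical examples.
BIAS_LEXICON = {"notorious", "brilliant", "so-called", "heroic", "infamous"}

def lexicon_features(sentence: str) -> dict:
    """Map a sentence to simple lexicon-based surface features."""
    tokens = sentence.lower().split()
    counts = Counter(tokens)
    bias_hits = sum(counts[w] for w in BIAS_LEXICON)
    return {
        "bias_word_count": bias_hits,
        "bias_word_ratio": bias_hits / max(len(tokens), 1),
        "sentence_length": len(tokens),
    }

def is_biased(sentence: str) -> bool:
    """Threshold rule standing in for the trained classifier."""
    return lexicon_features(sentence)["bias_word_count"] > 0
```

In the paper's actual setup, such features would be fed to a trained classifier rather than a fixed threshold.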

Cite

Detecting Biased Statements in Wikipedia. / Hube, Christoph; Fetahu, Besnik.
WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. pp. 1779-1786.


Hube, C & Fetahu, B 2018, Detecting Biased Statements in Wikipedia. in WWW '18: Companion Proceedings of the The Web Conference 2018. pp. 1779-1786, 27th International World Wide Web Conference, WWW 2018, Lyon, France, 23 Apr 2018. https://doi.org/10.1145/3184558.3191640
Hube, C., & Fetahu, B. (2018). Detecting Biased Statements in Wikipedia. In WWW '18: Companion Proceedings of the The Web Conference 2018 (pp. 1779-1786). https://doi.org/10.1145/3184558.3191640
Hube C, Fetahu B. Detecting Biased Statements in Wikipedia. in WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. pp. 1779-1786. doi: 10.1145/3184558.3191640
Hube, Christoph ; Fetahu, Besnik. / Detecting Biased Statements in Wikipedia. WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. pp. 1779-1786
@inproceedings{5502efbf214f42c89401caf43463f61a,
title = "Detecting Biased Statements in Wikipedia",
abstract = "Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.",
keywords = "language bias, NPOV, wikipedia quality",
author = "Christoph Hube and Besnik Fetahu",
year = "2018",
month = apr,
day = "23",
doi = "10.1145/3184558.3191640",
language = "English",
pages = "1779--1786",
booktitle = "WWW '18: Companion Proceedings of the The Web Conference 2018",
note = "27th International World Wide Web, WWW 2018 ; Conference date: 23-04-2018 Through 27-04-2018",

}


TY - GEN

T1 - Detecting Biased Statements in Wikipedia

AU - Hube, Christoph

AU - Fetahu, Besnik

PY - 2018/4/23

Y1 - 2018/4/23

N2 - Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.

AB - Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and its operating principle based on a voluntary basis of Wikipedia editors; quality assurances and Wikipedia guidelines cannot always be enforced. Currently, there are more than 40,000 articles, which are flagged with NPOV or similar quality tags. Furthermore, these represent only the portion of articles for which such quality issues are explicitly flagged by the Wikipedia editors, however, the real number may be higher considering that only a small percentage of articles are of good quality or featured as categorized by Wikipedia. In this work, we focus on the case of language bias at the sentence level in Wikipedia. Language bias is a hard problem, as it represents a subjective task and usually the linguistic cues are subtle and can be determined only through its context. We propose a supervised classification approach, which relies on an automatically created lexicon of bias words, and other syntactical and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competitors that determine bias words are not suitable for detecting biased statements, which we outperform with a relative improvement of over 20%.

KW - language bias

KW - NPOV

KW - wikipedia quality

UR - http://www.scopus.com/inward/record.url?scp=85058932118&partnerID=8YFLogxK

U2 - 10.1145/3184558.3191640

DO - 10.1145/3184558.3191640

M3 - Conference contribution

AN - SCOPUS:85058932118

SP - 1779

EP - 1786

BT - WWW '18: Companion Proceedings of the The Web Conference 2018

T2 - 27th International World Wide Web, WWW 2018

Y2 - 23 April 2018 through 27 April 2018

ER -