Details
Original language | English |
---|---|
Title of host publication | WWW '18: Companion Proceedings of the The Web Conference 2018 |
Pages | 1779-1786 |
Number of pages | 8 |
ISBN (electronic) | 9781450356404 |
Publication status | Published - 23 Apr 2018 |
Event | 27th International World Wide Web, WWW 2018 - Lyon, France; Duration: 23 Apr 2018 → 27 Apr 2018 |
Abstract
Quality in Wikipedia is enforced through a set of editing policies and guidelines recommended for Wikipedia editors. Neutral point of view (NPOV) is one of the main principles in Wikipedia, which ensures that for controversial information all possible points of view are represented proportionally. Furthermore, language used in Wikipedia should be neutral and not opinionated. However, due to the large number of Wikipedia articles and Wikipedia's reliance on volunteer editors, quality assurance and Wikipedia guidelines cannot always be enforced. Currently, more than 40,000 articles are flagged with NPOV or similar quality tags. These represent only the portion of articles for which such quality issues are explicitly flagged by Wikipedia editors; the real number may be higher, considering that only a small percentage of articles are categorized by Wikipedia as good or featured. In this work, we focus on language bias at the sentence level in Wikipedia. Detecting language bias is a hard problem, as it is a subjective task and the linguistic cues are usually subtle and can be determined only through their context. We propose a supervised classification approach that relies on an automatically created lexicon of bias words and on other syntactic and semantic characteristics of biased statements. We experimentally evaluate our approach on a dataset consisting of biased and unbiased statements, and show that we are able to detect biased statements with an accuracy of 74%. Furthermore, we show that competing approaches that determine bias words are not suitable for detecting biased statements; we outperform them with a relative improvement of over 20%.
Keywords
- language bias, NPOV, wikipedia quality
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
- Software
Cite this
WWW '18: Companion Proceedings of the The Web Conference 2018. 2018. p. 1779-1786.
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Detecting Biased Statements in Wikipedia
AU - Hube, Christoph
AU - Fetahu, Besnik
PY - 2018/4/23
Y1 - 2018/4/23
KW - language bias
KW - NPOV
KW - wikipedia quality
UR - http://www.scopus.com/inward/record.url?scp=85058932118&partnerID=8YFLogxK
U2 - 10.1145/3184558.3191640
DO - 10.1145/3184558.3191640
M3 - Conference contribution
AN - SCOPUS:85058932118
SP - 1779
EP - 1786
BT - WWW '18: Companion Proceedings of the The Web Conference 2018
T2 - 27th International World Wide Web, WWW 2018
Y2 - 23 April 2018 through 27 April 2018
ER -