Details
Original language | English |
---|---|
Title of host publication | Proceedings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022) |
Pages | 2081-2093 |
Number of pages | 13 |
Publication status | Published - Dec. 2022 |
Event | 2022 Findings of the Association for Computational Linguistics: EMNLP 2022 - Abu Dhabi, United Arab Emirates. Duration: 7 Dec. 2022 → 11 Dec. 2022 |
Abstract

News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures' accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the aforementioned issues. We compare how models trained with the algorithms on news articles represent the expected social bias. Our results suggest that the standard way to quantify bias does not align well with knowledge from psychology. While the proposed algorithms reduce the gap, they still do not fully match the literature.
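The abstract quantifies social bias with WEAT, the Word Embedding Association Test of Caliskan et al. (2017). For orientation only, the sketch below computes the standard WEAT effect size from cosine similarities between target word vectors (X, Y) and attribute word vectors (A, B); it is a generic illustration with hypothetical toy inputs, not the authors' implementation and not the modified embedding algorithms studied in the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean similarity of w to attribute set A minus attribute set B."""
    return np.mean([cosine(w, a) for a in A]) - np.mean([cosine(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """WEAT effect size d: difference of the mean associations of the two
    target sets X and Y, normalized by the standard deviation over X ∪ Y."""
    s_X = [association(x, A, B) for x in X]
    s_Y = [association(y, A, B) for y in Y]
    return (np.mean(s_X) - np.mean(s_Y)) / np.std(s_X + s_Y, ddof=1)

# Hypothetical toy example: random 4-dimensional vectors standing in for
# embeddings of target words (X, Y) and attribute words (A, B).
rng = np.random.default_rng(0)
X = [rng.normal(size=4) for _ in range(5)]
Y = [rng.normal(size=4) for _ in range(5)]
A = [rng.normal(size=4) for _ in range(5)]
B = [rng.normal(size=4) for _ in range(5)]
print(weat_effect_size(X, Y, A, B))  # larger |d| indicates a stronger measured association
```

In the study, the target and attribute vectors would come from embedding models trained on the collected news articles; which embedding algorithm supplies them is precisely what the paper varies and evaluates against expectations from the psychology literature.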
ASJC Scopus subject areas
- Computer Science (all)
- Computational Theory and Mathematics
- Computer Science (all)
- Computer Science Applications
- Computer Science (all)
- Information Systems
Sustainable Development Goals
Cite
- Standard
- Harvard
- APA
- Vancouver
- BibTeX
- RIS
Spliethöver, M, Keiff, M & Wachsmuth, H 2022, No Word Embedding Model Is Perfect. Proceedings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022). pp. 2081-2093.
Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer-review
TY - GEN
T1 - No Word Embedding Model Is Perfect
T2 - 2022 Findings of the Association for Computational Linguistics: EMNLP 2022
AU - Spliethöver, Maximilian
AU - Keiff, Maximilian
AU - Wachsmuth, Henning
N1 - Publisher Copyright: © 2022 Association for Computational Linguistics.
PY - 2022/12
Y1 - 2022/12
N2 - News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures’ accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the aforementioned issues. We compare how models trained with the algorithms on news articles represent the expected social bias. Our results suggest that the standard way to quantify bias does not align well with knowledge from psychology. While the proposed algorithms reduce the gap, they still do not fully match the literature.
AB - News articles both shape and reflect public opinion across the political spectrum. Analyzing them for social bias can thus provide valuable insights, such as prevailing stereotypes in society and the media, which are often adopted by NLP models trained on respective data. Recent work has relied on word embedding bias measures, such as WEAT. However, several representation issues of embeddings can harm the measures’ accuracy, including low-resource settings and token frequency differences. In this work, we study what kind of embedding algorithm serves best to accurately measure types of social bias known to exist in US online news articles. To cover the whole spectrum of political bias in the US, we collect 500k articles and review psychology literature with respect to expected social bias. We then quantify social bias using WEAT along with embedding algorithms that account for the aforementioned issues. We compare how models trained with the algorithms on news articles represent the expected social bias. Our results suggest that the standard way to quantify bias does not align well with knowledge from psychology. While the proposed algorithms reduce the gap, they still do not fully match the literature.
UR - http://www.scopus.com/inward/record.url?scp=85149869212&partnerID=8YFLogxK
U2 - 10.18653/v1/2022.findings-emnlp.152
DO - 10.18653/v1/2022.findings-emnlp.152
M3 - Conference contribution
SP - 2081
EP - 2093
BT - Proceedings of The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)
Y2 - 7 December 2022 through 11 December 2022
ER -