Details

| Original language | English |
|---|---|
| Title of host publication | WSDM 2020 |
| Subtitle | Proceedings of the 13th International Conference on Web Search and Data Mining |
| Pages | 259-267 |
| Number of pages | 9 |
| ISBN (electronic) | 9781450368223 |
| Publication status | Published - 20 Jan 2020 |
| Event | 13th ACM International Conference on Web Search and Data Mining, WSDM 2020 - Houston, United States. Duration: 3 Feb 2020 - 7 Feb 2020 |
Abstract
Word embeddings trained with models such as skip-gram have been shown to capture biases present in the training corpus, e.g., gender bias. Such biases are unwanted, as they spill over into downstream tasks and thus lead to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings: for a given name representation (e.g., “Smith”), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram-based word embedding approach that, for a given oracle sentiment classification model, debiases the name representations so that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach maintains high embedding quality while mitigating sentiment bias in name embeddings.
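The core idea described in the abstract, steering a name's embedding so that a fixed (oracle) sentiment classifier can no longer assign it a positive or negative sentiment, can be sketched as gradient descent on a neutrality penalty. The following is a minimal, hypothetical illustration, not the paper's actual DebiasEmb objective: a frozen logistic-regression "oracle" scores a name vector, and repeated steps push that score toward the neutral value 0.5.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def debias_step(name_vec, oracle_w, lr=1.0):
    # One gradient-descent step on the penalty (sigmoid(w.v) - 0.5)^2,
    # which pushes the oracle's sentiment score for the name toward 0.5.
    score = sigmoid(oracle_w @ name_vec)
    grad = 2.0 * (score - 0.5) * score * (1.0 - score) * oracle_w
    return name_vec - lr * grad

rng = np.random.default_rng(0)
oracle_w = rng.normal(size=8)
oracle_w /= np.linalg.norm(oracle_w)   # frozen "oracle" sentiment classifier
name_vec = rng.normal(size=8)          # embedding of a name, e.g. "Smith"

before = abs(sigmoid(oracle_w @ name_vec) - 0.5)
for _ in range(300):
    name_vec = debias_step(name_vec, oracle_w)
after = abs(sigmoid(oracle_w @ name_vec) - 0.5)
print(f"bias before: {before:.3f}, after: {after:.3f}")
```

In the actual method this penalty would be added to the skip-gram training loss (so embedding quality is preserved while names are debiased), with the oracle classifier kept fixed throughout training.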
ASJC Scopus subject areas
- Computer Science (all): Computer Networks and Communications
- Computer Science (all): Software
- Computer Science (all): Computer Science Applications
Cite
WSDM 2020: Proceedings of the 13th International Conference on Web Search and Data Mining. 2020. pp. 259-267.
Publication: Contribution to book/report/anthology/conference proceedings › Conference paper › Research › Peer review
TY - GEN
T1 - Debiasing word embeddings from sentiment associations in names
AU - Hube, Christoph
AU - Idahl, Maximilian
AU - Fetahu, Besnik
N1 - Funding Information: Acknowledgments. This work is funded by DESIR (grant no. 731081) and SimpleML (grant no. 01IS18054).
PY - 2020/1/20
Y1 - 2020/1/20
AB - Word embeddings trained with models such as skip-gram have been shown to capture biases present in the training corpus, e.g., gender bias. Such biases are unwanted, as they spill over into downstream tasks and thus lead to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings: for a given name representation (e.g., “Smith”), a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram-based word embedding approach that, for a given oracle sentiment classification model, debiases the name representations so that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis show that our approach maintains high embedding quality while mitigating sentiment bias in name embeddings.
UR - http://www.scopus.com/inward/record.url?scp=85079517449&partnerID=8YFLogxK
U2 - 10.1145/3336191.3371779
DO - 10.1145/3336191.3371779
M3 - Conference contribution
AN - SCOPUS:85079517449
SP - 259
EP - 267
BT - WSDM 2020
T2 - 13th ACM International Conference on Web Search and Data Mining, WSDM 2020
Y2 - 3 February 2020 through 7 February 2020
ER -