Details
Original language | English
---|---
Title of host publication | WSDM 2020
Subtitle of host publication | Proceedings of the 13th International Conference on Web Search and Data Mining
Pages | 259-267
Number of pages | 9
ISBN (electronic) | 9781450368223
Publication status | Published - 20 Jan 2020
Event | 13th ACM International Conference on Web Search and Data Mining, WSDM 2020, Houston, United States, 3 Feb 2020 → 7 Feb 2020
Abstract
Word embeddings trained through models like skip-gram have been shown to be prone to capturing biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill over into downstream tasks, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings, where for a given name representation (e.g. “Smith”) a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, debiases the name representations such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis shows that our approach maintains high-quality embeddings while mitigating sentiment bias in name embeddings.
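The abstract describes DebiasEmb only at a high level, so the following is a minimal, hypothetical sketch of how such a training objective could look: a standard skip-gram loss with negative sampling, plus an auxiliary term that pushes a frozen "oracle" sentiment classifier's score on name embeddings toward neutrality. The oracle architecture, the squared-error neutrality term, the weighting factor `lam`, and the random placeholder data are all assumptions made for illustration; they are not taken from the paper.

```python
# Sketch only: skip-gram with negative sampling plus an assumed sentiment-debiasing
# term. A frozen "oracle" sentiment classifier scores name embeddings, and an
# auxiliary loss pushes those scores toward 0.5 (neutral) while the skip-gram loss
# preserves embedding quality. Not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB_SIZE, EMB_DIM, NEG_SAMPLES = 10_000, 100, 5

class OracleSentimentClassifier(nn.Module):
    """Stand-in for a pretrained sentiment classifier over embeddings (kept frozen)."""
    def __init__(self, emb_dim):
        super().__init__()
        self.linear = nn.Linear(emb_dim, 1)

    def forward(self, emb):
        return torch.sigmoid(self.linear(emb))  # P(positive sentiment)

class DebiasedSkipGram(nn.Module):
    def __init__(self, vocab_size, emb_dim):
        super().__init__()
        self.in_emb = nn.Embedding(vocab_size, emb_dim)
        self.out_emb = nn.Embedding(vocab_size, emb_dim)

    def skipgram_loss(self, center, context, negatives):
        # Standard skip-gram with negative sampling.
        v = self.in_emb(center)                        # (B, D)
        u_pos = self.out_emb(context)                  # (B, D)
        u_neg = self.out_emb(negatives)                # (B, K, D)
        pos = F.logsigmoid((v * u_pos).sum(-1))        # (B,)
        neg = F.logsigmoid(-(u_neg @ v.unsqueeze(-1)).squeeze(-1)).sum(-1)
        return -(pos + neg).mean()

    def debias_loss(self, oracle, name_ids):
        # Push the oracle's sentiment score for name embeddings toward 0.5,
        # i.e. make name representations uninformative for the frozen classifier.
        scores = oracle(self.in_emb(name_ids))
        return ((scores - 0.5) ** 2).mean()

# Usage sketch with random placeholder data.
model = DebiasedSkipGram(VOCAB_SIZE, EMB_DIM)
oracle = OracleSentimentClassifier(EMB_DIM)
for p in oracle.parameters():          # the oracle stays fixed during training
    p.requires_grad_(False)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
center = torch.randint(0, VOCAB_SIZE, (32,))
context = torch.randint(0, VOCAB_SIZE, (32,))
negatives = torch.randint(0, VOCAB_SIZE, (32, NEG_SAMPLES))
name_ids = torch.randint(0, VOCAB_SIZE, (8,))   # ids of name tokens in the batch

lam = 1.0   # debiasing weight (assumed)
loss = model.skipgram_loss(center, context, negatives) \
       + lam * model.debias_loss(oracle, name_ids)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the oracle would be a sentiment classifier pretrained on labelled text and held fixed while the embeddings are trained; the snippet above uses a randomly initialized linear probe purely as a placeholder.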
ASJC Scopus subject areas
- Computer Science (all)
- Computer Networks and Communications
- Software
- Computer Science Applications
Cite this
Hube, C., Idahl, M., & Fetahu, B. (2020). Debiasing word embeddings from sentiment associations in names. In WSDM 2020: Proceedings of the 13th International Conference on Web Search and Data Mining (pp. 259-267). https://doi.org/10.1145/3336191.3371779
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Debiasing word embeddings from sentiment associations in names
AU - Hube, Christoph
AU - Idahl, Maximilian
AU - Fetahu, Besnik
N1 - Funding Information: Acknowledgments. This work is funded by the DESIR project (grant no. 731081) and the SimpleML project (grant no. 01IS18054).
PY - 2020/1/20
Y1 - 2020/1/20
N2 - Word embeddings trained through models like skip-gram have been shown to be prone to capturing biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill over into downstream tasks, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings, where for a given name representation (e.g. “Smith”) a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, debiases the name representations such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis shows that our approach maintains high-quality embeddings while mitigating sentiment bias in name embeddings.
AB - Word embeddings trained through models like skip-gram have been shown to be prone to capturing biases from the training corpus, e.g. gender bias. Such biases are unwanted as they spill over into downstream tasks, leading to discriminatory behavior. In this work, we address the problem of prior sentiment associated with names in word embeddings, where for a given name representation (e.g. “Smith”) a sentiment classifier will categorize it as either positive or negative. We propose DebiasEmb, a skip-gram based word embedding approach that, for a given oracle sentiment classification model, debiases the name representations such that they cannot be associated with either positive or negative sentiment. Evaluation on standard word embedding benchmarks and a downstream analysis shows that our approach maintains high-quality embeddings while mitigating sentiment bias in name embeddings.
UR - http://www.scopus.com/inward/record.url?scp=85079517449&partnerID=8YFLogxK
U2 - 10.1145/3336191.3371779
DO - 10.1145/3336191.3371779
M3 - Conference contribution
AN - SCOPUS:85079517449
SP - 259
EP - 267
BT - WSDM 2020
T2 - 13th ACM International Conference on Web Search and Data Mining, WSDM 2020
Y2 - 3 February 2020 through 7 February 2020
ER -