Details
Original language | English |
---|---|
Title of host publication | Ethics in Artificial Intelligence |
Subtitle | Bias, Fairness and Beyond |
Publisher | Springer Science and Business Media Deutschland GmbH |
Pages | 131-143 |
Number of pages | 13 |
ISBN (electronic) | 978-981-99-7184-8 |
ISBN (print) | 978-981-99-7186-2, 978-981-99-7183-1 |
Publication status | Published - 30 Dec 2023 |
Publication series
Name | Studies in Computational Intelligence |
---|---|
Volume | 1123 |
ISSN (print) | 1860-949X |
ISSN (electronic) | 1860-9503 |
Abstract
The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. This can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
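The abstract does not spell out the IDG formulation, but it places the method in the family of path-integrated gradient attributions. The sketch below illustrates that baseline idea with plain integrated gradients on a toy PyTorch model; the model, baseline, step count, and feature group are illustrative assumptions, not the authors' implementation, and IDG additionally assigns scores to feature interactions via its game-theoretic, directional-gradient formulation.

```python
# Minimal sketch of integrated-gradients-style attribution (an assumed, simplified
# stand-in for the gradient-based attribution family that IDG extends).
import torch

torch.manual_seed(0)

# Toy "classifier": a fixed linear layer followed by tanh on a 4-dimensional
# input (e.g. a pooled sentence embedding in a sentiment model).
model = torch.nn.Sequential(torch.nn.Linear(4, 1), torch.nn.Tanh())

def integrated_gradients(x, baseline, steps=64):
    """Riemann approximation of the path integral of gradients from baseline to x."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)   # interpolation coefficients
    points = baseline + alphas * (x - baseline)             # points on the straight-line path
    points.requires_grad_(True)
    model(points).sum().backward()                           # output gradient at every path point
    avg_grad = points.grad.mean(dim=0)                       # average gradient along the path
    return (x - baseline).squeeze(0) * avg_grad              # attributions sum ≈ f(x) - f(baseline)

x = torch.randn(1, 4)          # input features
baseline = torch.zeros(1, 4)   # all-zero baseline (a common, but here assumed, choice)
attr = integrated_gradients(x, baseline)
print("per-feature attributions:", attr.tolist())

# Hypothetical group score, for illustration only: summing member attributions.
# IDG instead uses directional gradients and game-theoretic axioms so that
# feature groups (interactions) receive their own, properly defined scores.
group = [0, 2]
print("naive group score:", attr[group].sum().item())
```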
ASJC Scopus subject areas
- Computer Science (all)
- Artificial Intelligence
Cite
Sikdar, S., & Bhattacharya, P. (2023). Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 131-143). (Studies in Computational Intelligence; Vol. 1123). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-7184-8_8
Publication: Contribution to book/report/anthology/conference proceedings › Chapter in book/anthology › Research › Peer-reviewed
TY - CHAP
T1 - Interpretability of Deep Neural Models
AU - Sikdar, Sandipan
AU - Bhattacharya, Parantapa
PY - 2023/12/30
Y1 - 2023/12/30
N2 - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. This can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
AB - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. This can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
UR - http://www.scopus.com/inward/record.url?scp=85182477687&partnerID=8YFLogxK
U2 - 10.1007/978-981-99-7184-8_8
DO - 10.1007/978-981-99-7184-8_8
M3 - Contribution to book/anthology
AN - SCOPUS:85182477687
SN - 978-981-99-7186-2
SN - 978-981-99-7183-1
T3 - Studies in Computational Intelligence
SP - 131
EP - 143
BT - Ethics in Artificial Intelligence
PB - Springer Science and Business Media Deutschland GmbH
ER -