Interpretability of Deep Neural Models

Publication: Contribution to book/report/anthology/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed

Authors

  • Sandipan Sikdar
  • Parantapa Bhattacharya

Organisational units

External organisations

  • University of Virginia

Details

Original language: English
Title of the anthology: Ethics in Artificial Intelligence
Subtitle: Bias, Fairness and Beyond
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131-143
Number of pages: 13
ISBN (electronic): 978-981-99-7184-8
ISBN (print): 978-981-99-7186-2, 978-981-99-7183-1
Publication status: Published - 30 Dec 2023

Publication series

Name: Studies in Computational Intelligence
Volume: 1123
ISSN (print): 1860-949X
ISSN (electronic): 1860-9503

Abstract

The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
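The abstract describes IDG only at a high level. For orientation, the sketch below illustrates the standard integrated-gradients attribution (a per-feature path integral of gradients) that IDG generalizes from individual features to feature interactions; it is not the authors' IDG implementation. The toy score function, the finite-difference gradient helper, and all parameter choices are illustrative assumptions.

```python
# Minimal sketch of standard integrated gradients: attribute a model's output to
# individual input features by integrating gradients along a straight-line path
# from a baseline to the input. IDG (the chapter's method) extends this idea to
# sets of interacting features; that extension is NOT implemented here.
import numpy as np

def f(x):
    # Toy "classifier" score: a smooth nonlinear function of three input features.
    return np.tanh(x[0] * x[1] + 0.5 * x[2])

def numerical_grad(func, x, eps=1e-5):
    # Central-difference gradient of func at x (stand-in for backprop gradients).
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (func(x + e) - func(x - e)) / (2 * eps)
    return grad

def integrated_gradients(func, x, baseline=None, steps=50):
    # Approximate IG_i(x) = (x_i - x'_i) * \int_0^1 dF(x' + a(x - x'))/dx_i da
    # with a midpoint Riemann sum over `steps` points on the path.
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)
        total += numerical_grad(func, point)
    return (x - baseline) * total / steps

x = np.array([0.8, -1.2, 0.4])
attributions = integrated_gradients(f, x)
print("per-feature attributions:", attributions)
# Completeness check: attributions sum (approximately) to f(x) - f(baseline).
print("sum:", attributions.sum(), "f(x) - f(0):", f(x) - f(np.zeros_like(x)))
```

A game-theory-inspired interaction method such as IDG would additionally distribute the output difference over groups of features (e.g., phrases rather than single words), which this per-feature baseline cannot express.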

ASJC Scopus subject areas

Cite

Interpretability of Deep Neural Models. / Sikdar, Sandipan; Bhattacharya, Parantapa.
Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence; Vol. 1123).


Sikdar, S & Bhattacharya, P 2023, Interpretability of Deep Neural Models. in Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol. 1123, Springer Science and Business Media Deutschland GmbH, pp. 131-143. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar, S., & Bhattacharya, P. (2023). Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 131-143). (Studies in Computational Intelligence; Vol. 1123). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar S, Bhattacharya P. Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH. 2023. p. 131-143. (Studies in Computational Intelligence). doi: 10.1007/978-981-99-7184-8_8
Sikdar, Sandipan ; Bhattacharya, Parantapa. / Interpretability of Deep Neural Models. Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence).
BibTeX
@inbook{da3b89af1dce4fe0bd9c6b884e05076e,
title = "Interpretability of Deep Neural Models",
abstract = "The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.",
author = "Sandipan Sikdar and Parantapa Bhattacharya",
year = "2023",
month = dec,
day = "30",
doi = "10.1007/978-981-99-7184-8_8",
language = "English",
isbn = "978-981-99-7186-2",
series = "Studies in Computational Intelligence",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "131--143",
booktitle = "Ethics in Artificial Intelligence",
address = "Germany",

}

RIS

TY - CHAP

T1 - Interpretability of Deep Neural Models

AU - Sikdar, Sandipan

AU - Bhattacharya, Parantapa

PY - 2023/12/30

Y1 - 2023/12/30

N2 - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

AB - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

UR - http://www.scopus.com/inward/record.url?scp=85182477687&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-7184-8_8

DO - 10.1007/978-981-99-7184-8_8

M3 - Contribution to book/anthology

AN - SCOPUS:85182477687

SN - 978-981-99-7186-2

SN - 978-981-99-7183-1

T3 - Studies in Computational Intelligence

SP - 131

EP - 143

BT - Ethics in Artificial Intelligence

PB - Springer Science and Business Media Deutschland GmbH

ER -