Interpretability of Deep Neural Models

Research output: Chapter in book/report/conference proceeding > Contribution to book/anthology > Research > peer review

Authors

  • Sandipan Sikdar
  • Parantapa Bhattacharya

Research Organisations

External Research Organisations

  • University of Virginia

Details

Original language: English
Title of host publication: Ethics in Artificial Intelligence
Subtitle of host publication: Bias, Fairness and Beyond
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131-143
Number of pages: 13
ISBN (electronic): 978-981-99-7184-8
ISBN (print): 978-981-99-7186-2, 978-981-99-7183-1
Publication status: Published - 30 Dec 2023

Publication series

Name: Studies in Computational Intelligence
Volume: 1123
ISSN (print): 1860-949X
ISSN (electronic): 1860-9503

Abstract

The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
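To make the attribution idea in the abstract concrete, the following is a minimal sketch of standard integrated gradients, the per-feature attribution method that IDG generalises from individual features to feature groups. It is an illustrative reconstruction under stated assumptions, not the authors' IDG implementation: the model is assumed to accept a batch of token-embedding tensors and return class logits, and all names (model, inputs, baseline) are hypothetical placeholders.

import torch

def integrated_gradients(model, inputs, baseline, target, steps=50):
    # inputs, baseline: token-embedding tensors of shape (seq_len, emb_dim).
    # Interpolate between the baseline and the actual input embeddings.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1)
    interpolated = (baseline + alphas * (inputs - baseline)).detach().requires_grad_(True)

    # The placeholder model is assumed to map the interpolated embeddings,
    # shape (steps, seq_len, emb_dim), to class logits of shape (steps, num_classes).
    logits = model(interpolated)
    score = logits[:, target].sum()

    # Gradient of the target-class logit w.r.t. each interpolated embedding.
    grads = torch.autograd.grad(score, interpolated)[0]

    # Riemann-sum approximation of the path integral, scaled by the
    # difference between the input and the baseline.
    attributions = (inputs - baseline) * grads.mean(dim=0)
    return attributions.sum(dim=-1)  # one importance score per token

For a sentiment classifier, inputs would be the embeddings of the review tokens and baseline typically an all-zero (or padding) embedding of the same shape; the returned vector then gives one importance score per token. IDG, as described in the abstract, extends this kind of attribution from individual features to higher-level feature interactions.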

Cite this

Interpretability of Deep Neural Models. / Sikdar, Sandipan; Bhattacharya, Parantapa.
Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. p. 131-143 (Studies in Computational Intelligence; Vol. 1123).

Sikdar, S & Bhattacharya, P 2023, Interpretability of Deep Neural Models. in Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol. 1123, Springer Science and Business Media Deutschland GmbH, pp. 131-143. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar, S., & Bhattacharya, P. (2023). Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 131-143). (Studies in Computational Intelligence; Vol. 1123). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar S, Bhattacharya P. Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH. 2023. p. 131-143. (Studies in Computational Intelligence). doi: 10.1007/978-981-99-7184-8_8
Sikdar, Sandipan ; Bhattacharya, Parantapa. / Interpretability of Deep Neural Models. Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence).
BibTeX
@inbook{da3b89af1dce4fe0bd9c6b884e05076e,
title = "Interpretability of Deep Neural Models",
abstract = "The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.",
author = "Sandipan Sikdar and Parantapa Bhattacharya",
year = "2023",
month = dec,
day = "30",
doi = "10.1007/978-981-99-7184-8_8",
language = "English",
isbn = "978-981-99-7186-2",
series = "Studies in Computational Intelligence",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "131--143",
booktitle = "Ethics in Artificial Intelligence",
address = "Germany",

}

RIS

TY - CHAP

T1 - Interpretability of Deep Neural Models

AU - Sikdar, Sandipan

AU - Bhattacharya, Parantapa

PY - 2023/12/30

Y1 - 2023/12/30

N2 - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

AB - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

UR - http://www.scopus.com/inward/record.url?scp=85182477687&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-7184-8_8

DO - 10.1007/978-981-99-7184-8_8

M3 - Contribution to book/anthology

AN - SCOPUS:85182477687

SN - 978-981-99-7186-2

SN - 978-981-99-7183-1

T3 - Studies in Computational Intelligence

SP - 131

EP - 143

BT - Ethics in Artificial Intelligence

PB - Springer Science and Business Media Deutschland GmbH

ER -