Interpretability of Deep Neural Models

Research output: Chapter in book/report/conference proceeding > Contribution to book/anthology > Research > peer review

Authors

  • Sandipan Sikdar
  • Parantapa Bhattacharya

Research Organisations

External Research Organisations

  • University of Virginia

Details

Original language: English
Title of host publication: Ethics in Artificial Intelligence
Subtitle of host publication: Bias, Fairness and Beyond
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131-143
Number of pages: 13
ISBN (electronic): 978-981-99-7184-8
ISBN (print): 978-981-99-7186-2, 978-981-99-7183-1
Publication status: Published - 30 Dec 2023

Publication series

Name: Studies in Computational Intelligence
Volume: 1123
ISSN (print): 1860-949X
ISSN (electronic): 1860-9503

Abstract

The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
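To make the attribution idea in the abstract concrete, the following is a minimal sketch of standard integrated gradients, the per-feature attribution method that IDG generalises from individual features to feature groups. It is an illustrative reconstruction under stated assumptions, not the authors' IDG implementation: the model is assumed to accept a batch of token-embedding tensors and return class logits, and all names (model, inputs, baseline) are hypothetical placeholders.

import torch

def integrated_gradients(model, inputs, baseline, target, steps=50):
    # inputs, baseline: token-embedding tensors of shape (seq_len, emb_dim).
    # Interpolate between the baseline and the actual input embeddings.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1)
    interpolated = (baseline + alphas * (inputs - baseline)).detach().requires_grad_(True)

    # The placeholder model is assumed to map the interpolated embeddings,
    # shape (steps, seq_len, emb_dim), to class logits of shape (steps, num_classes).
    logits = model(interpolated)
    score = logits[:, target].sum()

    # Gradient of the target-class logit w.r.t. each interpolated embedding.
    grads = torch.autograd.grad(score, interpolated)[0]

    # Riemann-sum approximation of the path integral, scaled by the
    # difference between the input and the baseline.
    attributions = (inputs - baseline) * grads.mean(dim=0)
    return attributions.sum(dim=-1)  # one importance score per token

For a sentiment classifier, inputs would be the embeddings of the review tokens and baseline typically an all-zero (or padding) embedding of the same shape; the returned vector then gives one importance score per token. IDG, as described in the abstract, extends this kind of attribution from individual features to higher-level feature interactions.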

Cite this

Interpretability of Deep Neural Models. / Sikdar, Sandipan; Bhattacharya, Parantapa.
Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. p. 131-143 (Studies in Computational Intelligence; Vol. 1123).

Sikdar, S & Bhattacharya, P 2023, Interpretability of Deep Neural Models. in Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol. 1123, Springer Science and Business Media Deutschland GmbH, pp. 131-143. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar, S., & Bhattacharya, P. (2023). Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 131-143). (Studies in Computational Intelligence; Vol. 1123). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar S, Bhattacharya P. Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH. 2023. p. 131-143. (Studies in Computational Intelligence). doi: 10.1007/978-981-99-7184-8_8
Sikdar, Sandipan ; Bhattacharya, Parantapa. / Interpretability of Deep Neural Models. Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence).
BibTeX
@inbook{da3b89af1dce4fe0bd9c6b884e05076e,
title = "Interpretability of Deep Neural Models",
abstract = "The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.",
author = "Sandipan Sikdar and Parantapa Bhattacharya",
year = "2023",
month = dec,
day = "30",
doi = "10.1007/978-981-99-7184-8_8",
language = "English",
isbn = "978-981-99-7186-2",
series = "Studies in Computational Intelligence",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "131--143",
booktitle = "Ethics in Artificial Intelligence",
address = "Germany",

}

RIS

TY - CHAP

T1 - Interpretability of Deep Neural Models

AU - Sikdar, Sandipan

AU - Bhattacharya, Parantapa

PY - 2023/12/30

Y1 - 2023/12/30

N2 - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

AB - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that can explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

UR - http://www.scopus.com/inward/record.url?scp=85182477687&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-7184-8_8

DO - 10.1007/978-981-99-7184-8_8

M3 - Contribution to book/anthology

AN - SCOPUS:85182477687

SN - 978-981-99-7186-2

SN - 978-981-99-7183-1

T3 - Studies in Computational Intelligence

SP - 131

EP - 143

BT - Ethics in Artificial Intelligence

PB - Springer Science and Business Media Deutschland GmbH

ER -