Interpretability of Deep Neural Models

Publication: Contribution to book/report/anthology/conference proceedings › Contribution to book/anthology › Research › Peer-reviewed

Authors

  • Sandipan Sikdar
  • Parantapa Bhattacharya

Organisational units

External organisations

  • University of Virginia

Details

Original language: English
Title of the anthology: Ethics in Artificial Intelligence
Subtitle: Bias, Fairness and Beyond
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 131-143
Number of pages: 13
ISBN (electronic): 978-981-99-7184-8
ISBN (print): 978-981-99-7186-2, 978-981-99-7183-1
Publication status: Published - 30 Dec 2023

Publication series

Name: Studies in Computational Intelligence
Volume: 1123
ISSN (print): 1860-949X
ISSN (electronic): 1860-9503

Abstract

The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.
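The abstract describes IDG only at a high level. For orientation, the sketch below illustrates the standard integrated-gradients attribution (a per-feature path integral of gradients) that IDG generalizes from individual features to feature interactions; it is not the authors' IDG implementation. The toy score function, the finite-difference gradient helper, and all parameter choices are illustrative assumptions.

```python
# Minimal sketch of standard integrated gradients: attribute a model's output to
# individual input features by integrating gradients along a straight-line path
# from a baseline to the input. IDG (the chapter's method) extends this idea to
# sets of interacting features; that extension is NOT implemented here.
import numpy as np

def f(x):
    # Toy "classifier" score: a smooth nonlinear function of three input features.
    return np.tanh(x[0] * x[1] + 0.5 * x[2])

def numerical_grad(func, x, eps=1e-5):
    # Central-difference gradient of func at x (stand-in for backprop gradients).
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (func(x + e) - func(x - e)) / (2 * eps)
    return grad

def integrated_gradients(func, x, baseline=None, steps=50):
    # Approximate IG_i(x) = (x_i - x'_i) * \int_0^1 dF(x' + a(x - x'))/dx_i da
    # with a midpoint Riemann sum over `steps` points on the path.
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        point = baseline + a * (x - baseline)
        total += numerical_grad(func, point)
    return (x - baseline) * total / steps

x = np.array([0.8, -1.2, 0.4])
attributions = integrated_gradients(f, x)
print("per-feature attributions:", attributions)
# Completeness check: attributions sum (approximately) to f(x) - f(baseline).
print("sum:", attributions.sum(), "f(x) - f(0):", f(x) - f(np.zeros_like(x)))
```

A game-theory-inspired interaction method such as IDG would additionally distribute the output difference over groups of features (e.g., phrases rather than single words), which this per-feature baseline cannot express.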

ASJC Scopus subject areas

Cite

Interpretability of Deep Neural Models. / Sikdar, Sandipan; Bhattacharya, Parantapa.
Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence; Vol. 1123).


Sikdar, S & Bhattacharya, P 2023, Interpretability of Deep Neural Models. in Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Studies in Computational Intelligence, vol. 1123, Springer Science and Business Media Deutschland GmbH, pp. 131-143. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar, S., & Bhattacharya, P. (2023). Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond (pp. 131-143). (Studies in Computational Intelligence; Vol. 1123). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-7184-8_8
Sikdar S, Bhattacharya P. Interpretability of Deep Neural Models. In Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH. 2023. p. 131-143. (Studies in Computational Intelligence). doi: 10.1007/978-981-99-7184-8_8
Sikdar, Sandipan ; Bhattacharya, Parantapa. / Interpretability of Deep Neural Models. Ethics in Artificial Intelligence: Bias, Fairness and Beyond. Springer Science and Business Media Deutschland GmbH, 2023. pp. 131-143 (Studies in Computational Intelligence).
BibTeX
@inbook{da3b89af1dce4fe0bd9c6b884e05076e,
title = "Interpretability of Deep Neural Models",
abstract = "The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.",
author = "Sandipan Sikdar and Parantapa Bhattacharya",
year = "2023",
month = dec,
day = "30",
doi = "10.1007/978-981-99-7184-8_8",
language = "English",
isbn = "978-981-99-7186-2",
series = "Studies in Computational Intelligence",
publisher = "Springer Science and Business Media Deutschland GmbH",
pages = "131--143",
booktitle = "Ethics in Artificial Intelligence",
address = "Germany",

}

RIS

TY - CHAP

T1 - Interpretability of Deep Neural Models

AU - Sikdar, Sandipan

AU - Bhattacharya, Parantapa

PY - 2023/12/30

Y1 - 2023/12/30

N2 - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

AB - The rise of deep neural networks in machine learning has been remarkable, leading to their deployment in algorithmic decision-making. However, this has raised questions about the explainability and interpretability of these models, given their growing importance in society. To address this, the field of interpretability in machine learning has developed, with the goal of creating frameworks that explain the decisions of a machine learning model in a way that is comprehensible to humans. Such explanations can be essential for building trust in the system, for debugging models for potential errors, and for meeting legal requirements (e.g., GDPR). Even though the success of deep neural networks is attributed to their ability to capture higher-level feature interactions, most existing frameworks still focus on highlighting important individual features (e.g., words in text or pixels in images). Hence, to further improve interpretability, we propose to quantify the importance of feature interactions in addition to individual features. In this work, we introduce integrated directional gradients (IDG), a game-theory-inspired method for assigning importance scores to higher-level feature interactions. Our experiments with DNN-based text classifiers on the task of sentiment classification demonstrate that IDG effectively captures the importance of feature interactions.

UR - http://www.scopus.com/inward/record.url?scp=85182477687&partnerID=8YFLogxK

U2 - 10.1007/978-981-99-7184-8_8

DO - 10.1007/978-981-99-7184-8_8

M3 - Contribution to book/anthology

AN - SCOPUS:85182477687

SN - 978-981-99-7186-2

SN - 978-981-99-7183-1

T3 - Studies in Computational Intelligence

SP - 131

EP - 143

BT - Ethics in Artificial Intelligence

PB - Springer Science and Business Media Deutschland GmbH

ER -