Criteria and Metrics for the Explainability of Software

Hannah Luca Deters

Details

Original language	English
Qualification	Master of Science
Awarding Institution	Leibniz University Hannover
Supervised by	Schneider, K., Supervisor
Place of Publication	Hannover
Publication status	Published - 28 Sept 2022

Abstract

In this master's thesis, a concept for evaluating explainability in software
systems was developed. For this purpose, a comprehensive literature review
was conducted in which 86 relevant papers were selected from an initial
set of 1025 papers. These papers contributed to the conceptualization of
the evaluation method. During this conceptualization, it was found that
the characteristics of explainability are strongly linked to the objective that
the explanations are supposed to achieve. It became clear that an evaluation
of explainability cannot yield a satisfactory result unless it takes these
objectives into account. The review also showed that the literature already
provides methods for evaluating single aspects of explainability, but these
consist almost exclusively of user studies. Since conducting multiple user
studies would be unrealistically expensive in non-research settings,
heuristics were developed to provide a first estimate of explainability.
Overall, an overarching concept was developed that links the definition of
objectives, the initial assessment with heuristics, and the more reliable
evaluation with user studies.
In the second part of the master's thesis, a user study was conducted to
evaluate whether the developed heuristics produce reliable results. For this
purpose, the interrater agreement was examined to see whether the heuristics
allow uniform ratings. It was found that a group of evaluators together can
produce a uniform result. Significance tests were then used to determine
whether the heuristics are capable of identifying significant differences in the
explainability of two systems. The tests revealed significant differences
within the different aspects of explainability.

ASJC Scopus subject areas

Computer Science (all)

Cite this

Criteria and Metrics for the Explainability of Software. / Deters, Hannah Luca.
Hannover, 2022. 114 p.

Research output: Thesis › Master's thesis

Deters, HL 2022, 'Criteria and Metrics for the Explainability of Software', Master of Science, Leibniz University Hannover, Hannover. <https://www.pi.uni-hannover.de/fileadmin/pi/se/Stud-Arbeiten/2022/MA-Deters-2022.pdf>

Deters, H. L. (2022). Criteria and Metrics for the Explainability of Software. [Master's thesis, Leibniz University Hannover]. https://www.pi.uni-hannover.de/fileadmin/pi/se/Stud-Arbeiten/2022/MA-Deters-2022.pdf

Deters HL. Criteria and Metrics for the Explainability of Software. Hannover, 2022. 114 p.

Deters, Hannah Luca. / Criteria and Metrics for the Explainability of Software. Hannover, 2022. 114 p.

@mastersthesis{bc2a1123353b435a9e6259b640015c3e,

title = "Criteria and Metrics for the Explainability of Software",

abstract = "In this master thesis, a concept for the evaluation of explainability in software systems was developed. For this purpose, a comprehensive literature review was conducted in which 86 relevant papers were obtained from an initial set of 1025 papers. These papers contributed to the conceptualization of the evaluation method. During this conceptualization, it was found that the characteristics of explainability are strongly linked to the objective that the explanations are supposed to achieve. It became clear that it is not possible to achieve a satisfactory result if the evaluation of explainability does not take these objectives into account. What has also been noticed is that the literature already provides methods for evaluating single aspects of explainability, but these consist almost exclusively of user studies. Since conducting multiple user studies would be unrealistically expensive in non-research settings, heuristics were developed to provide a first estimate of explainability. Overall, an overarching concept was developed that links the definition of objectives, the initial assessment with heuristics, and the more reliable evaluation with user studies. In the second part of the master{\textquoteright}s thesis, a user study was conducted to evaluate whether the developed heuristics produce reliable results. For this purpose, the interrater agreement was examined to see whether the heuristics allow uniform ratings. It was found that a group of evaluators together can produce a uniform result. Significance tests were then used to determine whether the heuristics are capable of identifying significant differences in the explainability of two systems. It was found that significant differences were revealed within the different aspects of explainability.",

author = "Deters, {Hannah Luca}",

year = "2022",

month = sep,

day = "28",

language = "English",

school = "Leibniz University Hannover",

}

TY - GEN

T1 - Criteria and Metrics for the Explainability of Software

AU - Deters, Hannah Luca

PY - 2022/9/28

Y1 - 2022/9/28

N2 - In this master thesis, a concept for the evaluation of explainability in software systems was developed. For this purpose, a comprehensive literature review was conducted in which 86 relevant papers were obtained from an initial set of 1025 papers. These papers contributed to the conceptualization of the evaluation method. During this conceptualization, it was found that the characteristics of explainability are strongly linked to the objective that the explanations are supposed to achieve. It became clear that it is not possible to achieve a satisfactory result if the evaluation of explainability does not take these objectives into account. What has also been noticed is that the literature already provides methods for evaluating single aspects of explainability, but these consist almost exclusively of user studies. Since conducting multiple user studies would be unrealistically expensive in non-research settings, heuristics were developed to provide a first estimate of explainability. Overall, an overarching concept was developed that links the definition of objectives, the initial assessment with heuristics, and the more reliable evaluation with user studies. In the second part of the master’s thesis, a user study was conducted to evaluate whether the developed heuristics produce reliable results. For this purpose, the interrater agreement was examined to see whether the heuristics allow uniform ratings. It was found that a group of evaluators together can produce a uniform result. Significance tests were then used to determine whether the heuristics are capable of identifying significant differences in the explainability of two systems. It was found that significant differences were revealed within the different aspects of explainability.

M3 - Master's thesis

CY - Hannover

ER -

By the same author(s)

What you see is what you trace: a two-stage interview study on traceability practices and eye tracking potential

Paving the Way Towards an Effective Vision Video Usage: An Exploratory Study

Human factors in model-driven engineering: future research goals and initiatives for MDE

How Explainable Is Your System? Towards a Quality Model for Explainability

Turning asynchronicity into an opportunity: asynchronous communication for shared understanding with vision videos