ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation

Tim-Marek Thomas; Christian Dietrich; Oskar Pusz; Daniel Lohmann

doi:10.1007/978-3-031-14835-4_17

Details

Original language	English
Title of host publication	Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings
Subtitle	SAFECOMP 2022 - Proceedings
Editors	Mario Trapp, Francesca Saglietti, Marc Spisländer, Friedemann Bitsch
Pages	252-266
Number of pages	15
ISBN (electronic)	978-3-031-14835-4
Publication status	Published - 25 Aug 2022
Event	41st SAFECOMP: International Conference on Computer Safety, Reliability, and Security, 2022 - Munich, Germany. Duration: 6 Sept 2022 → 9 Sept 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13414 LNCS
ISSN (print)	0302-9743
ISSN (electronic)	1611-3349

Abstract

Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns require many independent injection experiments and, combined, long run times, especially if we aim for a high coverage of the fault space. Besides reducing the number of pilot injections (e.g., with def-use pruning) in the first place, we can also speed up the overall campaign by speeding up individual experiments. From our experiments, we see that the timeout failure class is especially important here: Although timeouts account only for 8% (QSort) of the injections, they require 32% of the campaign run time. In this paper, we analyze and discuss the nature of timeouts as a failure class, and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, we can reduce the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. Thereby, the absolute classification error of experiments as timeouts was always less than 0.5%.
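
The abstract names the core idea (autocorrelation on the branch-target history) without giving details, so the following is only a minimal illustrative sketch, not the authors' implementation. It assumes the history is a list of branch-target addresses and measures, for each lag, the fraction of positions at which the history matches itself shifted by that lag. The function names, the `max_lag` bound, and the `threshold` parameter are hypothetical.

```python
from typing import Optional, Sequence


def autocorrelation(history: Sequence[int], lag: int) -> float:
    """Fraction of positions i where history[i] == history[i + lag]."""
    if lag <= 0 or lag >= len(history):
        raise ValueError("lag must be between 1 and len(history) - 1")
    pairs = len(history) - lag
    matches = sum(1 for i in range(pairs) if history[i] == history[i + lag])
    return matches / pairs


def detect_period(
    history: Sequence[int], max_lag: int, threshold: float = 1.0
) -> Optional[int]:
    """Return the smallest lag whose autocorrelation reaches `threshold`.

    Only lags up to half the history length are tried, so a reported
    period has been observed for at least two full repetitions.
    Returns None if no such lag exists (assumed: still making progress).
    """
    for lag in range(1, min(max_lag, len(history) // 2) + 1):
        if autocorrelation(history, lag) >= threshold:
            return lag
    return None


# Hypothetical histories: a three-branch loop versus straight-line progress.
looping = [0x100, 0x120, 0x140] * 10
progressing = list(range(0x100, 0x100 + 30))

print(detect_period(looping, max_lag=8))      # period of the repeating pattern
print(detect_period(progressing, max_lag=8))  # no period found
```

In such a sketch, a campaign driver could abort an experiment once a stable period is reported. A periodic history alone does not prove that the program is stuck, because long-running but terminating loops look the same within a short window; any real detector needs additional safeguards to keep misclassification low.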

ASJC Scopus subject areas

Mathematics (all)
Theoretical Computer Science
Computer Science (all)
General Computer Science

Cite this

ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. / Thomas, Tim-Marek; Dietrich, Christian; Pusz, Oskar et al.
Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings: SAFECOMP 2022 - Proceedings. ed. / Mario Trapp; Francesca Saglietti; Marc Spisländer; Friedemann Bitsch. 2022. p. 252-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13414 LNCS).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer-reviewed

Thomas, T-M, Dietrich, C, Pusz, O & Lohmann, D 2022, ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. in M Trapp, F Saglietti, M Spisländer & F Bitsch (eds), Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings: SAFECOMP 2022 - Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13414 LNCS, pp. 252-266, 41st SAFECOMP: International Conference on Computer Safety, Reliability, and Security, 2022, Munich, Germany, 6 Sept 2022. https://doi.org/10.1007/978-3-031-14835-4_17

Thomas, T.-M., Dietrich, C., Pusz, O., & Lohmann, D. (2022). ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. In M. Trapp, F. Saglietti, M. Spisländer, & F. Bitsch (Eds.), Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings: SAFECOMP 2022 - Proceedings (pp. 252-266). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13414 LNCS). https://doi.org/10.1007/978-3-031-14835-4_17

Thomas TM, Dietrich C, Pusz O, Lohmann D. ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. In Trapp M, Saglietti F, Spisländer M, Bitsch F, editors, Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings: SAFECOMP 2022 - Proceedings. 2022. p. 252-266. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-14835-4_17

Thomas, Tim-Marek ; Dietrich, Christian ; Pusz, Oskar et al. / ACTOR : Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation. Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings: SAFECOMP 2022 - Proceedings. ed. / Mario Trapp ; Francesca Saglietti ; Marc Spisländer ; Friedemann Bitsch. 2022. pp. 252-266 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

BibTeX

@inproceedings{87af925a20b549b49afaa17941c92a7d,

title = "ACTOR: Accelerating Fault Injection Campaigns Using Timeout Detection Based on Autocorrelation",

abstract = "Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns require many independent injection experiments and, combined, long run times, especially if we aim for a high coverage of the fault space. Besides reducing the number of pilot injections (e.g., with def-use pruning) in the first place, we can also speed up the overall campaign by speeding up individual experiments. From our experiments, we see that the timeout failure class is especially important here: Although timeouts account only for 8% (QSort) of the injections, they require 32% of the campaign run time. In this paper, we analyze and discuss the nature of timeouts as a failure class, and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, we can reduce the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. Thereby, the absolute classification error of experiments as timeouts was always less than 0.5%.",

author = "Tim-Marek Thomas and Christian Dietrich and Oskar Pusz and Daniel Lohmann",

year = "2022",

month = aug,

day = "25",

doi = "10.1007/978-3-031-14835-4_17",

language = "English",

isbn = "978-3-031-14834-7",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

pages = "252--266",

editor = "Mario Trapp and Francesca Saglietti and Marc Spisl{\"a}nder and Friedemann Bitsch",

booktitle = "Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings",

note = "41st SAFECOMP: International Conference on Computer Safety, Reliability, and Security, 2022 ; Conference date: 06-09-2022 Through 09-09-2022",

}

RIS

TY - GEN

T1 - ACTOR

T2 - 41st SAFECOMP: International Conference on Computer Safety, Reliability, and Security, 2022

AU - Thomas, Tim-Marek

AU - Dietrich, Christian

AU - Pusz, Oskar

AU - Lohmann, Daniel

PY - 2022/8/25

Y1 - 2022/8/25

AB - Fault-injection (FI) campaigns provide an in-depth resilience analysis of safety-critical systems in the presence of transient hardware faults. However, FI campaigns require many independent injection experiments and, combined, long run times, especially if we aim for a high coverage of the fault space. Besides reducing the number of pilot injections (e.g., with def-use pruning) in the first place, we can also speed up the overall campaign by speeding up individual experiments. From our experiments, we see that the timeout failure class is especially important here: Although timeouts account only for 8% (QSort) of the injections, they require 32% of the campaign run time. In this paper, we analyze and discuss the nature of timeouts as a failure class, and reason about the general design of dynamic timeout detectors. Based on those insights, we propose ACTOR, a method to identify and abort stuck experiments early by performing autocorrelation on the branch-target history. Applied to seven MiBench benchmarks, we can reduce the number of executed post-injection instructions by up to 30%, which translates into an end-to-end saving of 27%. Thereby, the absolute classification error of experiments as timeouts was always less than 0.5%.

UR - http://www.scopus.com/inward/record.url?scp=85137998003&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-14835-4_17

DO - 10.1007/978-3-031-14835-4_17

M3 - Conference contribution

SN - 978-3-031-14834-7

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 252

EP - 266

BT - Computer Safety, Reliability, and Security - 41st International Conference, SAFECOMP 2022, Proceedings

A2 - Trapp, Mario

A2 - Saglietti, Francesca

A2 - Spisländer, Marc

A2 - Bitsch, Friedemann

Y2 - 6 September 2022 through 9 September 2022

ER -
