Loading [MathJax]/extensions/tex2jax.js

Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer

Research output: ThesisDoctoral thesis

Authors

  • Oskar Pusz

Details

Original languageEnglish
QualificationDoctor of Engineering
Awarding Institution
Supervised by
  • Daniel Lohmann, Supervisor
Date of Award19 Aug 2024
Place of PublicationHannover
Publication statusPublished - 27 Aug 2024

Abstract

Due to shrinking transistor structure sizes and operating voltages, hardware becomes more suscepti- ble to transient hardware faults. In the domain of safety-critical systems, fault injection campaigns on the instruction-set–architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods are common to either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results lack precision, or the focus might be limited to specific aspects of the system under test only. This dissertation introduces three new approaches to handling these challenges. The approaches use extracted program structures of the executed software, tailored to the running program inde- pendent from system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions’ propa- gation and masking effects through a data flow graph. Compared to the ground truth method, my data-flow-sensitive acceleration method significantly reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent precisely. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program’s execution into temporal segments. These fault-space regions operate as distinct entities, each with its data flow potentially flowing from one to the next. Injecting the traversing data flows and approximating their results to the other non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the ter- mination of systems, thus, reaching a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector, ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program’s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections. Thereby, the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated in the instruction-set–architecture layer, can also be conceptually applied to other system layers. They offer versatility and are seamlessly combinable with each other and established acceleration methods.

Cite this

Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer. / Pusz, Oskar.
Hannover, 2024. 207 p.

Research output: ThesisDoctoral thesis

Download
@phdthesis{b195d784d94540b09ecd1508707ec9b5,
title = "Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer",
abstract = "Due to shrinking transistor structure sizes and operating voltages, hardware becomes more suscepti- ble to transient hardware faults. In the domain of safety-critical systems, fault injection campaigns on the instruction-set–architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods are common to either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results lack precision, or the focus might be limited to specific aspects of the system under test only. This dissertation introduces three new approaches to handling these challenges. The approaches use extracted program structures of the executed software, tailored to the running program inde- pendent from system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions{\textquoteright} propa- gation and masking effects through a data flow graph. Compared to the ground truth method, my data-flow-sensitive acceleration method significantly reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent precisely. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program{\textquoteright}s execution into temporal segments. These fault-space regions operate as distinct entities, each with its data flow potentially flowing from one to the next. Injecting the traversing data flows and approximating their results to the other non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the ter- mination of systems, thus, reaching a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector, ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program{\textquoteright}s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections. Thereby, the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated in the instruction-set–architecture layer, can also be conceptually applied to other system layers. They offer versatility and are seamlessly combinable with each other and established acceleration methods.",
author = "Oskar Pusz",
year = "2024",
month = aug,
day = "27",
doi = "10.15488/17924",
language = "English",
school = "Leibniz University Hannover",

}

Download

TY - BOOK

T1 - Program-structure-guided reduction of the execution time of fault-injection campaigns on the ISA layer

AU - Pusz, Oskar

PY - 2024/8/27

Y1 - 2024/8/27

N2 - Due to shrinking transistor structure sizes and operating voltages, hardware becomes more suscepti- ble to transient hardware faults. In the domain of safety-critical systems, fault injection campaigns on the instruction-set–architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods are common to either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results lack precision, or the focus might be limited to specific aspects of the system under test only. This dissertation introduces three new approaches to handling these challenges. The approaches use extracted program structures of the executed software, tailored to the running program inde- pendent from system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions’ propa- gation and masking effects through a data flow graph. Compared to the ground truth method, my data-flow-sensitive acceleration method significantly reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent precisely. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program’s execution into temporal segments. These fault-space regions operate as distinct entities, each with its data flow potentially flowing from one to the next. Injecting the traversing data flows and approximating their results to the other non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the ter- mination of systems, thus, reaching a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector, ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program’s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections. Thereby, the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated in the instruction-set–architecture layer, can also be conceptually applied to other system layers. They offer versatility and are seamlessly combinable with each other and established acceleration methods.

AB - Due to shrinking transistor structure sizes and operating voltages, hardware becomes more suscepti- ble to transient hardware faults. In the domain of safety-critical systems, fault injection campaigns on the instruction-set–architecture layer have become a widespread approach to assess the resilience of a system concerning this kind of fault. Full fault-injection campaigns are an approach to systematically assess the reliability of a system and the effectiveness of implemented software-based hardening techniques on fixed hardware. A straightforward fault-injection campaign may result in practically unrealizable runtimes, especially when aiming for a comprehensive and complete reliability analysis of the system under test. Established acceleration methods are common to either reduce the number of necessary fault injections or speed up individual injections, ultimately decreasing the overall runtime of the whole campaign. However, despite the effectiveness of these established methods, the runtimes may still be infeasible, the campaign results lack precision, or the focus might be limited to specific aspects of the system under test only. This dissertation introduces three new approaches to handling these challenges. The approaches use extracted program structures of the executed software, tailored to the running program inde- pendent from system behaviors under evaluation. The first approach extracts the data flow and instruction semantics to utilize instructions’ propa- gation and masking effects through a data flow graph. Compared to the ground truth method, my data-flow-sensitive acceleration method significantly reduces the number of necessary injections for a comprehensive reliability analysis by up to 18.4 percent precisely. The second approach utilizes extracted dynamic jump addresses to represent the control flow, partitioning the program’s execution into temporal segments. These fault-space regions operate as distinct entities, each with its data flow potentially flowing from one to the next. Injecting the traversing data flows and approximating their results to the other non-traversing data flows leads to an injection reduction of up to 77.5 percent system-wide, accompanied by an approximation error of only 2 percent and a strong locality of the results. The last contribution focuses on accelerating individual injections that do not lead to the ter- mination of systems, thus, reaching a fixed timeout threshold. This work presents an analysis of timeouts in this context and initial approaches to predict such timeouts during runtime. The final part of this contribution is the timeout detector, ACTOR. This detector uses autocorrelation to detect whether patterns exist in the program’s taken jumps, thereby approximating whether the program is in a loop. ACTOR can achieve end-to-end campaign accelerations of up to 27.6 percent through timeout predictions in individual injections. Thereby, the absolute prediction error is always less than 0.5 percent. The methods developed in this work expand the overall portfolio of potential acceleration methods in the fault-injection community. These generically designed methods, implemented and evaluated in the instruction-set–architecture layer, can also be conceptually applied to other system layers. They offer versatility and are seamlessly combinable with each other and established acceleration methods.

U2 - 10.15488/17924

DO - 10.15488/17924

M3 - Doctoral thesis

CY - Hannover

ER -