Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Martin Hoffmann
  • Christoph Borchert
  • Christian Dietrich
  • Horst Schirmeier
  • Rudiger Kapitza
  • Olaf Spinczyk
  • Daniel Lohmann

External Research Organisations

  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU Erlangen-Nürnberg)
  • TU Dortmund University
  • Technische Universität Braunschweig

Details

Original language: English
Title of host publication: IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 230-237
Number of pages: 8
ISBN (electronic): 9781479944309
Publication status: Published - 18 Sept 2014
Externally published: Yes
Event: 17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014 - Reno, United States
Duration: 10 Jun 2014 to 12 Jun 2014

Abstract

Developers of embedded (real-time) systems can choose from a variety of operating systems. While some embedded operating systems provide very flexible APIs, e.g., a POSIX-compliant interface for run-time management, others have a completely static structure, which is generated at compile time by utilizing detailed application knowledge. A prominent example of the latter class from the domain of automotive operating systems is OSEK/OS and its successor AUTOSAR/OS. As we have shown in previous work, the design of the operating system has a strong impact on its vulnerability to system failures caused by hardware faults. This observation is gaining importance because there is an ongoing trend towards low-power and low-cost, yet less reliable, hardware. This work quantifies the difference in vulnerability to soft errors in main memory between a flexible (dynamic) operating system (eCos) and a static system (CiAO), which has an OSEK-compliant structure. We also analyze the additional degree of robustness that is achieved by hardening an operating system with software-based and hardware-based fault-tolerance measures, as well as the corresponding costs. Covering this design space gives developers a better chance of making good design decisions with respect to the trade-off between fault tolerance, resource consumption, and interface convenience. Our results indicate that with a combination of hardware- and software-based fault-tolerance measures, silent data corruptions in both operating systems can be reduced to below one percent (compared to eCos). However, the analyzed fault-tolerance mechanisms are expensive for the dynamic system, whereas the statically designed operating system can be hardened at a much lower price.

Keywords

    AUTOSAR, Dependability, eCos, Fault Injection, Operating System, OSEK, Real-time System, Reliability

Cite this

Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs. / Hoffmann, Martin; Borchert, Christoph; Dietrich, Christian et al.
IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 230-237 6899154.

Hoffmann, M, Borchert, C, Dietrich, C, Schirmeier, H, Kapitza, R, Spinczyk, O & Lohmann, D 2014, Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs. in IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014., 6899154, Institute of Electrical and Electronics Engineers Inc., pp. 230-237, 17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014, Reno, United States, 10 Jun 2014. https://doi.org/10.1109/ISORC.2014.26
Hoffmann, M., Borchert, C., Dietrich, C., Schirmeier, H., Kapitza, R., Spinczyk, O., & Lohmann, D. (2014). Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs. In IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014 (pp. 230-237). Article 6899154. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ISORC.2014.26
Hoffmann M, Borchert C, Dietrich C, Schirmeier H, Kapitza R, Spinczyk O et al. Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs. In IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 230-237. 6899154 doi: 10.1109/ISORC.2014.26
Hoffmann, Martin ; Borchert, Christoph ; Dietrich, Christian et al. / Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs. IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 230-237
@inproceedings{c7aa9927a1f94ca4af790171cf730216,
title = "Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs",
abstract = "Developers of embedded (real-time) systems can choose from a variety of operating systems. While some embedded operating systems provide very flexible APIs, e.g., a POSIX-compliant interface for run-time management, others have a completely static structure, which is generated at compile time by utilizing detailed application knowledge. A prominent example of the latter class from the domain of automotive operating systems is OSEK/OS and its successor AUTOSAR/OS. As we have shown in previous work, the design of the operating system has a strong impact on its vulnerability to system failures caused by hardware faults. This observation is gaining importance because there is an ongoing trend towards low-power and low-cost, yet less reliable, hardware. This work quantifies the difference in vulnerability to soft errors in main memory between a flexible (dynamic) operating system (eCos) and a static system (CiAO), which has an OSEK-compliant structure. We also analyze the additional degree of robustness that is achieved by hardening an operating system with software-based and hardware-based fault-tolerance measures, as well as the corresponding costs. Covering this design space gives developers a better chance of making good design decisions with respect to the trade-off between fault tolerance, resource consumption, and interface convenience. Our results indicate that with a combination of hardware- and software-based fault-tolerance measures, silent data corruptions in both operating systems can be reduced to below one percent (compared to eCos). However, the analyzed fault-tolerance mechanisms are expensive for the dynamic system, whereas the statically designed operating system can be hardened at a much lower price.",
keywords = "AUTOSAR, Dependability, eCos, Fault Injection, Operating System, OSEK, Real-time System, Reliability",
author = "Martin Hoffmann and Christoph Borchert and Christian Dietrich and Horst Schirmeier and Rudiger Kapitza and Olaf Spinczyk and Daniel Lohmann",
year = "2014",
month = sep,
day = "18",
doi = "10.1109/ISORC.2014.26",
language = "English",
pages = "230--237",
booktitle = "IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
note = "17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014 ; Conference date: 10-06-2014 Through 12-06-2014",

}

TY - GEN

T1 - Effectiveness of Fault Detection Mechanisms in Static and Dynamic Operating System Designs

AU - Hoffmann, Martin

AU - Borchert, Christoph

AU - Dietrich, Christian

AU - Schirmeier, Horst

AU - Kapitza, Rudiger

AU - Spinczyk, Olaf

AU - Lohmann, Daniel

PY - 2014/9/18

Y1 - 2014/9/18

N2 - Developers of embedded (real-time) systems can choose from a variety of operating systems. While some embedded operating systems provide very flexible APIs, e.g., a POSIX-compliant interface for run-time management, others have a completely static structure, which is generated at compile time by utilizing detailed application knowledge. A prominent example of the latter class from the domain of automotive operating systems is OSEK/OS and its successor AUTOSAR/OS. As we have shown in previous work, the design of the operating system has a strong impact on its vulnerability to system failures caused by hardware faults. This observation is gaining importance because there is an ongoing trend towards low-power and low-cost, yet less reliable, hardware. This work quantifies the difference in vulnerability to soft errors in main memory between a flexible (dynamic) operating system (eCos) and a static system (CiAO), which has an OSEK-compliant structure. We also analyze the additional degree of robustness that is achieved by hardening an operating system with software-based and hardware-based fault-tolerance measures, as well as the corresponding costs. Covering this design space gives developers a better chance of making good design decisions with respect to the trade-off between fault tolerance, resource consumption, and interface convenience. Our results indicate that with a combination of hardware- and software-based fault-tolerance measures, silent data corruptions in both operating systems can be reduced to below one percent (compared to eCos). However, the analyzed fault-tolerance mechanisms are expensive for the dynamic system, whereas the statically designed operating system can be hardened at a much lower price.

AB - Developers of embedded (real-time) systems can choose from a variety of operating systems. While some embedded operating systems provide very flexible APIs, e.g., a POSIX-compliant interface for run-time management, others have a completely static structure, which is generated at compile time by utilizing detailed application knowledge. A prominent example of the latter class from the domain of automotive operating systems is OSEK/OS and its successor AUTOSAR/OS. As we have shown in previous work, the design of the operating system has a strong impact on its vulnerability to system failures caused by hardware faults. This observation is gaining importance because there is an ongoing trend towards low-power and low-cost, yet less reliable, hardware. This work quantifies the difference in vulnerability to soft errors in main memory between a flexible (dynamic) operating system (eCos) and a static system (CiAO), which has an OSEK-compliant structure. We also analyze the additional degree of robustness that is achieved by hardening an operating system with software-based and hardware-based fault-tolerance measures, as well as the corresponding costs. Covering this design space gives developers a better chance of making good design decisions with respect to the trade-off between fault tolerance, resource consumption, and interface convenience. Our results indicate that with a combination of hardware- and software-based fault-tolerance measures, silent data corruptions in both operating systems can be reduced to below one percent (compared to eCos). However, the analyzed fault-tolerance mechanisms are expensive for the dynamic system, whereas the statically designed operating system can be hardened at a much lower price.

KW - AUTOSAR

KW - Dependability

KW - eCos

KW - Fault Injection

KW - Operating System

KW - OSEK

KW - Real-time System

KW - Reliability

UR - http://www.scopus.com/inward/record.url?scp=84941286164&partnerID=8YFLogxK

U2 - 10.1109/ISORC.2014.26

DO - 10.1109/ISORC.2014.26

M3 - Conference contribution

AN - SCOPUS:84941286164

SP - 230

EP - 237

BT - IEEE 17th International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014

Y2 - 10 June 2014 through 12 June 2014

ER -