A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

  • Martin Hoffmann
  • Peter Ulbrich
  • Christian Dietrich
  • Horst Schirmeier
  • Daniel Lohmann
  • Wolfgang Schroder-Preikschat

External Research Organisations

  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU Erlangen-Nürnberg)
  • TU Dortmund University

Details

Original language: English
Title of host publication: 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering
Pages: 33-40
Number of pages: 8
ISBN (electronic): 978-1-4799-3466-9, 978-1-4799-3465-2
Publication status: Published - 6 Mar 2014
Externally published: Yes
Event: 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, HASE 2014 - Miami, FL, United States
Duration: 9 Jan 2014 to 11 Jan 2014

Abstract

Arithmetic error coding schemes (AN codes) are a well-known and effective technique for soft-error mitigation. Although coding theory is a rich area of mathematics, implementing AN codes seems fairly easy. However, compliance with the theory can easily be lost when moving towards an actual implementation, ultimately jeopardizing the intended fault-tolerance characteristics. In this paper, we present our experiences and lessons learned from implementing AN codes in the Cored dependable voter. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. Our measures eliminate all remaining silent data corruptions (SDCs) in the Cored voter, as validated by an extensive fault-injection campaign that covers 100 percent of the fault space for 1-bit and 2-bit errors.
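
For readers unfamiliar with the basic idea, the following is a minimal sketch of AN encoding, not the implementation described in the paper. The constant `A` and the helper names (`encode`, `is_valid`, `decode`, `flip_bit`) are assumptions chosen for illustration. A value n is stored as A·n, so any valid code word is a multiple of A, and a corrupted word is detected when the remainder modulo A is non-zero. Python integers are unbounded, so the overflow and word-width pitfalls the paper is concerned with are deliberately not modelled here.

```python
# Illustrative AN-code sketch. A is an arbitrary odd constant picked for
# this example; it is NOT taken from the paper.
A = 58659


def encode(n: int) -> int:
    """Encode a plain integer as the code word A * n."""
    return A * n


def is_valid(v: int) -> bool:
    """A code word is valid only if it is a multiple of A."""
    return v % A == 0


def decode(v: int) -> int:
    """Check the code word and return the plain value."""
    if not is_valid(v):
        raise ValueError("AN-code check failed: value is not a multiple of A")
    return v // A


def flip_bit(v: int, bit: int) -> int:
    """Simulate a soft error by flipping a single bit of the code word."""
    return v ^ (1 << bit)


if __name__ == "__main__":
    word = encode(42)
    print(decode(word))                   # round trip of an intact word
    print(is_valid(flip_bit(word, 5)))    # a single-bit flip breaks validity
    # Addition works directly on encoded operands: A*a + A*b == A*(a + b)
    print(decode(encode(3) + encode(4)))
```

A single bit flip changes the word by plus or minus a power of two, which is never a multiple of an odd A greater than 1, so every single-bit error is caught by the modulo check in this idealized setting. How well a given A detects multi-bit errors on a fixed-width machine is exactly the kind of question the paper examines.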

Keywords

    AN code, Arithmetic error coding, Fault injection, Redundancy, Soft errors, Software-based fault tolerance

Cite this

A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes. / Hoffmann, Martin; Ulbrich, Peter; Dietrich, Christian et al.
2014 IEEE 15th International Symposium on High-Assurance Systems Engineering. 2014. p. 33-40.


Hoffmann, M, Ulbrich, P, Dietrich, C, Schirmeier, H, Lohmann, D & Schroder-Preikschat, W 2014, A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes. in 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering. pp. 33-40, 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, HASE 2014, Miami, FL, United States, 9 Jan 2014. https://doi.org/10.1109/hase.2014.14
Hoffmann, M., Ulbrich, P., Dietrich, C., Schirmeier, H., Lohmann, D., & Schroder-Preikschat, W. (2014). A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes. In 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering (pp. 33-40) https://doi.org/10.1109/hase.2014.14
Hoffmann M, Ulbrich P, Dietrich C, Schirmeier H, Lohmann D, Schroder-Preikschat W. A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes. In 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering. 2014. p. 33-40 doi: 10.1109/hase.2014.14
Hoffmann, Martin ; Ulbrich, Peter ; Dietrich, Christian et al. / A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes. 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering. 2014. pp. 33-40
@inproceedings{9ebb9843109e438aa7d42e02d706f2dc,
title = "A Practitioner{\textquoteright}s Guide to Software-based Soft-Error Mitigation Using AN-Codes",
abstract = "Arithmetic error coding schemes (AN codes) are a well-known and effective technique for soft-error mitigation. Although coding theory is a rich area of mathematics, implementing AN codes seems fairly easy. However, compliance with the theory can easily be lost when moving towards an actual implementation, ultimately jeopardizing the intended fault-tolerance characteristics. In this paper, we present our experiences and lessons learned from implementing AN codes in the Cored dependable voter. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. Our measures eliminate all remaining silent data corruptions (SDCs) in the Cored voter, as validated by an extensive fault-injection campaign that covers 100 percent of the fault space for 1-bit and 2-bit errors.",
keywords = "AN code, Arithmetic error coding, Fault injection, Redundancy, Soft errors, Software-based fault tolerance",
author = "Martin Hoffmann and Peter Ulbrich and Christian Dietrich and Horst Schirmeier and Daniel Lohmann and Wolfgang Schroder-Preikschat",
year = "2014",
month = mar,
day = "6",
doi = "10.1109/hase.2014.14",
language = "English",
pages = "33--40",
booktitle = "2014 IEEE 15th International Symposium on High-Assurance Systems Engineering",
note = "2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, HASE 2014 ; Conference date: 09-01-2014 Through 11-01-2014",

}

TY - GEN

T1 - A Practitioner’s Guide to Software-based Soft-Error Mitigation Using AN-Codes

AU - Hoffmann, Martin

AU - Ulbrich, Peter

AU - Dietrich, Christian

AU - Schirmeier, Horst

AU - Lohmann, Daniel

AU - Schroder-Preikschat, Wolfgang

PY - 2014/3/6

Y1 - 2014/3/6

N2 - Arithmetic error coding schemes (AN codes) are a well-known and effective technique for soft-error mitigation. Although coding theory is a rich area of mathematics, implementing AN codes seems fairly easy. However, compliance with the theory can easily be lost when moving towards an actual implementation, ultimately jeopardizing the intended fault-tolerance characteristics. In this paper, we present our experiences and lessons learned from implementing AN codes in the Cored dependable voter. We focus on the challenges and pitfalls in the transition from maths to machine code for a binary computer from a systems perspective. Our results show that practical misconceptions (such as the use of prime numbers) and architecture-dependent implementation glitches occur at every stage of this transition. We identify typical pitfalls and describe practical measures to find and resolve them. Our measures eliminate all remaining silent data corruptions (SDCs) in the Cored voter, as validated by an extensive fault-injection campaign that covers 100 percent of the fault space for 1-bit and 2-bit errors.

KW - AN code

KW - Arithmetic error coding

KW - Fault injection

KW - Redundancy

KW - Soft errors

KW - Software-based fault tolerance

UR - http://www.scopus.com/inward/record.url?scp=84898643057&partnerID=8YFLogxK

U2 - 10.1109/hase.2014.14

DO - 10.1109/hase.2014.14

M3 - Conference contribution

AN - SCOPUS:84898643057

SP - 33

EP - 40

BT - 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering

T2 - 2014 IEEE 15th International Symposium on High-Assurance Systems Engineering, HASE 2014

Y2 - 9 January 2014 through 11 January 2014

ER -