Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Details

Original language: English
Title of host publication: Embedded Computer Systems
Subtitle of host publication: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings
Editors: Cristina Silvano, Marc Reichenbach, Christian Pilato
Publisher: Springer International Publishing AG
Pages: 19-32
Number of pages: 14
ISBN (electronic): 978-3-031-46077-7
ISBN (print): 978-3-031-46076-0
Publication status: Published - 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14385 LNCS
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Abstract

Field-programmable gate arrays (FPGAs) in space applications come with the drawback of radiation effects, which inevitably occur in devices of small process size. This also applies to the electronics of the Bose Einstein Condensate and Cold Atom Laboratory (BECCAL) apparatus, which will operate on the International Space Station (ISS) for several years. A total of more than 100 FPGAs distributed throughout the setup will be used for high-precision control of specialized sensors and actuators at nanosecond scale. On the ISS, radiation effects must be taken into account, the functionality of the electronics must be monitored, and errors must be handled properly. Due to the large number of devices in BECCAL, commercial off-the-shelf (COTS) FPGAs are used, which are not radiation hardened. This paper describes the methods and measures used to mitigate the effects of radiation in an application-specific COTS-FPGA-based communication network. Based on the firmware for a central communication network switch in BECCAL, it describes the steps taken to integrate redundancy into the design while optimizing the firmware to stay within the FPGA’s resource constraints. A redundant integrity checker module is developed that can notify preceding network devices of data and configuration bit errors. The firmware is validated and evaluated by injecting faults into data and configuration registers in simulation and on real hardware. In the end, the FPGA resource usage of the firmware is reduced by more than half, enabling the use of dual modular redundancy (DMR) for the switching fabric. Together with the triple modular redundancy (TMR) protected integrity checker, this combination completely prevents silent data corruption in the design, as shown in simulation and by injecting faults into hardware using the Intel Fault Injection FPGA IP Core, while staying within the resource limitations of a COTS FPGA.
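
To illustrate the difference between the two redundancy schemes named in the abstract, the following is a minimal software sketch in Python. It is not the authors' firmware (which targets FPGA hardware); the function names flip_bit, dmr_check, and tmr_vote are hypothetical and chosen only for this example. It assumes a single bit flip in one copy of a data word: DMR can detect the mismatch but cannot tell which copy is correct, while TMR can also mask the fault by bitwise majority voting.

```python
def flip_bit(word: int, bit: int) -> int:
    """Model a single-event upset by inverting one bit of a word."""
    return word ^ (1 << bit)


def dmr_check(a: int, b: int) -> tuple[int, bool]:
    """Dual modular redundancy: compare two copies.

    Returns (value_of_first_copy, error_detected). A mismatch is detected,
    but DMR alone cannot decide which copy is the correct one.
    """
    return a, a != b


def tmr_vote(a: int, b: int, c: int) -> tuple[int, bool]:
    """Triple modular redundancy: bitwise majority vote over three copies.

    Returns (voted_value, error_detected). A fault confined to one copy
    is both detected and masked.
    """
    voted = (a & b) | (a & c) | (b & c)
    return voted, not (a == b == c)


if __name__ == "__main__":
    golden = 0b10110010              # illustrative data word (assumption)
    faulty = flip_bit(golden, 3)     # inject a fault into one copy

    print(dmr_check(faulty, golden))         # mismatch flagged, value not trusted
    print(tmr_vote(golden, faulty, golden))  # fault flagged and masked
```

In this sketch, detection without correction (DMR) is enough to avoid silent corruption as long as the detected error is reported, which matches the role the abstract gives to notifying preceding network devices.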

Keywords

    commercial off-the-shelf, fault detection, field-programmable gate array, space application

Cite this

Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit. / Oberschulte, Tim; Marten, Jakob; Blume, Holger.
Embedded Computer Systems: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings. ed. / Cristina Silvano; Marc Reichenbach; Christian Pilato. Springer International Publishing AG, 2023. p. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14385 LNCS).

Oberschulte, T, Marten, J & Blume, H 2023, Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit. in C Silvano, M Reichenbach & C Pilato (eds), Embedded Computer Systems: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 14385 LNCS, Springer International Publishing AG, pp. 19-32. https://doi.org/10.1007/978-3-031-46077-7_2
Oberschulte, T., Marten, J., & Blume, H. (2023). Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit. In C. Silvano, M. Reichenbach, & C. Pilato (Eds.), Embedded Computer Systems: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings (pp. 19-32). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 14385 LNCS). Springer International Publishing AG. https://doi.org/10.1007/978-3-031-46077-7_2
Oberschulte T, Marten J, Blume H. Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit. In Silvano C, Reichenbach M, Pilato C, editors, Embedded Computer Systems: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings. Springer International Publishing AG. 2023. p. 19-32. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). Epub 2023 Nov 7. doi: 10.1007/978-3-031-46077-7_2
Oberschulte, Tim ; Marten, Jakob ; Blume, Holger. / Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit. Embedded Computer Systems: Architectures, Modeling, and Simulation - 23rd International Conference, SAMOS 2023, Proceedings. editor / Cristina Silvano ; Marc Reichenbach ; Christian Pilato. Springer International Publishing AG, 2023. pp. 19-32 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{3aad5faf34e14cffb96afe277e7b5272,
title = "Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit",
abstract = "Field-programmable gate arrays (FPGAs) in space applications come with the drawback of radiation effects, which inevitably occur in devices of small process size. This also applies to the electronics of the Bose Einstein Condensate and Cold Atom Laboratory (BECCAL) apparatus, which will operate on the International Space Station (ISS) for several years. A total of more than 100 FPGAs distributed throughout the setup will be used for high-precision control of specialized sensors and actuators at nanosecond scale. On the ISS, radiation effects must be taken into account, the functionality of the electronics must be monitored, and errors must be handled properly. Due to the large number of devices in BECCAL, commercial off-the-shelf (COTS) FPGAs are used, which are not radiation hardened. This paper describes the methods and measures used to mitigate the effects of radiation in an application-specific COTS-FPGA-based communication network. Based on the firmware for a central communication network switch in BECCAL, it describes the steps taken to integrate redundancy into the design while optimizing the firmware to stay within the FPGA{\textquoteright}s resource constraints. A redundant integrity checker module is developed that can notify preceding network devices of data and configuration bit errors. The firmware is validated and evaluated by injecting faults into data and configuration registers in simulation and on real hardware. In the end, the FPGA resource usage of the firmware is reduced by more than half, enabling the use of dual modular redundancy (DMR) for the switching fabric. Together with the triple modular redundancy (TMR) protected integrity checker, this combination completely prevents silent data corruption in the design, as shown in simulation and by injecting faults into hardware using the Intel Fault Injection FPGA IP Core, while staying within the resource limitations of a COTS FPGA.",
keywords = "commercial off-the-shelf, fault detection, field-programmable gate array, space application",
author = "Tim Oberschulte and Jakob Marten and Holger Blume",
year = "2023",
doi = "10.1007/978-3-031-46077-7_2",
language = "English",
isbn = "978-3-031-46076-0",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer International Publishing AG",
pages = "19--32",
editor = "Cristina Silvano and Marc Reichenbach and Christian Pilato",
booktitle = "Embedded Computer Systems",
address = "Switzerland",

}

TY - GEN

T1 - Fault Detection Mechanisms for COTS FPGA Systems Used in Low Earth Orbit

AU - Oberschulte, Tim

AU - Marten, Jakob

AU - Blume, Holger

PY - 2023

Y1 - 2023

N2 - Field-programmable gate arrays (FPGAs) in space applications come with the drawback of radiation effects, which inevitably occur in devices of small process size. This also applies to the electronics of the Bose Einstein Condensate and Cold Atom Laboratory (BECCAL) apparatus, which will operate on the International Space Station (ISS) for several years. A total of more than 100 FPGAs distributed throughout the setup will be used for high-precision control of specialized sensors and actuators at nanosecond scale. On the ISS, radiation effects must be taken into account, the functionality of the electronics must be monitored, and errors must be handled properly. Due to the large number of devices in BECCAL, commercial off-the-shelf (COTS) FPGAs are used, which are not radiation hardened. This paper describes the methods and measures used to mitigate the effects of radiation in an application-specific COTS-FPGA-based communication network. Based on the firmware for a central communication network switch in BECCAL, it describes the steps taken to integrate redundancy into the design while optimizing the firmware to stay within the FPGA’s resource constraints. A redundant integrity checker module is developed that can notify preceding network devices of data and configuration bit errors. The firmware is validated and evaluated by injecting faults into data and configuration registers in simulation and on real hardware. In the end, the FPGA resource usage of the firmware is reduced by more than half, enabling the use of dual modular redundancy (DMR) for the switching fabric. Together with the triple modular redundancy (TMR) protected integrity checker, this combination completely prevents silent data corruption in the design, as shown in simulation and by injecting faults into hardware using the Intel Fault Injection FPGA IP Core, while staying within the resource limitations of a COTS FPGA.

KW - commercial off-the-shelf

KW - fault detection

KW - field-programmable gate array

KW - space application

UR - http://www.scopus.com/inward/record.url?scp=85187706278&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-46077-7_2

DO - 10.1007/978-3-031-46077-7_2

M3 - Conference contribution

SN - 978-3-031-46076-0

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 19

EP - 32

BT - Embedded Computer Systems

A2 - Silvano, Cristina

A2 - Reichenbach, Marc

A2 - Pilato, Christian

PB - Springer International Publishing AG

ER -
