Crosscheck: Hardening replicated multithreaded services

Research output: Chapter in book/report/conference proceeding, Conference contribution, Research, peer review

Authors

  • Arthur Martens
  • Christoph Borchert
  • Tobias Oliver Geissler
  • Daniel Lohmann
  • Olaf Spinczyk
  • Rudiger Kapitza

External Research Organisations

  • Technische Universität Braunschweig
  • TU Dortmund University
  • Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU Erlangen-Nürnberg)

Details

Original language: English
Title of host publication: The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 648-653
Number of pages: 6
ISBN (electronic): 9781479922338
Publication status: Published - 22 Sept 2014
Externally published: Yes
Event: 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014 - Atlanta, United States
Duration: 23 Jun 2014 to 26 Jun 2014

Abstract

State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only, and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) will become even more likely because of ongoing hardware trends toward smaller structure sizes and lower operating voltages. In this paper, we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application because it applies fine-grained software-hardening mechanisms through aspect-oriented programming. To validate Crosscheck, we present a replicated multithreaded key-value store that is resilient to state corruptions.
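The three steps named in the abstract (detecting corruption before a value is used, crosschecking state changes across replicas, and fine-grained recovery from a fault-free replica) can be illustrated with a small sketch. This is not the paper's implementation, which hardens the service transparently with AspectC++. The Python below is a single-threaded toy in which every name (`Replica`, `crosscheck`, `CorruptionDetected`) is hypothetical, CRC32 stands in for whatever error-detecting code the authors use, and majority voting is an assumed way of identifying fault-free replicas.

```python
import zlib
from collections import Counter


class CorruptionDetected(Exception):
    """Raised when a stored value no longer matches its checksum."""


class Replica:
    """Toy key-value replica whose entries carry a CRC32 checksum."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (value bytes, CRC32 of value)

    def put(self, key, value):
        # Apply a write and return a digest of the resulting state change.
        self.store[key] = (value, zlib.crc32(value))
        return self.digest(key)

    def get(self, key):
        # Verify the checksum before the value is handed to the caller.
        value, crc = self.store[key]
        if zlib.crc32(value) != crc:
            raise CorruptionDetected(f"{self.name}: bad checksum for {key!r}")
        return value

    def digest(self, key):
        # Digest over the raw stored bytes so replicas can be compared.
        value, _ = self.store[key]
        return zlib.crc32(key.encode() + b"\x00" + value)


def crosscheck(replicas, key):
    """Compare per-key digests and repair deviating replicas from the majority."""
    digests = {r.name: r.digest(key) for r in replicas}
    majority, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise RuntimeError("no majority; cannot recover")
    good = next(r for r in replicas if digests[r.name] == majority)
    repaired = []
    for r in replicas:
        if digests[r.name] != majority:
            r.store[key] = good.store[key]  # fine-grained: copy one entry only
            repaired.append(r.name)
    return repaired


# Three replicas apply the same write.
replicas = [Replica(n) for n in ("r0", "r1", "r2")]
for r in replicas:
    r.put("x", b"42")

# Simulate a bit flip in r1's stored value; its checksum is now stale.
value, crc = replicas[1].store["x"]
replicas[1].store["x"] = (bytes([value[0] ^ 0x01]) + value[1:], crc)

repaired = crosscheck(replicas, "x")  # names of the replicas that were repaired
```

In this toy the local checksum catches the flipped bit on the next `get`, while `crosscheck` also covers the case where a corrupted value and its checksum happen to agree locally but disagree with the other replicas. Real multithreaded replicas additionally need deterministic execution so that their state changes are comparable, which the sketch leaves out.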

Keywords

    AspectC++, Determinism, Multithreading, Replication, Software Error Hardening


Cite this

Crosscheck: Hardening replicated multithreaded services. / Martens, Arthur; Borchert, Christoph; Geissler, Tobias Oliver et al.
The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 648-653 6903619.


Martens, A, Borchert, C, Geissler, TO, Lohmann, D, Spinczyk, O & Kapitza, R 2014, Crosscheck: Hardening replicated multithreaded services. in The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, 6903619, Institute of Electrical and Electronics Engineers Inc., pp. 648-653, 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014, Atlanta, United States, 23 Jun 2014. https://doi.org/10.1109/dsn.2014.98
Martens, A., Borchert, C., Geissler, T. O., Lohmann, D., Spinczyk, O., & Kapitza, R. (2014). Crosscheck: Hardening replicated multithreaded services. In The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014 (pp. 648-653). Article 6903619. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/dsn.2014.98
Martens A, Borchert C, Geissler TO, Lohmann D, Spinczyk O, Kapitza R. Crosscheck: Hardening replicated multithreaded services. In The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014. Institute of Electrical and Electronics Engineers Inc. 2014. p. 648-653. 6903619. doi: 10.1109/dsn.2014.98
Martens, Arthur ; Borchert, Christoph ; Geissler, Tobias Oliver et al. / Crosscheck: Hardening replicated multithreaded services. The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014. Institute of Electrical and Electronics Engineers Inc., 2014. pp. 648-653
@inproceedings{d8b68d615faf438f9802112abbd9a8b1,
title = "Crosscheck: Hardening replicated multithreaded services",
abstract = "State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) become even more likely, caused by ongoing hardware trends regarding the shrinking of structure sizes and reduction of operating voltages. In this paper we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application by utilizing fine-grained software-hardening mechanisms using aspect-oriented programming. To validate Crosscheck we present a replicated multithreaded key-value store that is resilient to state corruptions.",
keywords = "AspectC++, Determinism, Multithreading, Replication, Software Error Hardening",
author = "Arthur Martens and Christoph Borchert and Geissler, {Tobias Oliver} and Daniel Lohmann and Olaf Spinczyk and Rudiger Kapitza",
year = "2014",
month = sep,
day = "22",
doi = "10.1109/dsn.2014.98",
language = "English",
pages = "648--653",
booktitle = "The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",
note = "44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014 ; Conference date: 23-06-2014 Through 26-06-2014",

}


TY - GEN

T1 - Crosscheck: Hardening replicated multithreaded services

AU - Martens, Arthur

AU - Borchert, Christoph

AU - Geissler, Tobias Oliver

AU - Lohmann, Daniel

AU - Spinczyk, Olaf

AU - Kapitza, Rudiger

PY - 2014/9/22

Y1 - 2014/9/22

N2 - State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) become even more likely, caused by ongoing hardware trends regarding the shrinking of structure sizes and reduction of operating voltages. In this paper we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application by utilizing fine-grained software-hardening mechanisms using aspect-oriented programming. To validate Crosscheck we present a replicated multithreaded key-value store that is resilient to state corruptions.

AB - State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) become even more likely, caused by ongoing hardware trends regarding the shrinking of structure sizes and reduction of operating voltages. In this paper we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application by utilizing fine-grained software-hardening mechanisms using aspect-oriented programming. To validate Crosscheck we present a replicated multithreaded key-value store that is resilient to state corruptions.

KW - AspectC++

KW - Determinism

KW - Multithreading

KW - Replication

KW - Software Error Hardening

UR - http://www.scopus.com/inward/record.url?scp=84937147584&partnerID=8YFLogxK

U2 - 10.1109/dsn.2014.98

DO - 10.1109/dsn.2014.98

M3 - Conference contribution

SP - 648

EP - 653

BT - The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014

Y2 - 23 June 2014 through 26 June 2014

ER -