Details
Original language | English |
---|---|
Title of host publication | The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
Pages | 648-653 |
Number of pages | 6 |
ISBN (electronic) | 9781479922338 |
Publication status | Published - 22 Sept 2014 |
Externally published | Yes |
Event | 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014 - Atlanta, United States. Duration: 23 Jun 2014 → 26 Jun 2014 |
Abstract
State-machine replication has received widespread attention for the provisioning of highly available services in data centers. However, current production systems focus on tolerating crash faults only, and prominent service outages caused by state corruptions have indicated that this is a risky strategy. In the future, state corruptions due to transient faults (such as bit flips) will become even more likely because of ongoing hardware trends toward smaller structure sizes and lower operating voltages. In this paper we present Crosscheck, an approach to tolerate arbitrary state corruption (ASC) in the context of fault-tolerant replication of multithreaded services. Crosscheck is able to detect silent data corruptions ahead of execution, and by crosschecking state changes with co-executing replicas, even ASCs can be detected. Finally, fault tolerance is achieved by a fine-grained recovery using fault-free replicas. Our implementation is transparent to the application because it applies fine-grained software-hardening mechanisms through aspect-oriented programming. To validate Crosscheck we present a replicated multithreaded key-value store that is resilient to state corruptions.
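The three ideas named in the abstract (checking data before it is used, crosschecking state changes between replicas, and repairing a single entry from a fault-free replica) can be illustrated with a small single-threaded sketch. It is not the paper's implementation: the abstract does not say how corruption is detected, so the CRC32 checksums, the `Replica` class, and the helper names `replicated_put` and `checked_get` are all assumptions made for illustration, and multithreading and deterministic scheduling are left out entirely.

```python
import zlib
from collections import Counter


class Replica:
    """Hypothetical replica of a key-value store; each value carries a CRC32."""

    def __init__(self):
        self.store = {}  # key -> (value bytes, checksum)

    def put(self, key, value):
        self.store[key] = (value, zlib.crc32(value))
        # Digest of the state change, to be compared across replicas.
        return zlib.crc32(key.encode() + b"=" + value)

    def is_intact(self, key):
        value, checksum = self.store[key]
        return zlib.crc32(value) == checksum


def replicated_put(replicas, key, value):
    """Apply a write on every replica and crosscheck the state-change digests."""
    digests = [r.put(key, value) for r in replicas]
    majority, _ = Counter(digests).most_common(1)[0]
    # Indices of replicas whose digest deviates from the majority.
    return [i for i, d in enumerate(digests) if d != majority]


def checked_get(replicas, index, key):
    """Verify an entry before use; repair it from an intact replica if needed."""
    replica = replicas[index]
    if not replica.is_intact(key):
        for other in replicas:
            if other is not replica and other.is_intact(key):
                replica.store[key] = other.store[key]  # repair only this entry
                break
        else:
            raise RuntimeError("no intact copy of %r available" % key)
    return replica.store[key][0]


replicas = [Replica() for _ in range(3)]
deviating = replicated_put(replicas, "user:1", b"alice")

# Simulate a transient fault: flip the lowest bit of the first byte on replica 0.
value, checksum = replicas[0].store["user:1"]
replicas[0].store["user:1"] = (bytes([value[0] ^ 0x01]) + value[1:], checksum)

corrupted_detected = not replicas[0].is_intact("user:1")
recovered = checked_get(replicas, 0, "user:1")
```

In this sketch the checksum catches the flipped bit before the value is returned, and only the affected entry is copied from another replica rather than the whole store, which is one way to read "fine-grained recovery".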
Keywords
- AspectC++, Determinism, Multithreading, Replication, Software Error Hardening
ASJC Scopus subject areas
- Computer Science(all)
- Computer Networks and Communications
- Hardware and Architecture
- Software
Cite this
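Crosscheck: Hardening replicated multithreaded services. / Martens, Arthur; Borchert, Christoph; Geissler, Tobias Oliver; Lohmann, Daniel; Spinczyk, Olaf; Kapitza, Rudiger. The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014. Institute of Electrical and Electronics Engineers Inc., 2014. p. 648-653 6903619.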
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Crosscheck: Hardening replicated multithreaded services
AU - Martens, Arthur
AU - Borchert, Christoph
AU - Geissler, Tobias Oliver
AU - Lohmann, Daniel
AU - Spinczyk, Olaf
AU - Kapitza, Rudiger
PY - 2014/9/22
Y1 - 2014/9/22
KW - AspectC++
KW - Determinism
KW - Multithreading
KW - Replication
KW - Software Error Hardening
UR - http://www.scopus.com/inward/record.url?scp=84937147584&partnerID=8YFLogxK
U2 - 10.1109/dsn.2014.98
DO - 10.1109/dsn.2014.98
M3 - Conference contribution
SP - 648
EP - 653
BT - The 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2014
Y2 - 23 June 2014 through 26 June 2014
ER -