Fault Tolerance based on Time-Staggered Redundancy

Echtle, Klaus

doi:10.1007/978-3-642-45628-2_31

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 147)

Abstract

A new class of fault tolerance techniques is introduced: time-staggered redundancy, a modification of static redundancy (replication of processes and fault masking). Some of the replicas are executed in parallel, others with an adjustable delay. The delayed replicas contribute to n-out-of-m majority voting as usual, and also to backward error recovery: because they represent an earlier state of the process system, they can serve as a recovery point. Staggered execution of process copies thus combines the concepts of static and dynamic redundancy at the same time, without additional checkpointing overhead. Because both comparison tests and acceptance tests can be applied, a higher degree of fault tolerance is achieved. Moreover, testing the results of the early processes detects when wrong input data have been processed. In this case, improved input data are requested for the late processes. Finally, correct output data are chosen from the results of all processes (early and late). Time-staggered redundancy should be preferred if multiple faults of different types have to be tolerated and if time redundancy is limited but sufficient for delayed process execution. In contrast to periodic or event-driven checkpointing, the available time redundancy can be used completely for backward error recovery at any time: the late processes serve as “computing recovery points” with “continuous checkpointing”.
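
The idea summarized in the abstract can be illustrated with a small simulation. The following Python sketch is not taken from the paper; it is a minimal model under stated assumptions: the replicated process is a deterministic running sum (the hypothetical `step` function), two early replicas run in lockstep, one late replica lags by `delay` inputs, and a fault is injected by corrupting one early replica's state. All names (`step`, `majority`, `run_staggered`, `fault_at`) are illustrative. When the comparison test on the early replicas fails, the late replica's lagging state acts as the recovery point: the buffered inputs are replayed from it, and the result takes part in a 2-out-of-3 vote.

```python
from collections import Counter, deque


def step(state, x):
    """Hypothetical deterministic process step: a running sum of the inputs."""
    return state + x


def majority(values, n):
    """Return the value reported by at least n replicas, or None if there is none."""
    value, count = Counter(values).most_common(1)[0]
    return value if count >= n else None


def run_staggered(inputs, delay=2, fault_at=None):
    """Simulate two early replicas and one late replica lagging by `delay` inputs.

    fault_at: optional (step_index, replica_index) at which the state of one
    early replica is corrupted (an assumed fault model for illustration only).
    """
    early = [0, 0]
    late = 0
    pending = deque()  # inputs already seen by the early replicas, not yet by the late one
    outputs = []
    for t, x in enumerate(inputs):
        early = [step(s, x) for s in early]
        if fault_at is not None and fault_at[0] == t:
            early[fault_at[1]] += 1000  # injected fault
        pending.append(x)
        if len(pending) > delay:
            # The late replica advances, but stays `delay` inputs behind.
            late = step(late, pending.popleft())
        if early[0] != early[1]:  # comparison test on the early results
            # Backward recovery: start from the late replica's state and
            # replay the buffered inputs. No separate checkpoint is stored.
            caught_up = late
            for y in pending:
                caught_up = step(caught_up, y)
            voted = majority([early[0], early[1], caught_up], n=2)
            # Simplification: if no majority exists, trust the late result.
            good = voted if voted is not None else caught_up
            early = [good, good]
        outputs.append(early[0])
    return outputs


if __name__ == "__main__":
    print(run_staggered([1, 2, 3, 4, 5], delay=2, fault_at=(2, 0)))
```

In this model the late replica's state plus the buffered inputs play the role of the recovery point, so the amount of work repeated after a fault is bounded by `delay`. A real system would also need acceptance tests and a way to request improved input data for the late processes, which this sketch leaves out.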

Author information

Authors and Affiliations

Institut für Informatik IV, Universität Karlsruhe, Zirkel 2, D-7500, Karlsruhe, Germany
Klaus Echtle

Editor information

Editors and Affiliations

Fachbereich 2, Hochschule Bremerhaven, Bürgermeister-Smidt-Straße 20, D-2850, Bremerhaven, Germany
F. Belli
Institut für Rechnerentwurf und Fehlertoleranz Fakultät für Informatik, Universität Karlsruhe, Postfach 6980, D-7500, Karlsruhe 1, Germany
W. Görke

About this paper

Cite this paper

Echtle, K. (1987). Fault Tolerance based on Time-Staggered Redundancy. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_31

DOI: https://doi.org/10.1007/978-3-642-45628-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-18294-8
Online ISBN: 978-3-642-45628-2
eBook Packages: Springer Book Archive
