
Part of the book series: Informatik-Fachberichte ((INFORMATIK,volume 147))

Abstract

A new class of fault tolerance techniques is introduced. Time-staggered redundancy is a modification of static redundancy (replication of processes and fault masking): some of the replicas are executed in parallel, others with an adjustable delay. The delayed replicas contribute to n-out-of-m majority voting as usual, and to backward error recovery as well. Because they represent an earlier state of the process system, they can serve as a recovery point. Staggered execution of process copies thus combines static and dynamic redundancy at the same time, without additional checkpointing overhead. Since both comparison tests and acceptance tests can be applied, a higher degree of fault tolerance is achieved. Moreover, testing the results of the early processes detects when wrong input data have been processed; in this case, improved input data are requested for the late processes. Finally, correct output data are chosen from among the results of all processes, early and late. Time-staggered redundancy should be preferred if multiple faults of different types have to be tolerated and if time redundancy is limited but sufficient for delayed process execution. In contrast to periodic or event-driven checkpointing, the available time redundancy can be used completely for backward error recovery at any time: the late processes serve as “computing recovery points” with “continuous checkpointing”.
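The flow described in the abstract can be illustrated with a minimal sketch. It is not the chapter's protocol: the staggering is simulated sequentially rather than with real delayed processes, and all names (`majority`, `staggered_run`, `acceptable`, `improve_input`, the example replicas) are hypothetical. It assumes that early results are checked by an acceptance test, that the late replicas receive improved input when no early result passes, and that the output is chosen by n-out-of-m voting over all accepted results.

```python
from collections import Counter
from typing import Callable, List, Optional, Sequence


def majority(results: Sequence[int], needed: int) -> Optional[int]:
    """Comparison test: return the value that at least `needed` results share."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= needed else None


def staggered_run(
    replicas: Sequence[Callable[[int], int]],
    early_count: int,
    needed: int,
    data: int,
    acceptable: Callable[[int], bool],
    improve_input: Callable[[int], int],
) -> Optional[int]:
    """Run early replicas first, then the delayed ones, and vote over all results."""
    # Early phase: the first `early_count` replicas run without delay.
    early: List[int] = [run(data) for run in replicas[:early_count]]
    early_ok = [r for r in early if acceptable(r)]

    # Assumption: if no early result passes the acceptance test, the input
    # itself is suspected, so improved input is requested for the late replicas.
    late_input = data if early_ok else improve_input(data)

    # Late phase: the delayed replicas still hold the earlier state, so they
    # act as the recovery point without an explicit checkpoint.
    late: List[int] = [run(late_input) for run in replicas[early_count:]]
    late_ok = [r for r in late if acceptable(r)]

    # The output is chosen among the accepted results of all replicas.
    return majority(early_ok + late_ok, needed)


# Hypothetical replicas: three correct copies and one permanently faulty copy.
def correct(x: int) -> int:
    return 2 * x


def faulty(x: int) -> int:
    return -1


def non_negative(result: int) -> bool:
    return result >= 0


example_replicas = [correct, faulty, correct, correct]
good_case = staggered_run(example_replicas, 2, 2, 5, non_negative, abs)
bad_input_case = staggered_run(example_replicas, 2, 2, -3, non_negative, abs)
```

In `good_case` the faulty early replica is masked by voting. In `bad_input_case` both early results fail the acceptance test, so the late replicas are rerun on the improved input supplied by `abs` and provide the output.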

Copyright information

© 1987 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Echtle, K. (1987). Fault Tolerance based on Time-Staggered Redundancy. In: Belli, F., Görke, W. (eds) Fehlertolerierende Rechensysteme / Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 147. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45628-2_31

  • DOI: https://doi.org/10.1007/978-3-642-45628-2_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-18294-8

  • Online ISBN: 978-3-642-45628-2

  • eBook Packages: Springer Book Archive
