Towards a theory of replicated processing

  • Conference paper
Formal Techniques in Real-Time and Fault-Tolerant Systems (FTRTFT 1988)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 331)

Abstract

In the N-Modular Redundancy (NMR) approach, a computation is made reliable by executing it on several computers, and determining its results by a decision algorithm. This paper investigates a formal approach to the use of NMR in replicated distributed systems, for which it introduces a notion of correctness based on consistency with their non-replicated counterpart, and a local correctness criterion. We discuss how a replicated system component may be implemented by N base copies, a majority of which is non-faulty. The formal approach sheds light on the necessity of coordinating the copies and on the requirements they should satisfy; in particular the difficulty of replicating synchronous communication is pointed out. A practical approach is also briefly examined and shown to be consistent with the formal model.
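As a rough illustration of the NMR idea described above, the following minimal sketch runs N copies of a computation and picks the result by strict majority vote. It is not the paper's formal model: the names `majority_vote`, `run_replicated`, `correct`, and `faulty` are hypothetical, the copies are assumed to be deterministic functions applied to the same input, and the coordination and synchronous-communication issues the abstract raises are ignored entirely.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence


def majority_vote(results: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of copies, or None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # A strict majority means more than half of all copies agree.
    return value if count > len(results) // 2 else None


def run_replicated(copies: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every copy on the same input and vote on the outputs."""
    return majority_vote([copy(x) for copy in copies])


def correct(x: int) -> int:
    # Hypothetical non-faulty copy of the computation.
    return x * x


def faulty(x: int) -> int:
    # Hypothetical faulty copy that returns a wrong value.
    return x * x + 1


# Three copies, two of them non-faulty: the vote masks the single fault.
replicated_result = run_replicated([correct, faulty, correct], 7)
```

With a majority of non-faulty copies, the voted result matches what a single non-faulty copy would have produced, which is the informal sense of consistency with the non-replicated counterpart. When no strict majority exists, this sketch returns `None` rather than guessing.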

Inside every replicated system there is a non-replicated system trying to get out.

Editor information

M. Joseph

Copyright information

© 1988 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mancini, L.V., Pappalardo, G. (1988). Towards a theory of replicated processing. In: Joseph, M. (ed.) Formal Techniques in Real-Time and Fault-Tolerant Systems. FTRTFT 1988. Lecture Notes in Computer Science, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-50302-1_13

  • DOI: https://doi.org/10.1007/3-540-50302-1_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-50302-6

  • Online ISBN: 978-3-540-45965-1

  • eBook Packages: Springer Book Archive
