Agreement Problems in Fault-Tolerant Distributed Systems

  • Conference paper
SOFSEM 2001: Theory and Practice of Informatics (SOFSEM 2001)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2234)

Abstract

Reaching agreement in a distributed system is a fundamental issue of both theoretical and practical importance. Consensus, Atomic Commitment, Atomic Broadcast, and Group Membership, which are different versions of this paradigm, underlie many existing fault-tolerant distributed systems. We describe these problems, explain their relationships, and state some fundamental results on their solvability, depending on the system model. We then review and compare basic techniques to circumvent impossibility results in asynchronous systems: randomization, models of partial synchrony, and unreliable failure detection.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Charron-Bost, B. (2001). Agreement Problems in Fault-Tolerant Distributed Systems. In: Pacholski, L., Ružička, P. (eds) SOFSEM 2001: Theory and Practice of Informatics. SOFSEM 2001. Lecture Notes in Computer Science, vol 2234. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45627-9_2

  • DOI: https://doi.org/10.1007/3-540-45627-9_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42912-8

  • Online ISBN: 978-3-540-45627-8

  • eBook Packages: Springer Book Archive
