Self-stabilizing Byzantine-Tolerant Distributed Replicated State Machine

  • Conference paper
Stabilization, Safety, and Security of Distributed Systems (SSS 2016)

Abstract

The replicated state machine is a fundamental concept for obtaining fault-tolerant distributed computation. Legacy distributed computational architectures (such as Hadoop or ZooKeeper) are designed to tolerate crashes of individual machines. Later, Byzantine fault-tolerant Paxos as well as self-stabilizing Paxos were introduced. Here we present, for the first time, a self-stabilizing Byzantine fault-tolerant version of a distributed replicated state machine. It can cope with any adversarial takeover of fewer than one third of the participating replicas. It also ensures automatic recovery following any transient violation of the system state, in particular after periods in which more than one third of the participants are Byzantine. A prototype of a self-stabilizing Byzantine-tolerant replicated Hadoop master node has been implemented. Experiments show that fully distributed recovery of cloud infrastructures against Byzantine faults can be made practical when relying on self-stabilization in local nodes. Thus, automated cloud protection against a wide variety of faults and attacks is possible.

Notes

  1. The number of participants in the replicated state machine is typically small, \(n=3f+1=4\), which allows a quarter of the system to exhibit Byzantine behavior; the benefit of a larger system is bounded, since the tolerated fraction only approaches one third of the participants as the number of participants approaches infinity. Moreover, the algorithm has already proved itself in real practical systems. See [30], where it is stated that: “I used the entire paper a few years ago to design some middleware”, “The article is 19 pages long, very readable, and, as mentioned above, was used to create real software”. Subsequent solutions that are more complicated to implement can be found in [16] and the references therein.
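The arithmetic behind this note can be illustrated with a minimal sketch. It assumes only the standard bound \(n \ge 3f+1\) stated in the note; the function names `max_byzantine` and `tolerated_fraction` are hypothetical and do not come from the paper or its prototype.

```python
from fractions import Fraction


def max_byzantine(n: int) -> int:
    """Largest f such that n >= 3f + 1 (illustrative helper, not from the paper)."""
    if n < 1:
        raise ValueError("n must be a positive number of replicas")
    return (n - 1) // 3


def tolerated_fraction(n: int) -> Fraction:
    """Fraction of the n replicas that may be Byzantine under n >= 3f + 1."""
    return Fraction(max_byzantine(n), n)


if __name__ == "__main__":
    # Print n, the tolerated f, and the exact fraction f/n for a few system sizes.
    for n in (4, 7, 10, 100, 1000):
        print(n, max_byzantine(n), tolerated_fraction(n))
```

With \(n=4\) the tolerated fraction is exactly one quarter, and for larger \(n\) it grows toward, but never reaches, one third, which is why a small system already captures most of the benefit.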

References

  1. Baron, J., El Defrawy, K., Lampkins, J., Ostrovsky, R.: How to withstand mobile virus attacks, revisited. In: PODC, pp. 293–302 (2014)

  2. Binun, A., Bloch, M., Dolev, S., Kahil, M., Menuhin, B., Yagel, R., Coupaye, T., Lacoste, M., Wailly, A.: Self-stabilizing virtual machine hypervisor architecture for resilient cloud. In: IEEE International Workshop on Dependable and Secure Services (DSS) (2014)

  3. Bonomi, S., Dolev, S., Potop-Butucaru, M., Raynal, M.: Stabilizing server-based storage in Byzantine asynchronous message-passing systems. In: PODC, pp. 471–479 (2015)

  4. Blanchard, P., Dolev, S., Beauquier, J., Delaët, S.: Practically self-stabilizing paxos replicated state-machine. In: Noubir, G., Raynal, M. (eds.) NETYS 2014. LNCS, vol. 8593, pp. 99–121. Springer, Heidelberg (2014). doi:10.1007/978-3-319-09581-3_8

  5. Brukman, O., Dolev, S.: Recovery oriented programming: runtime monitoring of safety and liveness. STTT 13(4), 377–395 (2011)

  6. Brukman, O., Dolev, S., Weinstock, M., Weiss, G.: Self-\(*\) Programming: run-time parallel control search for reflection box, Evolving Systems. Also in SASO 2008, pp. 481–482 (2013)

  7. Brukman, O., Dolev, S., Haviv, Y., Lahiani, L., Kat, R., Schiller, E.M., Tzachar, N., Yagel, R.: Self-stabilization from theory to practice. Bulletin EATCS 94, 130–150 (2008)

  8. Brukman, O., Dolev, S., Kolodner, E.K.: A self-stabilizing autonomic recoverer for eventual Byzantine software. J. Syst. Softw. 81(12), 2315–2327 (2008)

  9. Borran, F., Schiper, A.: A leader-free Byzantine consensus algorithm. In: Kant, K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) ICDCN 2010. LNCS, vol. 5935, pp. 67–78. Springer, Heidelberg (2010). doi:10.1007/978-3-642-11322-2_11

  10. Bessani, A., Sousa, J., Alchieri, E.E.: State machine replication for the masses with BFT-SMART. In: Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Atlanta, GA 23–26 June 2014

  11. Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)

  12. Delaët, S., Dolev, S., Peres, O.: Safe and Eventually Safe: comparing self-stabilizing and non-stabilizing algorithms on a common ground. In: Abdelzaher, T., Raynal, M., Santoro, N. (eds.) OPODIS 2009. LNCS, vol. 5923, pp. 315–329. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10877-8_25. Also, “Safer Than Safe: on the initial state of self-stabilizing systems”. In: SSS 2009, pp. 775–776 (2009)

  13. Dolev, S.: Self-Stabilization. MIT Press, Cambridge (2000)

  14. Dolev, S., El Defrawy, K., Lampkins, J., Ostrovsky, R., Yung, M.: Proactive secret sharing with a dishonest majority. In: 10th International Conference, Security and Cryptography for Networks (SCN), brief announcement in PODC (2016)

  15. Dolev, S., Georgiou, C., Marcoullis, I., Schiller, E.M.: Self-stabilizing Virtual Synchrony. In: SSS, pp. 248–264 (2015)

  16. Dolev, D., Fuegger, M., Lenzen, C., Posch, M., Schmid, U., Steininger, A.: Rigorously modeling self-stabilizing fault-tolerant circuits: an ultra-robust clocking scheme for systems-on-chip. J. Comput. Syst. Sci. 80(4), 860–900 (2014)

  17. Dolev, S., Haviv, Y.A.: Self-stabilizing microprocessor: analyzing and overcoming soft errors. IEEE Trans. Comput. 55(4), 385–399 (2006)

  18. Dolev, S., Haviv, Y.A.: Stabilization enabling technology. IEEE Trans. Dependable Sec. Comput. 9(2), 275–288 (2012)

  19. Dolev, S., Haviv, Y.A., Sagiv, M.: Self-stabilization preserving compiler. ACM Trans. Program. Lang. Syst. 31(6) (2009)

  20. Dolev, S., Kat, R.I.: Self-stabilizing distributed file system. J. High Speed Networks 14(2), 135–153 (2005)

  21. Dolev, S., Liba, O., Schiller, E.M.: Self-stabilizing Byzantine resilient topology discovery and message delivery, CoRR abs/1208.5620 (2012)

  22. Dolev, S., Herman, T.: SuperStabilizing protocols for dynamic distributed systems. In: PODC (1995)

  23. Dolev, S., Rajsbaum, S.: Stability of long-lived consensus. J. Comput. Syst. Sci. 67(1), 26–45 (2003)

  24. Dolev, S., Schiller, E.M., Spirakis, P.G., Tsigas, P.: Robust and scalable middleware for selfish-computer systems. Comput. Sci. Rev. 5(1), 69–84 (2011)

  25. Dolev, S., Schiller, E.M., Spirakis, P.G., Tsigas, P.: Strategies for repeated games with subsystem takeovers implementable by deterministic, self-stabilizing automata. IJAACS 4(1), 4–38 (2011)

  26. Dolev, S., Tzachar, N.: Randomization adaptive self-stabilization. Acta Inf. 47(5–6), 313–323 (2010)

  27. Dolev, S., Welch, J.L.: Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51(5), 780–799 (2004)

  28. Dolev, S., Yagel, R.: Towards self-stabilizing operating systems. IEEE Trans. Softw. Eng. 34(4), 564–576 (2008)

  29. Dolev, S., Yagel, R.: Stabilizing trust and reputation for self-stabilizing efficient hosts in spite of Byzantine guests. Operating Syst. Rev. 44(3), 65–74 (2010)

  30. Dolev, S., Welch, J.: Bayard Kohlhepp Review #: CR130437 (0504-0452) on Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51(5), 780–799 (2004). ACM Computing Review

  31. Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)

  32. Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)

  33. Lamport, L.: Paxos made simple, fast, and Byzantine. In: OPODIS, pp. 7–9 (2002)

  34. McGuire, T.M., Gouda, M.G.: The Austin Protocol Compiler. Advances in Information Security. Springer, New York (2005)

  35. Neumann, P.G.: Risks to the Public. ACM SIGSOFT Softw. Eng. Notes 37(1), 21–26 (2012)

  36. Ostrovsky, R., Yung, M.: How to withstand mobile virus attacks. In: PODC, pp. 51–59 (1991)

  37. Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of the USENIX Annual Technical Conference, p. 11, 23–25 June 2010, Boston, MA (2010)

  38. Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)

  39. Wailly, A., Lacoste, M., Debar, H.: Vespa: multi-layered self-protection for cloud resources. In: Proceedings of the 9th International Conference on Autonomic Computing (ICAC) 2012, pp. 155–160, New York (2012)

  40. Yagel, R., Dolev, S., Binun, A., Yankulin, L., Lacoste, M., Coupaye, T., Kassi-Lahlou, M., Palesandro, A., Wailly, A.: Data stabilization enforcement via active monitoring the cloud infrastructure consistency case. In: SSS (2015)

  41. Clam Anti-Virus. https://www.clamav.net/

  42. Docker. https://www.docker.com/

  43. European Expert Group For IT-Security. www.eicar.org/download/eicar.com.txt

  44. Apache Hadoop. http://hadoop.apache.org/

  45. OpenStack. https://www.openstack.org/

Acknowledgments

The research was partially supported by the Rita Altura Trust Chair in Computer Sciences, Orange Labs under external research contract number 0050012310-C04021, a grant of the Ministry of Science, Technology and Space, Israel, and the National Science Council (NSC) of Taiwan, and a grant of the Ministry of Science, Technology and Space, Israel, and the Ministry of Foreign Affairs, Italy.

Author information

Corresponding authors

Correspondence to Alexander Binun or Shlomi Dolev.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Binun, A. et al. (2016). Self-stabilizing Byzantine-Tolerant Distributed Replicated State Machine. In: Bonakdarpour, B., Petit, F. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2016. Lecture Notes in Computer Science, vol 10083. Springer, Cham. https://doi.org/10.1007/978-3-319-49259-9_4

  • DOI: https://doi.org/10.1007/978-3-319-49259-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49258-2

  • Online ISBN: 978-3-319-49259-9

  • eBook Packages: Computer Science (R0)
