Abstract
Replicated state machine is a fundamental concept used for obtaining fault tolerant distributed computation. Legacy distributed computational architectures (such as Hadoop or Zookeeper) are designed to tolerate crashes of individual machines. Later, Byzantine fault-tolerant Paxos as well as self-stabilizing Paxos were introduced. Here we present for the first time the self-stabilizing Byzantine fault-tolerant version of a distributed replicated machine. It can cope with any adversarial takeover on less than one third of the participating replicas. It also ensures automatic recovery following any transient violation of the system state, in particular after periods in which more than one third of the participants are Byzantine. A prototype of self-stabilizing Byzantine-tolerant replicated Hadoop master node has been implemented. Experiments show that fully distributed recovery of cloud infrastructures against Byzantine faults can be made practical when relying on self-stabilization in local nodes. Thus automated cloud protection against a wide variety of faults and attacks is possible.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The number of participants in the replicated state machine is typically small \(n=3f+1=4\), allowing quarter of the system to expose Byzantine behavior as the benefit of a larger system is bounded by tolerating one third of the participants being Byzantine (when approaching infinite number of participants). Moreover, the algorithm already proved itself in real practical systems. See [30] where it is stated that: “I used the entire paper a few years ago to design some middleware”, “The article is 19 pages long, very readable, and, as mentioned above, was used to create real software”. Subsequently, more complicated solutions to implement can be found in [16] and the references therein.
References
Baron, J., El Defrawy, K., Lampkins, J., Ostrovsky, R.: How to withstand mobile virus attacks, revisited. In: PODC, pp. 293–302 (2014)
Binun, A., Bloch, M., Dolev, S., Kahil, M., Menuhin, B., Yagel, R., Coupaye, T., Lacoste, M., Wailly, A.: Self-stabilizing virtual machine hypervisor architecture for resilient cloud. In: IEEE International Workshop on Dependable and Secure Services (DSS) (2014)
Bonomi, S., Dolev, S., Potop-Butucaru, M., Raynal, M.: Stabilizing server-based storage in Byzantine asynchronous message-passing systems. In: PODC, pp. 471–479 (2015)
Blanchard, P., Dolev, S., Beauquier, J., Delaët, S.: Practically self-stabilizing paxos replicated state-machine. In: Noubir, G., Raynal, M. (eds.) NETYS 2014. LNCS, vol. 8593, pp. 99–121. Springer, Heidelberg (2014). doi:10.1007/978-3-319-09581-3_8
Brukman, O., Dolev, S.: Recovery oriented programming: runtime monitoring of safety and liveness. STTT 13(4), 377–395 (2011)
Brukman, O., Dolev, S., Weinstock, M., Weiss, G.: Self-\(*\) Programming: run-time parallel control search for reflection box, Evolving Systems. Also in SASO 2008, pp. 481–482 (2013)
Brukman, O., Dolev, S., Haviv, Y., Lahiani, L., Kat, R., Schiller, E.M., Tzachar, N., Yagel, R.: Self-stabilization from theory to practice. Bulletin EATCS 94, 130–150 (2008)
Brukman, O., Dolev, S., Kolodner, E.K.: A self-stabilizing autonomic recoverer for eventual Byzantine software. J. Syst. Softw. 81(12), 2315–2327 (2008)
Borran, F., Schiper, A.: A leader-free Byzantine consensus algorithm. In: Kant, K., Pemmaraju, S.V., Sivalingam, K.M., Wu, J. (eds.) ICDCN 2010. LNCS, vol. 5935, pp. 67–78. Springer, Heidelberg (2010). doi:10.1007/978-3-642-11322-2_11
Bessani, A., Sousa, J., Alchieri, E.E.: State machine replication for the masses with BFT-SMART. In: Proceedings of the 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, Atlanta, GA 23–26 June 2014
Castro, M., Liskov, B.: Practical Byzantine fault tolerance and proactive recovery. ACM Trans. Comput. Syst. 20(4), 398–461 (2002)
Delaët, S., Dolev, S., Peres, O.: Safe and Eventually Safe: comparing self-stabilizing and non-stabilizing algorithms on a common ground. In: Abdelzaher, T., Raynal, M., Santoro, N. (eds.) OPODIS 2009. LNCS, vol. 5923, pp. 315–329. Springer, Heidelberg (2009). doi:10.1007/978-3-642-10877-8_25. Also, “Safer Than Safe: on the initial state of self-stabilizing systems”. In: SSS 2009, pp. 775–776 (2009)
Dolev, S.: Self-Stabilization. MIT Press, Cambridge (2000)
Dolev, S., El Defrawy, K., Lampkins, J., Ostrovesky, R., Yung, M.: Proactive secret sharing with a dishonest majority. In: 10th International Conference, Security and Cryptography for Networks (SCN), brief announcment in PODC (2016)
Dolev, S., Georgiou, C., Marcoullis, I., Schiller, E.M.: Self-stabilizing Virtual Synchrony. In: SSS, pp. 248–264 (2015)
Dolev, D., Fuegger, M., Lenzen, C., Posch, M., Schmid, U., Steininger, A.: Rigorously modeling self-stabilizing fault-tolerant circuits: an ultra-robust clocking scheme for systems-on-chip. J. Comput. Syst. Sci. 80(4), 860–900 (2014)
Dolev, S., Haviv, Y.A.: Self-stabilizing microprocessor: analyzing and overcoming soft errors. IEEE Trans. Comput. 55(4), 385–399 (2006)
Dolev, S., Haviv, Y.A.: Stabilization enabling technology. IEEE Trans. Dependable Sec. Comput. 9(2), 275–288 (2012)
Dolev, S., Haviv, Y.A., Sagiv, M.: Self-stabilization preserving compiler. ACM Trans. Program. Lang. Syst. 31(6) (2009)
Dolev, S., Kat, R.I.: Self-stabilizing distributed file system. J. High Speed Networks 14(2), 135–153 (2005)
Dolev, S., Liba, O., Schiller, E.M.: Self-stabilizing Byzantine resilient topology discovery and message delivery, CoRR abs/1208.5620 (2012)
Dolev, S., Hermann, T.: SuperStabilizing protocols for dynamic distributed systems. In: PODC (1995)
Dolev, S., Rajsbaum, S.: Stability of long-lived consensus. J. Comput. Syst. Sci. 67(1), 26–45 (2003)
Dolev, S., Schiller, E.M., Spirakis, P.G., Tsigas, P.: Robust and scalable middleware for selfish-computer systems. Comput. Sci. Rev. 5(1), 69–84 (2011)
Dolev, S., Schiller, E.M., Spirakis, P.G., Tsigas, P.: Strategies for repeated games with subsystem takeovers implementable by deterministic, self-stabilizing automata. IJAACS 4(1), 4–38 (2011)
Dolev, S., Tzachar, N.: Randomization adaptive self-stabilization. Acta Inf. 47(5–6), 313–323 (2010)
Dolev, S., Welch, J.L.: Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51(5), 780–799 (2004)
Dolev, S., Yagel, R.: Towards self-stabilizing operating systems. IEEE Trans. Softw. Eng. 34(4), 564–576 (2008)
Dolev, S., Yagel, R.: Stabilizing trust and reputation for self-stabilizing efficient hosts in spite of Byzantine guests. Operating Syst. Rev. 44(3), 65–74 (2010)
Dolev, S., Welch, J.: Bayard Kohlhepp Review #: CR130437 (0504-0452) on Self-stabilizing clock synchronization in the presence of Byzantine faults. J. ACM 51(5), 780–799 (2004). ACM Computing Review
Lynch, N.A.: Distributed Algorithms. Morgan Kaufmann, San Francisco (1996)
Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
Lamport, L., Paxos made simple, fast, and Byzantine. In: OPODIS, pp. 7–9 (2002)
Tommy, M., McGuire, T.M., Gouda, M.G.: The Austin Protocol Compiler. Advances in Information Security. Springer, New York (2005)
Neumann, P.G.: Risks to the Public. ACM SIGSOFT Softw. Eng. Notes 37(1), 21–26 (2012)
Ostrovsky, R., Yung, M.: How to withstand mobile virus attacks. In: PODC, pp. 51–59 (1991)
Hunt, P., Konar, M., Junqueira, F.P., Reed, B.: ZooKeeper: wait-free coordination for internet-scale systems. In: Proceedings of the USENIX Annual Technical Conference, p. 11, 23–25 June 2010, Boston, MA (2010)
Pease, M., Shostak, R., Lamport, L.: Reaching agreement in the presence of faults. J. ACM 27(2), 228–234 (1980)
Wailly, A., Lacoste, M., Debar, H.: Vespa: multi-layered self-protection for cloud resources. In: Proceedings of the 9th Inter-national Conference on Autonomic Computing (ICAC) 2012, pp. 155–160, New York (2012)
Yagel, R., Dolev, S., Binun, A., Yankulin, L., Lacoste, M., Coupaye, T., Kassi-Lahlou, M., Palesandro, A., Wailly, A.: Data stabilization enforcement via active monitoring the cloud infrastructure consistency case. In: SSS (2015)
Clam Anti-Virus. https://www.clamav.net/
Docker. https://www.docker.com/
European Expert Group For IT-Security. www.eicar.org/download/eicar.com.txt
Apache Hadoop. http://hadoop.apache.org/
OpenStack. https://www.openstack.org/
Acknowledgments
The research was partially supported by the Rita Altura Trust Chair in Computer Sciences, Orange Labs under external research contract number 0050012310-C04021, grant of the Ministry of Science, Technology and Space, Israel, and the National Science Council (NSC) of Taiwan, and a grant of the Ministry of Science, Technology and Space, Israel, the Ministry of Foreign Affairs, Italy.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Binun, A. et al. (2016). Self-stabilizing Byzantine-Tolerant Distributed Replicated State Machine. In: Bonakdarpour, B., Petit, F. (eds) Stabilization, Safety, and Security of Distributed Systems. SSS 2016. Lecture Notes in Computer Science(), vol 10083. Springer, Cham. https://doi.org/10.1007/978-3-319-49259-9_4
Download citation
DOI: https://doi.org/10.1007/978-3-319-49259-9_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49258-2
Online ISBN: 978-3-319-49259-9
eBook Packages: Computer ScienceComputer Science (R0)