Abstract
Data replication is a key design principle for achieving reliability, high availability, survivability and load balancing in distributed computing systems. The common denominator of all existing replication systems is the need to keep replicas consistent. The main paradigm for supporting replicated data is active replication, in which every replica executes the same sequence of methods on the object in order to remain consistent. This paradigm led to the definition of State Machine Replication (SMR) [29.8], [29.13]. The necessary building block of SMR is an engine that delivers operations at each site in the same total order without gaps, thus keeping the replica states consistent.
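The ordering requirement described above can be illustrated with a minimal sketch. It is not the chapter's protocol: it assumes that some ordering engine has already assigned each operation a sequence number, and the names `Replica`, `deliver`, `next_seq` and `pending` are hypothetical. The sketch only shows why gap-free, in-order application keeps replicas identical even when operations arrive in different orders.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical replica of a key-value state machine (illustration only)."""
    state: dict = field(default_factory=dict)
    next_seq: int = 0                             # next position in the total order to apply
    pending: dict = field(default_factory=dict)   # operations received ahead of a gap

    def deliver(self, seq, op):
        """Accept the write `op` = (key, value) at total-order position `seq`."""
        if seq < self.next_seq:
            return  # already applied; ignore the duplicate
        self.pending[seq] = op
        # Apply only a contiguous, gap-free prefix of the total order.
        while self.next_seq in self.pending:
            key, value = self.pending.pop(self.next_seq)
            self.state[key] = value
            self.next_seq += 1


# Assumed input: a total order already agreed on by the ordering engine.
log = [(0, ("x", 1)), (1, ("x", 2)), (2, ("y", 3))]

a, b = Replica(), Replica()
for seq, op in log:
    a.deliver(seq, op)           # arrives in order
for seq, op in reversed(log):
    b.deliver(seq, op)           # arrives in reverse order

print(a.state == b.state)
```

Because the two writes to `x` do not commute, applying them in arrival order would leave the replicas in different states; buffering until the gap closes is what makes the final states agree.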
This work was supported in part by the Israeli Ministry of Science grant #1230-3-01.
References
R. Boichat, P. Dutta, S. Frolund and R. Guerraoui. Deconstructing Paxos. Technical Report DSC ID:200106, Communication Systems Department (DSC), École Polytechnique Fédérale de Lausanne (EPFL), January 2001. Available at http://dscwww.epfl.ch/EN/publications/documents/tr01006.pdf.
T. D. Chandra and S. Toueg. Unreliable failure detectors for reliable distributed systems. Journal of the ACM 43(2):225–267, March 1996.
G. Chockler and D. Malkhi. Active disk Paxos with infinitely many processes. In Proceedings of the 21st ACM Symposium on Principles of Distributed Computing (PODC’02), July 2002. To appear.
G. Chockler, D. Malkhi and M. K. Reiter. Backoff protocols for distributed mutual exclusion and ordering. In Proceedings of the 21st International Conference on Distributed Computing Systems, pages 11–20, April 2001.
M. J. Fischer, N. A. Lynch, and M. S. Paterson. Impossibility of distributed consensus with one faulty process. Journal of the ACM 32(2):374–382, April 1985.
E. Gafni and L. Lamport. Disk Paxos. In Proceedings of 14th International Symposium on Distributed Computing (DISC’2000), pages 330–344, October 2000.
P. Jayanti, T. Chandra, and S. Toueg. Fault-tolerant wait-free shared objects. Journal of the ACM 45(3):451–500, May 1998.
L. Lamport. Time, clocks, and the ordering of events in distributed systems. Communications of the ACM 21(7):558–565, July 1978.
L. Lamport. The part-time parliament. ACM Transactions on Computer Systems 16(2):133–169, May 1998.
W. K. Lo and V. Hadzilacos. Using failure detectors to solve consensus in asynchronous shared-memory systems. In Proceedings of the 8th International Workshop on Distributed Algorithms (WDAG), Springer-Verlag LNCS 857:280–295, Berlin, 1994.
D. Malkhi and M. K. Reiter. An architecture for survivable coordination in large-scale systems. IEEE Transactions on Knowledge and Data Engineering 12(2):187–202, March/April 2000.
J. P. Martin, L. Alvisi and M. Dahlin. Minimal Byzantine Storage. In Proceedings of the 16th International Conference on Distributed Computing (DISC’02), pages 311–325, October 2002.
F. B. Schneider. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22(4):299–319, December 1990.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Chockler, G., Malkhi, D., Dolev, D. (2003). A Data-Centric Approach for Scalable State Machine Replication. In: Schiper, A., Shvartsman, A.A., Weatherspoon, H., Zhao, B.Y. (eds) Future Directions in Distributed Computing. Lecture Notes in Computer Science, vol 2584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-37795-6_29
DOI: https://doi.org/10.1007/3-540-37795-6_29
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00912-2
Online ISBN: 978-3-540-37795-5
eBook Packages: Springer Book Archive