Retrofitting Reliability into Complex Systems

  • Chapter
Guide to Reliable Distributed Systems

Part of the book series: Texts in Computer Science (TCS)

Abstract

Many systems evolve incrementally, so the need arises to retrofit reliability into existing and often very complex systems. This chapter discusses some of the major options for doing so without recoding the existing application from scratch.

Notes

  1. The book uses the term VPN somewhat loosely. On platforms like Windows and Linux, a VPN is a fairly specific technology package focused on providing secure remote access to a corporate network by tunneling through the firewall using a shared-key cryptographic scheme. Here, in contrast, the term connotes the more general idea of overlaying a network with “other properties” on a base network with “base properties.” Others might call this an overlay network, but overlay networks, like VPNs, have also come to have a fairly specific meaning, associated with end-to-end implementations of routing. Rather than invent a completely new term, the book uses VPN in a generalized way.

Copyright information

© 2012 Springer-Verlag London Limited

About this chapter

Cite this chapter

Birman, K.P. (2012). Retrofitting Reliability into Complex Systems. In: Guide to Reliable Distributed Systems. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-2416-0_16

  • DOI: https://doi.org/10.1007/978-1-4471-2416-0_16

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-2415-3

  • Online ISBN: 978-1-4471-2416-0

  • eBook Packages: Computer Science, Computer Science (R0)