Fault-Tolerance Issues of Local Area Multiprocessor (LAMP) Storage Subsystem

Fault-Tolerant Parallel and Distributed Systems

Abstract

This paper discusses the fault-tolerance issues of the Local Area Multiprocessor (LAMP) storage subsystem and presents its architecture, its error detection and recovery algorithms, and its logical volume reconstruction procedure. LAMP is a network of workstations with shared physical memory; its basic communication protocol is load and store. The LAMP storage subsystem is developed for a class of distributed computing systems that 1) has distributed shared memory, 2) uses a low-latency, high-bandwidth interconnect, and 3) provides remote DMA support. The subsystem stripes data across multiple nodes for higher I/O performance and availability. It organizes logical volumes (virtual disks) to store files according to file size, data access pattern, and other criteria such as performance, availability, and security requirements. The subsystem implements RAID technology (RAID-0, RAID-1, and RAID-5) for each logical volume. Write-ahead logging is used to log the data, metadata, and parity updates of a recovery unit, which allows the subsystem to perform fast error recovery. For rapid reconstruction of a failed logical volume, the LAMP logical volume reconstruction algorithm is implemented. Three main fault-tolerance issues of the LAMP storage subsystem are discussed in this paper: system configurability for fault tolerance and performance, fast error detection and recovery, and fast logical volume reconstruction.
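The abstract does not give the reconstruction algorithm itself. As a rough illustration of the general RAID-5 idea it builds on, the sketch below stripes data blocks across nodes, adds one XOR parity block, and rebuilds the block of a failed node from the survivors. The function names (`xor_blocks`, `make_stripe`, `reconstruct`) and the fixed parity position are assumptions made for this example, not the LAMP design; a real RAID-5 layout rotates parity across nodes.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Build one stripe: the data blocks followed by their parity block.

    Hypothetical layout: parity is always stored last (no rotation).
    """
    return list(data_blocks) + [xor_blocks(data_blocks)]


def reconstruct(stripe, failed):
    """Rebuild the block held by the failed node from all surviving blocks."""
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    return xor_blocks(survivors)


# One stripe over three data nodes and one parity node.
stripe = make_stripe([b"LAMP", b"node", b"data"])

# Simulate losing node 1 and rebuilding its block from the other three.
rebuilt = reconstruct(stripe, failed=1)
```

Because XOR is its own inverse, the same `reconstruct` call recovers either a lost data block or a lost parity block, which is why a single node failure per stripe is survivable.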

This work is sponsored in part by a grant from the National Science Foundation, CCR-941006.

© 1998 Springer Science+Business Media New York

Cite this chapter

Li, Q., Hong, E., Tsukerman, A. (1998). Fault-Tolerance Issues of Local Area Multiprocessor (LAMP) Storage Subsystem. In: Fault-Tolerant Parallel and Distributed Systems. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5449-3_8

  • DOI: https://doi.org/10.1007/978-1-4615-5449-3_8

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7488-6

  • Online ISBN: 978-1-4615-5449-3

  • eBook Packages: Springer Book Archive
