Enhancing Fault Tolerance of Real-Time Systems through Time Redundancy

  • Chapter
Foundations of Dependable Computing

Abstract

Fault-tolerant, real-time systems require correct, time-constrained results in the presence of faults. Missed deadlines in many high-dependability systems can result in significant property damage or loss of human life. Historically, designers relied almost exclusively upon massive hardware replication to achieve their dependability goals. Research suggests that not only is this approach inadequate for dealing with certain fault classes, but also that it is inappropriate for many applications with strict space, weight, and cost constraints. Alternatively, time redundancy can be used to complement replication as a means to improve fault coverage and reduce the required level of replication for fault-tolerant system design. Although previous work has advocated the use of time redundancy to provide protection against hardware and software faults, there exists no formal methodology for allocating and managing such time. This chapter provides an overview of recent work in developing a comprehensive analytical framework for allocating and managing time redundancy to preserve the timing correctness of priority-driven, real-time systems in the presence of faults.

Research supported in part by Office of Naval Research under Contract N00014-92-J-1524 and by AT&T Bell Laboratories under the Cooperative Research Fellowship Program.

This work was done while the author was at Carnegie Mellon University, Pittsburgh, PA.

Copyright information

© 1994 Kluwer Academic Publishers

About this chapter

Cite this chapter

Thuel, S.R., Strosnider, J.K. (1994). Enhancing Fault Tolerance of Real-Time Systems through Time Redundancy. In: Koob, G.M., Lau, C.G. (eds) Foundations of Dependable Computing. The Kluwer International Series in Engineering and Computer Science, vol 285. Springer, Boston, MA. https://doi.org/10.1007/978-0-585-28002-8_9

  • DOI: https://doi.org/10.1007/978-0-585-28002-8_9

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-0-7923-9486-0

  • Online ISBN: 978-0-585-28002-8

  • eBook Packages: Springer Book Archive
