Optimal Design of Checks for Error Detection and Location in Fault Tolerant Multiprocessor Systems

  • Conference paper
Fault-Tolerant Computing Systems

Part of the book series: Informatik-Fachberichte (INFORMATIK, volume 283)

Abstract

Designing checks to detect or locate errors in data is an important problem in fault tolerance. Our checks are assumed to be of the simplest kind: a check can operate, without restriction, on any non-empty subset of the data elements and can reliably detect up to one error in that subset. In this paper, we show how to design the data-check (DC) relationship. For the first time, we give a general procedure for designing checks to locate s errors, for any value of s. We also consider the problem of designing checks to detect s errors in the data, and we give the first optimal construction for this problem. The procedures for designing the checks are simple and novel. These constructions can also be modified to produce uniform checks, i.e. checks that are identical and check the same number of data elements; we give procedures for obtaining such checks as well.
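To make the check model concrete, the following is a minimal brute-force sketch, not the construction from the paper. It assumes that a check is guaranteed to fire only when exactly one of the data elements it covers is erroneous, and it tests whether a given set of checks catches every error pattern of up to s errors. The function name `detects_all` and the example check sets are hypothetical.

```python
from itertools import combinations

def detects_all(checks, n, s):
    """Return True if every non-empty error set of size <= s over data
    elements 0..n-1 is caught by at least one check.

    Assumption: a check is only guaranteed to fire when exactly one of
    the data elements it covers is erroneous.
    """
    for size in range(1, s + 1):
        for errors in combinations(range(n), size):
            bad = set(errors)
            # Look for a check that sees exactly one erroneous element.
            if not any(len(bad & check) == 1 for check in checks):
                return False
    return True

# Hypothetical examples over 4 data elements.
one_big_check = [{0, 1, 2, 3}]       # a single check covering everything
singletons = [{0}, {1}, {2}, {3}]    # one check per data element

print(detects_all(one_big_check, n=4, s=1))  # True
print(detects_all(one_big_check, n=4, s=2))  # False
print(detects_all(singletons, n=4, s=4))     # True
```

The single large check fails for s = 2 because two errors inside the same subset exceed what the check can reliably detect. One check per data element always works but uses the maximum number of checks, which is why the choice of subsets matters.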

Recently, the problem of designing the DC relationship has attracted considerable attention because of its role in the design of algorithm-based fault tolerant (ABFT) systems, and we illustrate the problem in this context. ABFT schemes have been shown to be a natural paradigm for concurrent error detection/location in multiprocessor systems and systolic array computations. Banerjee and Abraham have shown that an ABFT scheme can be modeled as a tripartite graph consisting of processors (P), data (D) and checks (C). Our constructions can be used along with any general technique for designing fault tolerant PDC graphs, e.g. for designing unit systems [NA89] or ud-systems [VJ91].
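As a rough illustration of the tripartite PDC model, the sketch below stores the graph as two adjacency maps. It assumes that a P-D edge means the processor can affect that data element and that a D-C edge means the check covers it. This reading of the edges, and every name in the snippet, is an assumption made for illustration.

```python
# Hypothetical PDC graph: which data each processor affects (P-D edges)
# and which data each check covers (D-C edges).
pd_edges = {"p0": {"d0", "d1"}, "p1": {"d1", "d2"}}
dc_edges = {"c0": {"d0", "d1"}, "c1": {"d2"}}

def processors_seen_by(check, pd_edges, dc_edges):
    """Processors that affect at least one data element covered by `check`."""
    covered = dc_edges[check]
    return {p for p, data in pd_edges.items() if data & covered}

print(sorted(processors_seen_by("c0", pd_edges, dc_edges)))  # ['p0', 'p1']
print(sorted(processors_seen_by("c1", pd_edges, dc_edges)))  # ['p1']
```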

This work was supported by DARPA/ONR under Contract no. N00014-88-K-0459.

This work was supported in part by ONR under Contract no. N00014-91-J-1199 and in part by AFOSR under Contract no. AFOSR-90-0144.

References

  1. J. A. Abraham et al., “Fault tolerance techniques for systolic arrays,” IEEE Computer, pp. 65–74, July 1987.

  2. P. Banerjee et al., “An evaluation of system-level fault tolerance on the Intel hypercube multiprocessor,” in Proc. Int. Symp. Fault Tolerant Comput., Tokyo, pp. 362–367, June 1988.

  3. P. Banerjee and J. A. Abraham, “Bounds on algorithm-based fault tolerance in multiple processor systems,” IEEE Trans. Comput., vol. C-35, pp. 296–306, Apr. 1986.

  4. P. Banerjee and J. A. Abraham, “A probabilistic model of algorithm-based fault tolerance in array processors for real-time systems,” in Proc. Real-Time Systems Symp., pp. 72–78, 1986.

  5. Y-H. Choi and M. Malek, “A fault tolerant FFT processor,” IEEE Trans. Comput., vol. 37, no. 5, pp. 617–621, May 1988.

  6. Y-H. Choi and M. Malek, “A fault tolerant systolic sorter,” IEEE Trans. Comput., vol. 37, no. 5, pp. 621–624, May 1988.

  7. D. Gu, D. J. Rosenkrantz, and S. S. Ravi, “Design and analysis of test schemes for algorithm-based fault tolerance,” in Proc. Int. Symp. Fault Tolerant Comput., pp. 106–113, Newcastle-upon-Tyne, U.K., June 1990.

  8. K.-H. Huang and J. A. Abraham, “Algorithm-based fault tolerance for matrix operations,” IEEE Trans. Comput., vol. C-33, pp. 518–528, June 1984.

  9. J.-Y. Jou and J. A. Abraham, “Fault tolerant matrix arithmetic and signal processing on highly concurrent computing structures,” Proc. IEEE, vol. 74, no. 5, pp. 732–741, May 1986.

  10. J.-Y. Jou and J. A. Abraham, “Fault tolerant FFT networks,” IEEE Trans. Comput., vol. 37, no. 5, pp. 548–561, May 1988.

  11. F. T. Luk and H. Park, “An analysis of algorithm-based fault tolerance techniques,” in Proc. SPIE Adv. Alg. & Arch. for Signal Proc., vol. 696, pp. 222–228, Aug. 1986.

  12. V. S. S. Nair and J. A. Abraham, “A model for the analysis of fault tolerant signal processing architectures,” in Proc. 32nd Int. Tech. Symp. of SPIE, San Diego, pp. 246–257, Aug. 1988.

  13. V. S. S. Nair and J. A. Abraham, “A model for the analysis, design and comparison of fault-tolerant WSI architectures,” in Proc. Workshop on Wafer Scale Integration, Como, Italy, June 1989.

  14. V. S. S. Nair and J. A. Abraham, “Hierarchical design and analysis of fault-tolerant multiprocessor systems using concurrent error detection,” in Int. Symp. Fault Tolerant Comput., Newcastle-upon-Tyne, U.K., pp. 130–137, June 1990.

  15. A. L. N. Reddy and P. Banerjee, “Algorithm-based fault detection for signal processing applications,” IEEE Trans. Comput., vol. 39, pp. 1304–1308, Oct. 1990.

  16. D. J. Rosenkrantz and S. S. Ravi, “Improved upper bounds for algorithm-based fault tolerance,” in Proc. 26th Allerton Conf. Comm. Cont. & Comput., Allerton, IL, pp. 388–397, Sept. 1988.

  17. B. Vinnakota and N. K. Jha, “Diagnosability and diagnosis of algorithm-based fault tolerant systems,” in Proc. 32nd Midwest Symp. Circuits & Systems, Urbana, IL, pp. 28–31, Aug. 1989.

  18. B. Vinnakota and N. K. Jha, “A dependence graph-based approach to the design of algorithm-based fault tolerant systems,” in Proc. Int. Symp. Fault Tolerant Comput., pp. 122–129, Newcastle-upon-Tyne, U.K., June 1990.

  19. B. Vinnakota and N. K. Jha, “Design of multiprocessor systems for concurrent error detection and fault diagnosis,” in Proc. Int. Symp. Fault Tolerant Comput., Montreal, June 1991.

Copyright information

© 1991 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sitaraman, R., Jha, N.K. (1991). Optimal Design of Checks for Error Detection and Location in Fault Tolerant Multiprocessor Systems. In: Cin, M.D., Hohl, W. (eds) Fault-Tolerant Computing Systems. Informatik-Fachberichte, vol 283. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-76930-6_33

  • DOI: https://doi.org/10.1007/978-3-642-76930-6_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-54545-3

  • Online ISBN: 978-3-642-76930-6

  • eBook Packages: Springer Book Archive
