Abstract
We present a real-time fault-tolerant design for an l-level k-ary tree multiprocessor and examine its reconfigurability. The k-ary tree is augmented with spare nodes and spare links. By exploiting the wave-switching communication modules of the spare nodes, the design tolerates both faulty nodes and faulty links. We consider two modes of operation. In the strict mode, the multiprocessor is under heavy computation or a hard deadline, so we use a fast, local reconfiguration scheme to tolerate faulty nodes. In the relaxed mode, where computation is light or the deadline is soft, a global reconfiguration scheme is used to maximize the utilization of spare nodes, both in this mode and in the next strict mode. Both theoretical and simulation results are examined. Our simulation results for the relaxed mode of operation show that our approach can tolerate significantly more faulty nodes than other approaches, with low overhead and no performance degradation.
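The contrast between local and global reconfiguration can be illustrated with a small sketch. It is not the scheme from the paper: the abstract does not say how spares are placed, so the code assumes one spare per sibling group for the local rule and an idealized global rule in which any spare can replace any faulty node. The function names and the level-order node numbering are likewise hypothetical.

```python
from collections import Counter

def node_count(k, levels):
    """Nodes in a complete k-ary tree with `levels` levels (root alone is 1 level)."""
    return sum(k ** i for i in range(levels))

def parent(node, k):
    """Parent of `node` under level-order numbering (root = 0); None for the root."""
    return None if node == 0 else (node - 1) // k

def local_tolerable(faulty, k, spares_per_group=1):
    """Assumed local rule: a faulty node may only use a spare of its own sibling group."""
    faults_per_group = Counter(parent(n, k) for n in faulty)
    return all(c <= spares_per_group for c in faults_per_group.values())

def global_tolerable(faulty, total_spares):
    """Idealized global rule (assumption): any spare may replace any faulty node."""
    return len(set(faulty)) <= total_spares

k, levels = 2, 3
groups = 1 + node_count(k, levels - 1)   # root's group plus one group per internal node
faulty = {1, 2}                          # both children of the root fail
print(node_count(k, levels))             # 7
print(local_tolerable(faulty, k))        # False
print(global_tolerable(faulty, groups))  # True
```

Under these assumptions, two faults in the same sibling group exhaust the local spare, while the same faults are covered once spares are shared across the whole tree. This matches the abstract's motivation for switching to global reconfiguration when timing constraints allow.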
Cite this article
Izadi, B.A., Özgüner, F. An Augmented k-ary Tree Multiprocessor with Real-Time Fault-Tolerant Capability. The Journal of Supercomputing 27, 5–17 (2004). https://doi.org/10.1023/A:1026235604866
DOI: https://doi.org/10.1023/A:1026235604866