Abstract
A synchronization network (SN) consists of processing elements (PEs) at the leaves of a complete binary tree, with routing switches at the interior nodes. We study the problem of making an SN tolerant to PE failures by adding queues to its edges, and we obtain the following results. In the worst case, an N-PE SN whose edges have queues of capacity O(log log N) can tolerate the failure of a positive fraction of its PEs, no matter how the failed PEs are distributed; furthermore, this capacity requirement cannot be lowered by more than a small constant factor. In the expected case, with probability exceeding 1 − N^(−Ω(1)), an N-PE SN whose edges have queues of capacity O(log log log N) can tolerate the failure of a positive fraction of its PEs; we do not know whether this capacity requirement can be lowered. We also present an algorithm that, given an SN with queues of capacity C, salvages a maximum number of fault-free PEs; its running time is a low-degree polynomial in N even when C is as large as log(N/log N).
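To give a sense of how small these capacity terms are, the sketch below evaluates the three growth expressions from the abstract for an SN with N = 2^k PEs. It assumes base-2 logarithms and deliberately drops the constant factors hidden by the O(·) notation, so the numbers indicate scale only; they are not queue sizes taken from the paper. The function name `capacity_scales` is hypothetical.

```python
import math


def capacity_scales(k: int) -> dict[str, float]:
    """Growth terms from the abstract for an SN with N = 2**k PEs.

    Assumption: base-2 logarithms, constants hidden by O(.) are dropped,
    so the values show order of magnitude only, not actual queue sizes.
    Working with k = log N avoids building the (possibly huge) integer N.
    """
    if k < 2:
        raise ValueError("need k >= 2 so that log log log N is defined")
    log_n = float(k)  # log N = k when N = 2**k
    return {
        # worst-case capacity term
        "log log N": math.log2(log_n),
        # expected-case capacity term
        "log log log N": math.log2(math.log2(log_n)),
        # largest C for which salvage still runs in low-degree polynomial time
        "log(N/log N)": log_n - math.log2(log_n),
    }


for k in (16, 256, 65536):
    terms = capacity_scales(k)
    print(f"N = 2^{k}: " + ", ".join(f"{name} = {v:g}" for name, v in terms.items()))
```

Even for N = 2^65536 PEs, log log N is only 16 and log log log N only 4, which is why queues of these capacities are modest in practice.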
Copyright information
© 1992 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhatt, S.N., Chung, F.R.K., Leighton, F.T., Rosenberg, A.L. (1992). Tolerating faults in synchronization networks. In: Bougé, L., Cosnard, M., Robert, Y., Trystram, D. (eds) Parallel Processing: CONPAR 92—VAPP V. VAPP CONPAR 1992. Lecture Notes in Computer Science, vol 634. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-55895-0_391
DOI: https://doi.org/10.1007/3-540-55895-0_391
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-55895-8
Online ISBN: 978-3-540-47306-0
eBook Packages: Springer Book Archive