Abstract
The fat-tree is one of the most popular and widely used interconnection networks in massively parallel processing systems. Its characteristics have made it an attractive interconnection network. These include deterministic routing, in-order packet delivery, and the same performance as adaptive routing methods. However, because of its deterministic routing and the simultaneous use of switch links, head-of-line (HoL) blocking may occur in buffers under high traffic load. To mitigate this problem, this paper proposes a novel switch-buffer strategy based on the blocking of paths. It is shown that combining packets with different blocked paths can reduce congestion through packet exchanging. To exchange packets, we use short- and medium-depth buffers and consider two exchanging states: consecutive and non-consecutive. The strategy provides a trade-off between performance improvement and reduced buffer depth, and it does not change the delivery order of packets. Simulation results compare the two exchanging states against a single buffer in each switch. The consecutive and one-unit non-consecutive states improve average network latency by 22% and 33%, respectively. Compared with multiple buffers, the depth of buffers in each switch is reduced by 43.75% and 37.5%, respectively.
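The following Python sketch makes HoL-blocking and the general idea of letting an unblocked packet take a blocked head's turn concrete for one switch input buffer. It is an illustrative sketch under stated assumptions, not the authors' mechanism. The names `Packet`, `forward_one`, `drain`, and `max_distance` are hypothetical. Mapping `max_distance=1` to "consecutive" and `max_distance=2` to "one-unit non-consecutive" exchanging is also our own reading. Blocked output ports are held fixed for simplicity.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional, Set


@dataclass(frozen=True)
class Packet:
    pid: int       # packet identifier
    dest: int      # destination node (one deterministic path per destination)
    out_port: int  # output port chosen by deterministic routing at this switch


def forward_one(buffer: Deque[Packet], blocked: Set[int],
                max_distance: int = 0) -> Optional[Packet]:
    """Forward at most one packet from the buffer; return it, or None if stuck.

    max_distance = 0: plain FIFO, only the head may leave (HoL-blocking).
    max_distance = 1: the packet directly behind the head may go first
                      (hypothetical reading of "consecutive" exchanging).
    max_distance = 2: a packet one slot further back may also go first
                      (hypothetical reading of "one-unit non-consecutive").
    """
    for i in range(min(max_distance, len(buffer) - 1) + 1):
        cand = buffer[i]
        if cand.out_port in blocked:
            continue  # this packet's path is blocked; look further back
        # Never overtake an earlier packet to the same destination, so
        # per-destination delivery order is preserved.
        if any(buffer[j].dest == cand.dest for j in range(i)):
            continue
        del buffer[i]  # the remaining packets keep their relative order
        return cand
    return None


def drain(packets: List[Packet], blocked: Set[int], max_distance: int) -> List[int]:
    """Forward packets until none can leave; return the ids sent, in order."""
    buffer = deque(packets)
    sent = []
    while (p := forward_one(buffer, blocked, max_distance)) is not None:
        sent.append(p.pid)
    return sent


# Hypothetical buffer contents: packets 0 and 2 need the blocked port 0.
PACKETS = [
    Packet(0, dest=5, out_port=0),
    Packet(1, dest=6, out_port=1),
    Packet(2, dest=5, out_port=0),
    Packet(3, dest=7, out_port=1),
]

if __name__ == "__main__":
    for d in (0, 1, 2):
        print(d, drain(PACKETS, blocked={0}, max_distance=d))
```

With output port 0 blocked, plain FIFO forwards nothing because packet 0 sits at the head. `max_distance=1` lets packet 1 go first. `max_distance=2` additionally releases packet 3. The script prints `0 []`, `1 [1]`, and `2 [1, 3]`. Packets 0 and 2 stay queued in their original order, which mirrors the in-order delivery property claimed in the abstract. Looking further back frees more packets here. In this toy model, a deeper lookahead only helps if the buffer is deep enough to hold those packets.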
Cite this article
Mohtavipour, S.M., Mollajafari, M. & Naseri, A. A novel packet exchanging strategy for preventing HoL-blocking in fat-trees. Cluster Comput 23, 461–482 (2020). https://doi.org/10.1007/s10586-019-02940-2