
A novel packet exchanging strategy for preventing HoL-blocking in fat-trees

Cluster Computing

Abstract

The fat-tree is one of the most popular and widely used interconnection networks in massively parallel processing systems. Characteristics such as deterministic routing, in-order delivery, and performance matching that of adaptive routing methods make it an attractive choice. However, because of its deterministic routing and the simultaneous use of switch links, Head-of-Line (HoL) blocking may occur in switch buffers under high traffic load. To mitigate this problem, this paper proposes a novel switch-buffer strategy based on which paths are blocked. Combining packets whose blocked paths differ is shown to reduce congestion through packet exchanging. To exchange packets, we use short- and medium-depth buffers and consider two exchanging states: consecutive and non-consecutive. The strategy offers a trade-off between performance improvement and reduction of buffer depth, without changing the delivery order of the packets. Simulation results show that, compared with one buffer in each switch, average network latency improves by 22% and 33% with the consecutive and one-unit non-consecutive exchanging states, respectively, and that the buffer depth in each switch is reduced by 43.75% and 37.5% compared with multiple buffers.
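As a rough illustration of the problem the abstract describes, the toy Python sketch below contrasts a strict FIFO buffer, where a blocked head packet stalls everything behind it, with a simple exchange that lets a packet bound for a free output port go first. It is not the paper's algorithm: the function names, the `(packet_id, out_port)` tuple layout, and the `window` parameter are hypothetical, and `window` is only a stand-in for how far behind the head an exchange may reach, not the paper's definition of the consecutive and non-consecutive states.

```python
from collections import deque

# Each packet is a hypothetical (packet_id, out_port) tuple.

def fifo_forward(buffer, blocked_ports):
    """Strict FIFO: only the head packet may leave the buffer."""
    if buffer and buffer[0][1] not in blocked_ports:
        return buffer.popleft()
    return None  # head is blocked, so every packet behind it waits (HoL blocking)

def exchange_forward(buffer, blocked_ports, window=1):
    """Toy exchange: forward the first packet within `window` slots of the
    head whose output port is free.

    Every packet skipped over targets a blocked port, so the forwarded
    packet never overtakes another packet bound for the same port.
    """
    limit = min(len(buffer), window + 1)
    for i in range(limit):
        if buffer[i][1] not in blocked_ports:
            packet = buffer[i]
            del buffer[i]
            return packet
    return None

buf = deque([(0, "A"), (1, "B"), (2, "A"), (3, "C")])
blocked = {"A"}
print(fifo_forward(deque(buf), blocked))        # head targets blocked port A
print(exchange_forward(buf, blocked, window=1)) # packet 1 (port B) goes first
```

With port A blocked, the FIFO call forwards nothing, while the exchange call forwards the packet behind the head. The buffer afterwards still holds the two port-A packets in their original relative order, which is the property the abstract refers to as preserving delivery order.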




Corresponding author

Correspondence to Seyed Mehdi Mohtavipour.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Mohtavipour, S.M., Mollajafari, M. & Naseri, A. A novel packet exchanging strategy for preventing HoL-blocking in fat-trees. Cluster Comput 23, 461–482 (2020). https://doi.org/10.1007/s10586-019-02940-2


  • DOI: https://doi.org/10.1007/s10586-019-02940-2
