Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression

Hu, Siyi; Ito, Makiko; Yoshikawa, Takahide; He, Yuan; Kondo, Masaaki

doi:10.1007/978-3-031-29927-8_36

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13798)

Included in the following conference series:

International Conference on Parallel and Distributed Computing: Applications and Technologies

Abstract

Sparse matrix-vector multiplication (SpMV) is a common linear algebra kernel that is widely used in machine learning applications. In particular, fully-connected MLP layers dominate many SpMV tasks that play a critical role in diverse services, and they therefore account for a large fraction of data center cycles. Even with sparse matrix storage formats such as CSR/CSC, SpMV remains limited by memory bandwidth during data transfer because of the architecture of modern computing systems. We find that both the integer and the floating-point data used in matrix-vector multiplication are handled as-is, without any pre-processing. We add compression and decompression steps between main memory and the Last Level Cache (LLC), which may dramatically reduce memory bandwidth consumption. Furthermore, we observe that convergence speed in some typical scientific computation benchmarks is not degraded when compressed floating-point data is used in place of the original double type. Based on these findings, we propose a simple yet effective compression approach that can be implemented in general computing architectures, and preferably in HPC systems. With this technique, we achieve a performance improvement of up to 1.92x.
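The abstract does not specify the compression scheme, so the following is only a minimal sketch of the general idea, under stated assumptions: a CSR SpMV whose double-precision values are stored with fewer mantissa bits. Mantissa truncation is an assumed stand-in for the paper's lossy compression, and the names `truncate_mantissa`, `spmv_csr`, and `keep_bits` are hypothetical.

```python
import struct


def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero the low (52 - keep_bits) mantissa bits of an IEEE 754 double.

    Assumed stand-in for lossy compression: only the sign, the 11 exponent
    bits, and keep_bits mantissa bits would need to be stored or transferred.
    """
    if not 0 <= keep_bits <= 52:
        raise ValueError("keep_bits must be in [0, 52]")
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    drop = 52 - keep_bits
    bits &= ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF
    (y,) = struct.unpack("<d", struct.pack("<Q", bits))
    return y


def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        acc = 0.0
        # Nonzeros of row i live in values[row_ptr[i]:row_ptr[i + 1]].
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Small 3x3 example with nonzeros at (0,0), (0,2), (1,1), (2,0), (2,2).
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.1, 1.3, 3.7, 2.9, 5.3]
x = [1.0, 2.0, 3.0]

keep_bits = 20  # hypothetical setting: 1 + 11 + 20 = 32 bits per value
exact = spmv_csr(row_ptr, col_idx, values, x)
lossy_values = [truncate_mantissa(v, keep_bits) for v in values]
approx = spmv_csr(row_ptr, col_idx, lossy_values, x)

max_rel_err = max(abs(a - e) / abs(e) for a, e in zip(approx, exact))
bits_per_value = 1 + 11 + keep_bits
print(f"bits per value: {bits_per_value} (vs 64), max relative error: {max_rel_err:.3e}")
```

In this sketch, each value needs half the bits of a double, so the value array would cost roughly half the memory traffic, at the price of a small relative error in the result. Whether an iterative solver still converges at the same speed under such an error depends on the problem and has to be checked per benchmark, as the abstract indicates.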

Acknowledgment

First and foremost, we would like to sincerely thank the anonymous reviewers for their valuable comments. This work was supported, in part, by JST CREST Grant Number JPMJCR18K1, Japan.

Author information

Authors and Affiliations

The University of Tokyo, Tokyo, Japan
Siyi Hu
Fujitsu Limited, Tokyo, Japan
Makiko Ito & Takahide Yoshikawa
Keio University, Yokohama, Japan
Yuan He & Masaaki Kondo
RIKEN Center for Computational Science, Kobe, Japan
Masaaki Kondo

Corresponding author

Correspondence to Siyi Hu.

Editor information

Editors and Affiliations

Tohoku University, Aoba-ku, Japan
Hiroyuki Takizawa
Sun Yat-sen University, Guangzhou, China
Hong Shen
The University of Tokyo, Tokyo, Japan
Toshihiro Hanawa
Seoul National University of Science and Technology, Seoul, Korea (Republic of)
Jong Hyuk Park
Griffith University, Queensland, QLD, Australia
Hui Tian
Tokyo Denki University, Tokyo, Japan
Ryusuke Egawa

About this paper

Cite this paper

Hu, S., Ito, M., Yoshikawa, T., He, Y., Kondo, M. (2023). Memory Bandwidth Conservation for SpMV Kernels Through Adaptive Lossy Data Compression. In: Takizawa, H., Shen, H., Hanawa, T., Hyuk Park, J., Tian, H., Egawa, R. (eds) Parallel and Distributed Computing, Applications and Technologies. PDCAT 2022. Lecture Notes in Computer Science, vol 13798. Springer, Cham. https://doi.org/10.1007/978-3-031-29927-8_36

DOI: https://doi.org/10.1007/978-3-031-29927-8_36
Published: 08 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-29926-1
Online ISBN: 978-3-031-29927-8
eBook Packages: Computer Science, Computer Science (R0)
