Abstract
A streaming floating-point sparse-matrix compression scheme, which forms a key element of an accelerator for finite-element and other linear algebra applications, is described. The proposed architecture seeks to accelerate the performance-limiting Sparse Matrix-Vector Multiplication (SMVM) operation at the heart of finite-element applications by combining a dedicated datapath optimized for these applications with a streaming data compression and decompression unit, which increases the effective memory bandwidth seen by the datapath. The proposed format uses variable-length entries, each containing an opcode and, optionally, an address and/or a non-zero value. System simulations performed using a cycle-accurate C++ architectural model and a database of over 400 large symmetric and unsymmetric matrices containing up to 20M non-zero elements each (226M non-zeroes in total) demonstrate that a 20% average improvement in effective memory bandwidth can be achieved using the proposed architecture compared with published work, for a modest increase in hardware resources.
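The idea of a variable-length entry made up of an opcode plus an optional address and an optional non-zero value can be illustrated with a small sketch. The abstract does not give the actual opcode encoding, so everything below is a hypothetical layout chosen for illustration: a one-byte opcode with two flag bits (`HAS_ADDR`, `HAS_VALUE`), a 32-bit flattened row-major address that is omitted when it simply follows the previous one, and a 64-bit IEEE 754 value that is omitted when it repeats the previous one. The function names `compress`, `decompress`, and `smvm` are likewise assumptions, not names from the paper.

```python
import struct

# Hypothetical opcode flag bits (NOT the encoding used in the paper).
HAS_ADDR = 0x1   # entry carries an explicit 32-bit flattened address
HAS_VALUE = 0x2  # entry carries an explicit 64-bit IEEE 754 value


def compress(entries):
    """Encode (address, value) pairs, sorted by strictly increasing address.

    The address is omitted when it equals the previous address + 1;
    the value is omitted when it is bit-identical to the previous value.
    """
    out = bytearray()
    prev_addr, prev_bits = -1, None
    for addr, val in entries:
        bits = struct.pack("<d", val)
        opcode = 0
        if addr != prev_addr + 1:
            opcode |= HAS_ADDR
        if bits != prev_bits:
            opcode |= HAS_VALUE
        out.append(opcode)
        if opcode & HAS_ADDR:
            out += struct.pack("<I", addr)
        if opcode & HAS_VALUE:
            out += bits
        prev_addr, prev_bits = addr, bits
    return bytes(out)


def decompress(stream):
    """Yield (address, value) pairs one at a time from the byte stream."""
    pos, addr, val = 0, -1, 0.0
    while pos < len(stream):
        opcode = stream[pos]
        pos += 1
        if opcode & HAS_ADDR:
            (addr,) = struct.unpack_from("<I", stream, pos)
            pos += 4
        else:
            addr += 1  # implicit address: next consecutive position
        if opcode & HAS_VALUE:
            (val,) = struct.unpack_from("<d", stream, pos)
            pos += 8
        yield addr, val


def smvm(stream, nrows, ncols, x):
    """Compute y = A*x while decoding the compressed stream entry by entry."""
    y = [0.0] * nrows
    for addr, val in decompress(stream):
        row, col = divmod(addr, ncols)  # undo the row-major flattening
        y[row] += val * x[col]
    return y


# 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]] as (row*3 + col, value).
entries = [(0, 2.0), (1, -1.0), (3, -1.0), (4, 2.0),
           (5, -1.0), (7, -1.0), (8, 2.0)]
stream = compress(entries)
y = smvm(stream, 3, 3, [1.0, 2.0, 3.0])
```

In this toy case the stream occupies 55 bytes, against 84 bytes for seven uncompressed entries of a 4-byte address plus an 8-byte value. The saving comes entirely from the omitted fields, which is the sense in which such a format raises the effective memory bandwidth; the figures say nothing about the results reported in the paper.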
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Moloney, D., Geraghty, D., McSweeney, C., McElroy, C. (2005). Streaming Sparse Matrix Compression/Decompression. In: Conte, T., Navarro, N., Hwu, Wm.W., Valero, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2005. Lecture Notes in Computer Science, vol 3793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587514_9
DOI: https://doi.org/10.1007/11587514_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30317-6
Online ISBN: 978-3-540-32272-6
eBook Packages: Computer Science, Computer Science (R0)