Abstract
General-purpose computation on graphics processing units (GPUs) is prominent in today's high-performance computing era. Porting data-parallel applications to the GPU yields a baseline performance improvement simply because of the increased number of computational units; better performance can be achieved when the application is fine-tuned for the specific architecture under consideration. One very widely used, computation-intensive kernel in sparse-matrix-based applications is sparse matrix-vector multiplication (SpMV). Most existing sparse matrix storage formats were developed with central processing units (CPUs) or multi-core processors in mind. This paper presents a new sparse matrix format designed for the graphics processor architecture. For the class of applications that suit the proposed format, it gives a 2x to 5x performance improvement over the CSR (compressed sparse row) format, 2x to 54x over the COO (coordinate) format, and 3x to 10x over the CSR-vector format. It also gives a 10% to 133% improvement in the memory transfer (of only the access information of the sparse matrix) between CPU and GPU. The paper describes the new format and its motivation, along with complete experimental details and comparative results.
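For readers unfamiliar with the baseline formats the abstract compares against, the following is a minimal sketch (not the paper's proposed CSPR format) of how the same small matrix is stored in CSR and COO, and how a scalar CSR SpMV kernel computes y = A·x. The function names and the 3x3 example matrix are illustrative assumptions, not taken from the paper.

```python
def spmv_csr(row_ptr, col_idx, vals, x):
    """Baseline scalar CSR SpMV: for each row i, the nonzeros of row i
    occupy positions row_ptr[i]..row_ptr[i+1]-1 of vals/col_idx."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_coo(rows, cols, vals, x, n_rows):
    """Baseline COO SpMV: every nonzero explicitly stores both its row
    and column index, so the format needs more access information than CSR."""
    y = [0.0] * n_rows
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]
    return y


# Example matrix A = [[10, 0, 2],
#                     [ 0, 3, 0],
#                     [ 1, 0, 4]]
row_ptr = [0, 2, 3, 5]          # CSR: row extents into vals/col_idx
col_idx = [0, 2, 1, 0, 2]
vals = [10.0, 2.0, 3.0, 1.0, 4.0]
coo_rows = [0, 0, 1, 2, 2]      # COO: one row index per nonzero

x = [1.0, 1.0, 1.0]
print(spmv_csr(row_ptr, col_idx, vals, x))          # [12.0, 3.0, 5.0]
print(spmv_coo(coo_rows, col_idx, vals, x, 3))      # [12.0, 3.0, 5.0]
```

Note that CSR stores one pointer per row plus one column index per nonzero, whereas COO stores two indices per nonzero; reducing this per-nonzero access information is what the abstract's CPU-GPU transfer comparison measures.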
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Neelima, B., Raghavendra, P.S. (2011). CSPR: Column Only SPARSE Matrix Representation for Performance Improvement on GPU Architecture. In: Nagamalai, D., Renault, E., Dhanuskodi, M. (eds) Advances in Parallel Distributed Computing. PDCTA 2011. Communications in Computer and Information Science, vol 203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24037-9_58
Print ISBN: 978-3-642-24036-2
Online ISBN: 978-3-642-24037-9