Performance Optimization Strategies of High Performance Computing on GPU

Ma, Anguo; Cai, Jing; Cheng, Yu; Ni, Xiaoqiang; Tang, Yuxing; Xing, Zuocheng

doi:10.1007/978-3-642-03644-6_12

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

GPUs have recently become widely used in scientific computing and engineering applications, owing primarily to the evolution of GPU architecture. First, we analyze several key performance characteristics of the GPU in detail, together with the relationships among GPU architecture, programming model, and memory hierarchy. Second, we present three performance optimization strategies: Prefetching, Streamlizing, and Task Division. We carried out experiments to characterize how the different factors relate to efficiency. Finally, we map the HPL benchmark onto the GPU to validate our strategies and obtain a speedup.

This work is supported by the National High Technology Development 863 Program of China under Grant No. 2009AA01Z102 and the National Natural Science Foundation of China under Grant No. 60873016.

Author information

Authors and Affiliations

National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China
Anguo Ma, Jing Cai, Yu Cheng, Xiaoqiang Ni, Yuxing Tang & Zuocheng Xing

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015 Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640 Rapperswil, Switzerland
Josef M. Joller

About this paper

Cite this paper

Ma, A., Cai, J., Cheng, Y., Ni, X., Tang, Y., Xing, Z. (2009). Performance Optimization Strategies of High Performance Computing on GPU. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_12

DOI: https://doi.org/10.1007/978-3-642-03644-6_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science; Computer Science (R0)
