Learning Neural Representations for Predicting GPU Performance

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11501)

Abstract

Graphics processing units (GPUs) have become a primary source of heterogeneity in today’s computing systems. With the rapid increase in the number and types of GPUs available, finding the best hardware accelerator for each application is a challenge. Executing every application on every GPU system to learn the correlation between application properties and hardware characteristics is time-consuming and tedious. To address this problem, we extend our previously proposed collaborative filtering based modeling technique to build an analytical model that can predict the performance of applications across different GPU systems. Our model learns representations, or embeddings (dense vectors of latent features), for applications and systems and uses them to characterize the performance of various GPU-accelerated applications. We improve on the state-of-the-art collaborative filtering approach based on matrix factorization by building a multi-layer perceptron. In addition to increased accuracy in predicting application performance, the model can simultaneously predict multiple metrics, such as rates of memory access operations. We evaluate our approach on a set of 30 well-known micro-applications and seven Nvidia GPUs. Our model predicts the expected instructions per second value with 90.6% accuracy on average.
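
To make the approach concrete, below is a minimal sketch of the kind of embedding-plus-MLP model the abstract describes: each application and each GPU system gets a learned embedding, and a multi-layer perceptron maps the concatenated embeddings to one or more performance metrics. The framework (PyTorch), layer sizes, and metric count are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the neural collaborative filtering model
# described in the abstract. Framework choice (PyTorch), layer sizes,
# and the set of predicted metrics are assumptions for illustration.
import torch
import torch.nn as nn

class PerfModel(nn.Module):
    def __init__(self, n_apps, n_gpus, dim=16, n_metrics=2):
        super().__init__()
        # Embeddings: dense vectors of latent features, one per
        # application and one per GPU system.
        self.app_emb = nn.Embedding(n_apps, dim)
        self.gpu_emb = nn.Embedding(n_gpus, dim)
        # Multi-layer perceptron in place of the plain dot product
        # used by matrix-factorization collaborative filtering.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_metrics),  # e.g. IPS plus a memory-access rate
        )

    def forward(self, app_idx, gpu_idx):
        x = torch.cat([self.app_emb(app_idx), self.gpu_emb(gpu_idx)], dim=-1)
        return self.mlp(x)

# Usage with the paper's problem size: 30 micro-applications, 7 GPUs.
model = PerfModel(n_apps=30, n_gpus=7)
pred = model(torch.tensor([3]), torch.tensor([5]))  # metrics for app 3 on GPU 5
```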

Notes

  1. We tried adding Nvidia RTX 2070 and RTX 2080 Ti GPUs from the Turing micro-architecture to our study, but we faced two issues: (1) nvprof profiling is not supported on these devices, and its recently introduced replacement, Nsight Compute, cannot record some nvprof metrics (such as global load and store transactions) when SM < 7.0; (2) Nsight Compute records global load transactions in sectors, whereas nvprof records the same performance metric in bytes.
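
As an aside on the unit mismatch noted in (2): comparing the two tools' counts requires a conversion such as the sketch below. The 32-byte sector size is our assumption, based on the granularity of NVIDIA's memory hierarchy; it is not stated in this paper, so verify it for a given device.

```python
# Hypothetical conversion between Nsight Compute transaction counts
# (reported in sectors) and nvprof's byte counts. SECTOR_BYTES = 32
# is an assumption, not a value taken from this paper.
SECTOR_BYTES = 32

def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector-granularity transaction count to bytes."""
    return sectors * SECTOR_BYTES

print(sectors_to_bytes(1024))  # -> 32768
```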

Acknowledgment

This work was partially supported by JST CREST Grant Numbers JPMJCR1303 and JPMJCR1687, and by JSPS KAKENHI Grant Number JP16F16764.

Author information

Corresponding author: Shweta Salaria.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Salaria, S., Drozd, A., Podobas, A., Matsuoka, S. (2019). Learning Neural Representations for Predicting GPU Performance. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_3

  • DOI: https://doi.org/10.1007/978-3-030-20656-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20655-0

  • Online ISBN: 978-3-030-20656-7

  • eBook Packages: Computer Science, Computer Science (R0)
