Learning Neural Representations for Predicting GPU Performance

  • Conference paper
  • In: High Performance Computing (ISC High Performance 2019)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11501)

Abstract

Graphics processing units (GPUs) have become a primary source of heterogeneity in today’s computing systems. With the rapid increase in the number and types of GPUs available, finding the best hardware accelerator for each application is a challenge. Executing every application on every GPU system to learn the correlation between application properties and hardware characteristics is time-consuming and tedious. To address this problem, we extend our previously proposed collaborative filtering based modeling technique to build an analytical model that can predict the performance of applications across different GPU systems. Our model learns representations, or embeddings (dense vectors of latent features), for applications and systems and uses them to characterize the performance of various GPU-accelerated applications. We improve on the state-of-the-art collaborative filtering approach based on matrix factorization by building a multi-layer perceptron. In addition to increased accuracy in predicting application performance, the model can simultaneously predict multiple metrics, such as rates of memory access operations. We evaluate our approach on a set of 30 well-known micro-applications and seven Nvidia GPUs. Our model predicts the expected instructions per second value with 90.6% accuracy on average.
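
To make the approach concrete, below is a minimal sketch of the kind of embedding-plus-MLP model the abstract describes: each application and each GPU system gets a learned embedding, and a multi-layer perceptron maps the concatenated embeddings to one or more performance metrics. The framework (PyTorch), layer sizes, and metric count are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the neural collaborative filtering model
# described in the abstract. Framework choice (PyTorch), layer sizes,
# and the set of predicted metrics are assumptions for illustration.
import torch
import torch.nn as nn

class PerfModel(nn.Module):
    def __init__(self, n_apps, n_gpus, dim=16, n_metrics=2):
        super().__init__()
        # Embeddings: dense vectors of latent features, one per
        # application and one per GPU system.
        self.app_emb = nn.Embedding(n_apps, dim)
        self.gpu_emb = nn.Embedding(n_gpus, dim)
        # Multi-layer perceptron in place of the plain dot product
        # used by matrix-factorization collaborative filtering.
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_metrics),  # e.g. IPS plus a memory-access rate
        )

    def forward(self, app_idx, gpu_idx):
        x = torch.cat([self.app_emb(app_idx), self.gpu_emb(gpu_idx)], dim=-1)
        return self.mlp(x)

# Usage with the paper's problem size: 30 micro-applications, 7 GPUs.
model = PerfModel(n_apps=30, n_gpus=7)
pred = model(torch.tensor([3]), torch.tensor([5]))  # metrics for app 3 on GPU 5
```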

Notes

  1. We tried adding Nvidia RTX 2070 and RTX 2080 Ti GPUs from the Turing micro-architecture to our study, but we faced two issues: (1) nvprof profiling is not supported on these devices, and its recently introduced replacement, Nsight Compute, cannot record some nvprof metrics (such as global load and store transactions) when SM < 7.0; (2) Nsight Compute records global load transactions in sectors, whereas nvprof records the same performance metric in bytes.
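
As an aside on the unit mismatch noted in (2): comparing the two tools' counts requires a conversion such as the sketch below. The 32-byte sector size is our assumption, based on the granularity of NVIDIA's memory hierarchy; it is not stated in this paper, so verify it for a given device.

```python
# Hypothetical conversion between Nsight Compute transaction counts
# (reported in sectors) and nvprof's byte counts. SECTOR_BYTES = 32
# is an assumption, not a value taken from this paper.
SECTOR_BYTES = 32

def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector-granularity transaction count to bytes."""
    return sectors * SECTOR_BYTES

print(sectors_to_bytes(1024))  # -> 32768
```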

Acknowledgment

This work was partially supported by JST CREST Grant Numbers JPMJCR1303 and JPMJCR1687, and by JSPS KAKENHI Grant Number JP16F16764.

Author information

Corresponding author: Shweta Salaria.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Salaria, S., Drozd, A., Podobas, A., Matsuoka, S. (2019). Learning Neural Representations for Predicting GPU Performance. In: Weiland, M., Juckeland, G., Trinitis, C., Sadayappan, P. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science, vol 11501. Springer, Cham. https://doi.org/10.1007/978-3-030-20656-7_3

  • DOI: https://doi.org/10.1007/978-3-030-20656-7_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20655-0

  • Online ISBN: 978-3-030-20656-7

  • eBook Packages: Computer Science, Computer Science (R0)
