A Cache-Optimal Alternative to the Unidirectional Hierarchization Algorithm

  • Conference paper
Sparse Grids and Applications - Stuttgart 2014

Part of the book series: Lecture Notes in Computational Science and Engineering (LNCSE, volume 109)

Abstract

The sparse grid combination technique provides a framework to solve high-dimensional numerical problems with standard solvers by assembling a sparse grid from many coarse and anisotropic full grids called component grids. Hierarchization is one of the most fundamental tasks for sparse grids. It describes the transformation from the nodal basis to the hierarchical basis. In settings where the component grids have to be frequently combined and distributed in a massively parallel compute environment, hierarchization on component grids is relevant to minimize communication overhead.
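To make the nodal-to-hierarchical transformation concrete, the following is a minimal sketch of the unidirectional baseline, not of the cache-oblivious algorithm presented in this paper. It assumes piecewise linear hat functions, a component grid stored as a NumPy array with \(2^{\ell_k} + 1\) points (boundary included) along axis \(k\), and the function names `hierarchize_axis` and `hierarchize_unidirectional`, which are illustrative only.

```python
import numpy as np


def hierarchize_axis(values, axis):
    """Hierarchize in place along one axis (assumed: linear hat basis, boundary points kept)."""
    a = np.moveaxis(values, axis, 0)  # view onto the same memory, working axis first
    n = a.shape[0] - 1                # n = 2**level intervals along this axis
    stride = 1
    # Go from the finest level to the coarsest so that the parents of each
    # updated point still hold nodal values.
    while stride < n:
        # Points of the current level are the odd multiples of `stride`;
        # each surplus is the nodal value minus the mean of its two neighbours.
        a[stride:n:2 * stride] -= 0.5 * (
            a[0:n - stride:2 * stride] + a[2 * stride:n + 1:2 * stride]
        )
        stride *= 2


def hierarchize_unidirectional(values):
    """Apply the 1D transformation along every axis in turn; returns a new array."""
    out = np.array(values, dtype=float)  # copy, so the caller's data stays untouched
    for axis in range(out.ndim):
        hierarchize_axis(out, axis)      # one full sweep over the grid per dimension
    return out
```

Each call to `hierarchize_axis` sweeps over the entire grid, so a \(d\)-dimensional grid is traversed \(d\) times. This is the access pattern that the algorithm of this paper is designed to avoid.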

We present a cache-oblivious hierarchization algorithm for component grids of the combination technique. It causes \(\left\vert \mathbf{G}_{\boldsymbol{\ell}}\right\vert \cdot \left( \frac{1}{B} + \mathcal{O}\left( \frac{1}{\sqrt[d]{M}} \right) \right)\) cache misses under the tall cache assumption \(M = \omega\left(B^{d}\right)\). Here, \(\mathbf{G}_{\boldsymbol{\ell}}\) denotes the component grid, \(d\) the dimension, \(M\) the size of the cache, and \(B\) the cache line size. This algorithm decreases the leading term of the cache misses by a factor of \(d\) compared to the unidirectional algorithm, which has been the standard approach so far. The new algorithm is also optimal in the sense that the leading term of the cache misses is reduced to scanning complexity, i.e., every degree of freedom has to be touched once. We also present a variant of the algorithm that causes \(\left\vert \mathbf{G}_{\boldsymbol{\ell}}\right\vert \cdot \left( \frac{2}{B} + \mathcal{O}\left( \frac{1}{\sqrt[d-1]{M \cdot B^{d-2}}} \right) \right)\) cache misses under the assumption \(M = \omega\left(B\right)\). The new algorithms have been implemented and outperform previously existing software. In several cases the measured performance is close to the best possible.

The dimension d is assumed to be constant in the \(\mathcal{O}\)-notation.
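As a rough illustration of the leading terms stated above, the sketch below compares the three algorithms for a given grid size. It keeps only the leading terms and drops the \(\mathcal{O}\)-terms, whose constants are not given here. The function name `leading_term_cache_misses` and the example parameters are hypothetical.

```python
def leading_term_cache_misses(grid_points, block_size, dimension):
    """Leading-term cache-miss estimates; lower-order O-terms are ignored."""
    scan = grid_points / block_size  # scanning complexity |G| / B
    return {
        # A factor of d above the new algorithm, as stated in the abstract.
        "unidirectional": dimension * scan,
        # Leading term 1/B per grid point (tall cache assumption M = omega(B^d)).
        "cache_oblivious": scan,
        # Leading term 2/B per grid point (weaker assumption M = omega(B)).
        "variant": 2 * scan,
    }


# Hypothetical example: 2**20 grid points, 8 values per cache line, d = 4.
estimates = leading_term_cache_misses(2**20, 8, 4)
```

For these example values the unidirectional estimate is four times the scanning bound and the variant is twice the scanning bound, which mirrors the factors \(d\) and 2 in the formulas.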

Notes

  1. We use the terms internal memory and cache as well as cache line size and block size synonymously.

Acknowledgements

We would like to thank Dirk Pflüger and Mario Heene for support and discussions, in particular for enabling the experiments on Hornet. We also thank two anonymous referees for detailed feedback on an earlier draft.

Corresponding author

Correspondence to Philipp Hupp.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Hupp, P., Jacob, R. (2016). A Cache-Optimal Alternative to the Unidirectional Hierarchization Algorithm. In: Garcke, J., Pflüger, D. (eds) Sparse Grids and Applications - Stuttgart 2014. Lecture Notes in Computational Science and Engineering, vol 109. Springer, Cham. https://doi.org/10.1007/978-3-319-28262-6_5
