The memory behavior of cache oblivious stencil computations

Frigo, Matteo; Strumpen, Volker

doi:10.1007/s11227-007-0111-y

Published: 21 February 2007

Volume 39, pages 93–112, (2007)

The Journal of Supercomputing

Abstract

We present and evaluate a cache oblivious algorithm for stencil computations, which arise, for example, in finite-difference methods. Our algorithm applies to arbitrary stencils in n-dimensional spaces. On an “ideal cache” of size Z, our algorithm saves a factor of Θ(Z^(1/n)) cache misses compared to a naive algorithm, and it exploits temporal locality optimally throughout the entire memory hierarchy. We evaluate our algorithm in terms of the number of cache misses, and demonstrate that the memory behavior agrees with our theoretical predictions. Our experimental evaluation is based on a finite-difference solution of a heat diffusion problem, as well as a Gauss-Seidel iteration and a 2-dimensional LBMHD program, both reformulated as cache oblivious stencil computations.
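
The abstract does not spell out the algorithm itself, so the following is only an illustrative sketch for the 1-dimensional heat diffusion case. It contrasts a naive time-stepping loop with a recursive trapezoidal decomposition of space-time, which is the general idea behind cache oblivious stencil traversal. The function names (`kernel`, `naive`, `walk`, `make_grid`), the coefficient `alpha`, the fixed boundary values, and the two-buffer storage are all assumptions made for this example, not details taken from this page.

```python
def kernel(u, t, x, alpha):
    """Advance point x from time t to t+1 with a 3-point heat stencil.

    u holds two buffers; time step t lives in u[t % 2].
    """
    src, dst = u[t % 2], u[(t + 1) % 2]
    dst[x] = src[x] + alpha * (src[x - 1] - 2.0 * src[x] + src[x + 1])


def naive(u, T, alpha=0.25):
    """Naive algorithm: sweep the whole interior once per time step."""
    n = len(u[0])
    for t in range(T):
        for x in range(1, n - 1):
            kernel(u, t, x, alpha)


def walk(u, t0, t1, x0, dx0, x1, dx1, alpha=0.25):
    """Recursively traverse the space-time trapezoid
    {(t, x) : t0 <= t < t1, x0 + dx0*(t - t0) <= x < x1 + dx1*(t - t0)}.
    The edge slopes dx0 and dx1 are each -1, 0, or 1.
    """
    dt = t1 - t0
    if dt == 1:
        for x in range(x0, x1):
            kernel(u, t0, x, alpha)
    elif dt > 1:
        if 2 * (x1 - x0) + (dx1 - dx0) * dt >= 4 * dt:
            # Wide trapezoid: space cut along a line of slope -1,
            # so the left piece never depends on the right piece.
            xm = (2 * (x0 + x1) + (2 + dx0 + dx1) * dt) // 4
            walk(u, t0, t1, x0, dx0, xm, -1, alpha)
            walk(u, t0, t1, xm, -1, x1, dx1, alpha)
        else:
            # Tall trapezoid: time cut, lower half first.
            s = dt // 2
            walk(u, t0, t0 + s, x0, dx0, x1, dx1, alpha)
            walk(u, t0 + s, t1, x0 + dx0 * s, dx0, x1 + dx1 * s, dx1, alpha)


def make_grid(n):
    """Two identical buffers, so the fixed boundary values exist in both."""
    init = [float((7 * i) % 11) for i in range(n)]
    return [init[:], init[:]]
```

For a grid `u = make_grid(n)`, calling `walk(u, 0, T, 1, 0, n - 1, 0)` visits the same space-time points as `naive(u, T)` and leaves the final time step in `u[T % 2]`. The recursion never refers to the cache size, which is what makes the traversal cache oblivious: sufficiently small trapezoids fit in any level of the memory hierarchy without tuning.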

Author information

Authors and Affiliations

IBM Austin Research Laboratory, 11501 Burnet Road, Austin, TX, 78758, USA
Matteo Frigo & Volker Strumpen

Corresponding author

Correspondence to Volker Strumpen.

Additional information

This work was supported in part by the Defense Advanced Research Projects Agency (DARPA) under contract No. NBCH30390004.

About this article

Cite this article

Frigo, M., Strumpen, V. The memory behavior of cache oblivious stencil computations. J Supercomput 39, 93–112 (2007). https://doi.org/10.1007/s11227-007-0111-y

Published: 21 February 2007
Issue Date: February 2007
DOI: https://doi.org/10.1007/s11227-007-0111-y
