Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves

Bader, Michael; Franz, Robert; Günther, Stephan; Heinecke, Alexander

doi:10.1007/978-3-540-68111-3_66

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

We present hardware-oriented implementations of block-recursive approaches for matrix operations, especially matrix multiplication and LU decomposition. The matrix elements are stored in an element order based on a recursively constructed Peano space-filling curve. This block-recursive numbering scheme is replaced by a standard row-major order as soon as the respective matrix subblocks fit into the level-1 cache. For operations on these small blocks, we implemented hardware-oriented kernels optimised for Intel's Core architecture. The resulting matrix-multiplication and LU-decomposition codes compete well with optimised libraries such as Intel's MKL, ATLAS, or GotoBLAS, but have the advantage that only comparatively small and well-defined kernel operations have to be optimised to achieve high performance.
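The hybrid storage scheme in the abstract (Peano order between blocks, row-major order inside blocks that fit into L1 cache) can be made concrete with a small index function. This is a minimal sketch under stated assumptions, not the authors' code: the matrix dimension is assumed to be n = b·3^k with b × b leaf blocks, the blocks are enumerated with Peano's original digit construction (one of several equivalent curve orientations, not necessarily the one used in the paper), and the names `peano_index` and `element_offset` are hypothetical.

```python
def peano_index(x: int, y: int, levels: int) -> int:
    """Position of cell (x, y) on a Peano curve over a 3**levels x 3**levels grid.

    Follows Peano's original digit construction: the base-3 digits of x and y
    are interleaved into the curve parameter t = t1 t2 t3 ..., and a digit is
    reflected (d -> 2 - d) whenever the sum of the earlier curve digits that
    govern it is odd.
    """
    xd = [(x // 3**p) % 3 for p in reversed(range(levels))]  # most significant first
    yd = [(y // 3**p) % 3 for p in reversed(range(levels))]
    odd_sum = 0   # t1 + t3 + ...: its parity decides whether a y digit is reflected
    even_sum = 0  # t2 + t4 + ...: its parity decides whether an x digit is reflected
    index = 0
    for xi, yi in zip(xd, yd):
        t_odd = 2 - xi if even_sum % 2 else xi
        odd_sum += t_odd
        t_even = 2 - yi if odd_sum % 2 else yi
        even_sum += t_even
        index = index * 9 + t_odd * 3 + t_even  # append two base-3 digits of t
    return index


def element_offset(i: int, j: int, b: int, levels: int) -> int:
    """Storage offset of element (i, j) in an n x n matrix, n = b * 3**levels.

    Assumed layout (hypothetical): b x b leaf blocks are placed one after
    another in Peano order, and each block is stored contiguously in row-major
    order, mirroring the switch to row-major once a block fits into L1 cache.
    """
    block = peano_index(i // b, j // b, levels)  # rank of the enclosing block
    return block * b * b + (i % b) * b + (j % b)  # row-major within the block
```

For a 6 × 6 matrix with b = 2 and levels = 1, element (2, 5) lies in block (1, 2), the fourth block along the curve, so `element_offset(2, 5, 2, 1)` evaluates to 13. Consecutive blocks along the curve always share an edge, and every leaf block occupies one contiguous stretch of memory.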
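The division of labour described in the abstract, a recursive outer scheme whose arithmetic is confined to one small fixed-size kernel, can be illustrated with the following sketch. It assumes square matrices of size n = b·3^k held as ordinary row-major Python lists, splits every block into 3 × 3 sub-blocks (mirroring the trisection that underlies the Peano curve), and calls a plain triple-loop kernel once blocks reach b × b. The names `leaf_kernel`, `block_multiply`, and `matmul` are hypothetical. The sketch does not use Peano-ordered storage, does not reproduce the authors' execution order of the 27 sub-products, and its kernel merely stands in for the hardware-optimised kernels mentioned above.

```python
from typing import List

Matrix = List[List[float]]


def leaf_kernel(A: Matrix, B: Matrix, C: Matrix,
                ai: int, aj: int, bi: int, bj: int, ci: int, cj: int, b: int) -> None:
    """C[ci:ci+b, cj:cj+b] += A[ai:ai+b, aj:aj+b] @ B[bi:bi+b, bj:bj+b].

    Placeholder for a hand-tuned kernel; here it is a plain triple loop.
    """
    for r in range(b):
        a_row = A[ai + r]
        c_row = C[ci + r]
        for t in range(b):
            a_rt = a_row[aj + t]
            b_row = B[bi + t]
            for s in range(b):
                c_row[cj + s] += a_rt * b_row[bj + s]


def block_multiply(A: Matrix, B: Matrix, C: Matrix,
                   ai: int, aj: int, bi: int, bj: int, ci: int, cj: int,
                   n: int, b: int) -> None:
    """C_block += A_block @ B_block for n x n blocks, recursing in 3 x 3 splits."""
    if n == b:
        leaf_kernel(A, B, C, ai, aj, bi, bj, ci, cj, b)
        return
    m = n // 3
    for r in range(3):
        for s in range(3):
            for t in range(3):
                # C[r][s] += A[r][t] * B[t][s] on the m x m sub-blocks
                block_multiply(A, B, C,
                               ai + r * m, aj + t * m,
                               bi + t * m, bj + s * m,
                               ci + r * m, cj + s * m, m, b)


def matmul(A: Matrix, B: Matrix, b: int) -> Matrix:
    """Return A @ B for n x n matrices with n = b * 3**k (assumed layout)."""
    n = len(A)
    size = b
    while 0 < size < n:
        size *= 3
    if b < 1 or size != n:
        raise ValueError("matrix size must be b * 3**k")
    C = [[0.0] * n for _ in range(n)]
    block_multiply(A, B, C, 0, 0, 0, 0, 0, 0, n, b)
    return C
```

In this structure only `leaf_kernel` touches individual elements, so it is the one routine that would need tuning for a particular processor; the recursion itself is independent of the hardware.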

Author information

Authors and Affiliations

Dept. of Informatics, TU München, 80290, München, Germany
Michael Bader, Robert Franz, Stephan Günther & Alexander Heinecke

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

About this paper

Cite this paper

Bader, M., Franz, R., Günther, S., Heinecke, A. (2008). Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_66

Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68105-2
Online ISBN: 978-3-540-68111-3
eBook Packages: Computer Science, Computer Science (R0)
