Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)

Abstract

We present hardware-oriented implementations of block-recursive approaches for matrix operations, in particular matrix multiplication and LU decomposition. The matrix elements are stored in an element order based on a recursively constructed Peano space-filling curve. This block-recursive numbering scheme switches to standard row-major order as soon as the respective matrix subblocks fit into the level-1 cache. For operations on these small blocks, we implemented hardware-oriented kernels optimised for Intel’s Core architecture. The resulting matrix-multiplication and LU-decomposition codes compete well with optimised libraries such as Intel’s MKL, ATLAS, or GotoBLAS, with the advantage that only comparably small and well-defined kernel operations have to be optimised to achieve high performance.
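
The storage scheme described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it uses Peano's classical ternary-digit construction to order the blocks of a 3^k x 3^k block grid and stores each block in row-major order. The function names `peano_index_to_xy` and `build_hybrid_layout`, the parameters `k` (recursion depth) and `b` (block edge length), and the orientation of the curve are assumptions made for illustration only.

```python
def peano_index_to_xy(t, k):
    """Map a curve index t in [0, 9**k) to a cell (x, y) of a 3**k x 3**k grid.

    Uses Peano's ternary-digit construction: write t as 2k base-3 digits
    t1 t2 ... t_2k; odd-position digits feed x, even-position digits feed y,
    and a digit d is mirrored to 2 - d depending on the parity of the
    digits of the other coordinate seen so far.
    """
    digits = []
    for _ in range(2 * k):
        digits.append(t % 3)
        t //= 3
    digits.reverse()  # most significant digit first

    x = y = 0
    odd_sum = 0   # running sum of t1, t3, ... (controls mirroring of y digits)
    even_sum = 0  # running sum of t2, t4, ... (controls mirroring of x digits)
    for i in range(k):
        a = digits[2 * i]
        b = digits[2 * i + 1]
        x_digit = 2 - a if even_sum % 2 else a
        odd_sum += a
        y_digit = 2 - b if odd_sum % 2 else b
        even_sum += b
        x = 3 * x + x_digit
        y = 3 * y + y_digit
    return x, y


def build_hybrid_layout(k, b):
    """Return offset(i, j) for a (3**k * b) x (3**k * b) matrix.

    Blocks of size b x b are numbered along the Peano curve; inside a
    block the elements are stored in row-major order. Treating x as the
    block row and y as the block column is an arbitrary choice here.
    """
    block_rank = {peano_index_to_xy(t, k): t for t in range(9 ** k)}

    def offset(i, j):
        t = block_rank[(i // b, j // b)]
        return t * b * b + (i % b) * b + (j % b)

    return offset
```

In this sketch, `b` plays the role of the block size at which the numbering switches to row-major order; in practice it would be chosen so that the blocks taking part in a kernel call fit into the level-1 cache. Consecutive blocks along the curve are always edge neighbours in the matrix, which is the locality property that the block-recursive algorithms build on.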

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bader, M., Franz, R., Günther, S., Heinecke, A. (2008). Hardware-Oriented Implementation of Cache Oblivious Matrix Operations Based on Space-Filling Curves. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_66

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_66

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)
