Empirical Performance-Model Driven Data Layout Optimization

  • Conference paper
Languages and Compilers for High Performance Computing (LCPC 2004)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3602)

Abstract

Empirical optimizers like ATLAS have been very effective in optimizing computational kernels in libraries. The best choice of parameters such as tile size and degree of loop unrolling is determined by executing different versions of the computation. In contrast, optimizing compilers use a model-driven approach to program transformation. While the model-driven approach of optimizing compilers is generally orders of magnitude faster than ATLAS-like library generators, its effectiveness can be limited by the accuracy of the performance models used. In this paper, we describe an approach where a class of computations is modeled in terms of constituent operations that are empirically measured, thereby allowing modeling of the overall execution time. The performance model with empirically determined cost components is used to perform data layout optimization in the context of the Tensor Contraction Engine, a compiler for a high-level domain-specific language for expressing computational models in quantum chemistry. The effectiveness of the approach is demonstrated through experimental measurements on some representative computations from quantum chemistry.
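The idea of composing empirically measured constituent costs into a model of overall execution time can be illustrated with a minimal sketch. It is not the paper's model: the two constituent operations (a relayout by transposition and a matrix multiply in two operand layouts), the plan names, and all function names below are hypothetical, and a serial pure-Python kernel stands in for the tuned library and communication routines a real system would measure.

```python
import time


def measure(op, *args, repeats=3):
    """Best-of-`repeats` wall-clock time of op(*args), in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        op(*args)
        best = min(best, time.perf_counter() - start)
    return best


def transpose(m):
    """Constituent operation: relayout of a 2-D array."""
    return [list(col) for col in zip(*m)]


def matmul_nn(a, b):
    """Constituent operation: multiply with b stored row-major."""
    n, p = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(p)]
            for row in a]


def matmul_nt(a, b_t):
    """Constituent operation: multiply with b supplied already transposed."""
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t]
            for row in a]


def predict_time(plan, costs):
    """Model a plan's time as the sum of its measured constituent costs."""
    return sum(costs[op] for op in plan)


def choose_layout(plans, costs):
    """Return the name of the plan with the lowest predicted time."""
    return min(plans, key=lambda name: predict_time(plans[name], costs))


# Hypothetical candidate plans for computing a @ b.
PLANS = {
    "multiply_in_place": ["matmul_nn"],
    "relayout_then_nt": ["transpose", "matmul_nt"],
}

# Measure each constituent operation once on a representative size
# (n is an arbitrary assumption chosen to keep the run short).
n = 32
a = [[float(i + j) for j in range(n)] for i in range(n)]
costs = {
    "transpose": measure(transpose, a),
    "matmul_nn": measure(matmul_nn, a, a),
    "matmul_nt": measure(matmul_nt, a, transpose(a)),
}
best = choose_layout(PLANS, costs)
print(best, {name: predict_time(plan, costs) for name, plan in PLANS.items()})
```

Which plan wins depends on the machine, which is the point of measuring: the cost table is rebuilt per platform, while the selection step only sums table entries and never runs the candidate plans themselves.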

Supported in part by the National Science Foundation through the Information Technology Research program (CHE-0121676 and CHE-0121706), by NSF grant CCF-0073800 and by a grant from the Environmental Protection Agency.



Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, Q., Gao, X., Krishnamoorthy, S., Baumgartner, G., Ramanujam, J., Sadayappan, P. (2005). Empirical Performance-Model Driven Data Layout Optimization. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_7

  • DOI: https://doi.org/10.1007/11532378_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28009-5

  • Online ISBN: 978-3-540-31813-2

  • eBook Packages: Computer Science, Computer Science (R0)
