Data Movement in Data-Intensive High Performance Computing

Chapter in Conquering Big Data with High Performance Computing

Abstract

The cost of executing a floating-point operation has been decreasing for decades at a much higher rate than the cost of moving data. Bandwidth and latency, the two key metrics that determine the cost of moving data, have degraded significantly relative to processor cycle time and execution rate. Despite the limits of sub-micron processor technology and the end of Dennard scaling, this trend will continue in the short term, making data movement a performance-limiting factor and an energy/power-efficiency concern, all the more so in the context of large-scale and data-intensive systems and workloads. This chapter gives an overview of the aspects of moving data across a system, from the storage system to the computing system down to the node and processor level, with case studies and contributions from researchers at the San Diego Supercomputer Center, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, and the University of Delaware.
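
As a back-of-envelope illustration of this imbalance, the sketch below estimates a node's machine balance (floating-point operations per byte moved) and the rate attainable by a simple streaming kernel under a roofline-style model. The peak rate and memory bandwidth used here are hypothetical, illustrative values chosen for the example, not figures taken from the chapter.

    # Roofline-style back-of-envelope estimate (hypothetical numbers, not from the chapter).
    # A kernel is bandwidth-bound whenever its arithmetic intensity (flops per byte moved)
    # falls below the machine balance, peak_flops / peak_bandwidth.

    PEAK_FLOPS = 1.0e12   # assumed node peak: 1 TFLOP/s
    PEAK_BW = 100.0e9     # assumed sustained memory bandwidth: 100 GB/s

    MACHINE_BALANCE = PEAK_FLOPS / PEAK_BW  # flops needed per byte to stay compute-bound

    def attainable_gflops(arithmetic_intensity):
        """Attainable rate (GFLOP/s) for a kernel with the given flops-per-byte ratio."""
        return min(PEAK_FLOPS, arithmetic_intensity * PEAK_BW) / 1e9

    if __name__ == "__main__":
        print(f"machine balance: {MACHINE_BALANCE:.1f} flops/byte")
        # A streaming triad a[i] = b[i] + s * c[i] performs 2 flops per 24 bytes moved
        # (three 8-byte doubles), i.e. about 0.08 flops/byte, far below the balance point.
        triad_intensity = 2.0 / 24.0
        print(f"triad attainable: {attainable_gflops(triad_intensity):.1f} GFLOP/s "
              f"of {PEAK_FLOPS / 1e9:.0f} GFLOP/s peak")

Under these assumed numbers the streaming kernel reaches well under one percent of peak, the kind of imbalance that makes data movement, rather than floating-point throughput, the limiting factor discussed in this chapter.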


Notes

  1. A director-class switch is loosely defined as a high port count switch connecting different fabrics.

Author information

Corresponding author

Correspondence to Pietro Cicotti.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Cicotti, P. et al. (2016). Data Movement in Data-Intensive High Performance Computing. In: Arora, R. (eds) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_3

  • DOI: https://doi.org/10.1007/978-3-319-33742-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33740-1

  • Online ISBN: 978-3-319-33742-5

  • eBook Packages: Computer Science (R0)
