Data Movement in Data-Intensive High Performance Computing

Chapter in Conquering Big Data with High Performance Computing

Abstract

The cost of executing a floating-point operation has been decreasing for decades at a much higher rate than the cost of moving data. Bandwidth and latency, the two key metrics that determine the cost of moving data, have degraded significantly relative to processor cycle time and execution rate. Despite the limits of sub-micron processor technology and the end of Dennard scaling, this trend will continue in the short term, making data movement a performance-limiting factor and an energy/power-efficiency concern, all the more so in the context of large-scale and data-intensive systems and workloads. This chapter gives an overview of the aspects of moving data across a system, from the storage system to the computing system down to the node and processor level, with case studies and contributions from researchers at the San Diego Supercomputer Center, Oak Ridge National Laboratory, Pacific Northwest National Laboratory, and the University of Delaware.
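
As a back-of-envelope illustration of this imbalance, the sketch below estimates a node's machine balance (floating-point operations per byte moved) and the rate attainable by a simple streaming kernel under a roofline-style model. The peak rate and memory bandwidth used here are hypothetical, illustrative values chosen for the example, not figures taken from the chapter.

    # Roofline-style back-of-envelope estimate (hypothetical numbers, not from the chapter).
    # A kernel is bandwidth-bound whenever its arithmetic intensity (flops per byte moved)
    # falls below the machine balance, peak_flops / peak_bandwidth.

    PEAK_FLOPS = 1.0e12   # assumed node peak: 1 TFLOP/s
    PEAK_BW = 100.0e9     # assumed sustained memory bandwidth: 100 GB/s

    MACHINE_BALANCE = PEAK_FLOPS / PEAK_BW  # flops needed per byte to stay compute-bound

    def attainable_gflops(arithmetic_intensity):
        """Attainable rate (GFLOP/s) for a kernel with the given flops-per-byte ratio."""
        return min(PEAK_FLOPS, arithmetic_intensity * PEAK_BW) / 1e9

    if __name__ == "__main__":
        print(f"machine balance: {MACHINE_BALANCE:.1f} flops/byte")
        # A streaming triad a[i] = b[i] + s * c[i] performs 2 flops per 24 bytes moved
        # (three 8-byte doubles), i.e. about 0.08 flops/byte, far below the balance point.
        triad_intensity = 2.0 / 24.0
        print(f"triad attainable: {attainable_gflops(triad_intensity):.1f} GFLOP/s "
              f"of {PEAK_FLOPS / 1e9:.0f} GFLOP/s peak")

Under these assumed numbers the streaming kernel reaches well under one percent of peak, the kind of imbalance that makes data movement, rather than floating-point throughput, the limiting factor discussed in this chapter.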


Notes

  1. A director-class switch is loosely defined as a high port count switch connecting different fabrics.

Author information

Corresponding author

Correspondence to Pietro Cicotti.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Cicotti, P. et al. (2016). Data Movement in Data-Intensive High Performance Computing. In: Arora, R. (eds) Conquering Big Data with High Performance Computing. Springer, Cham. https://doi.org/10.1007/978-3-319-33742-5_3

  • DOI: https://doi.org/10.1007/978-3-319-33742-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33740-1

  • Online ISBN: 978-3-319-33742-5

  • eBook Packages: Computer Science (R0)
