Abstract
Data sizes are growing far faster than storage bandwidth. To address this growing gap, Integrated Application Workflows (IAWs) are being investigated as a potential replacement for using a centralized storage array to hold intermediate data. IAWs run multiple simulation workflow components concurrently on an HPC resource, connecting these components using compute area resources. These IAWs require high-frequency, high-volume data transfers between compute nodes and staging area nodes during the lifetime of a large parallel computation. The available network bandwidth between the two areas may not be enough to support this data movement efficiently. As the processing power available to compute resources increases, the requirements for this data transfer will become more difficult to satisfy, and perhaps impossible to satisfy at all, since network capabilities are not expanding at a comparable rate. It is therefore necessary to reduce the volume of data without reducing its quality when it is processed and analyzed. Delta addresses the issue by targeting the data transfer operations that occur over the lifetime of the computation. Delta removes subsequent identical copies of already transmitted data prior to transfer and, once the data has reached the destination, restores those pieces from previously transmitted data. Delta is able to identify duplicated information and determine the most space-efficient way to represent it. Initial tests show about a 50% reduction in data movement while maintaining the same data quality and transmission frequency. Given the simplicity of the approach and the log-based format employed by ADIOS, the approach can also be used to write less data to the storage array outside of IAW considerations.
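The general idea of dropping already-transmitted data before transfer and restoring it at the destination can be illustrated with a minimal sketch. This is not Delta's actual implementation, which the abstract does not detail. It assumes fixed-size blocks, SHA-256 digests as block identifiers, and an in-memory record of sent blocks on each side. The names `BLOCK_SIZE`, `encode`, `decode`, and `wire_size` are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, not taken from the paper


def encode(payload: bytes, seen: set) -> list:
    """Sender side: replace blocks that were already sent with a digest reference."""
    message = []
    for start in range(0, len(payload), BLOCK_SIZE):
        block = payload[start:start + BLOCK_SIZE]
        digest = hashlib.sha256(block).digest()
        if digest in seen:
            # Identical block was transmitted earlier: send only its digest.
            message.append(("ref", digest))
        else:
            seen.add(digest)
            message.append(("raw", block))
    return message


def decode(message: list, store: dict) -> bytes:
    """Receiver side: rebuild the payload from raw blocks and earlier blocks."""
    parts = []
    for kind, value in message:
        if kind == "raw":
            # Remember the block so later references to it can be resolved.
            store[hashlib.sha256(value).digest()] = value
            parts.append(value)
        else:
            parts.append(store[value])
    return b"".join(parts)


def wire_size(message: list) -> int:
    """Bytes that would cross the network (block data plus digests)."""
    return sum(len(value) for _, value in message)
```

In this sketch the sender keeps one `seen` set and the receiver one `store` dictionary across output steps, so a block that is unchanged from an earlier step costs only a 32-byte digest on the wire. The reconstructed payload is byte-identical to the original, which corresponds to the abstract's claim of reducing volume without reducing data quality.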
Acknowledgments
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2014-17090 C.
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Lofstead, J., Jean-Baptiste, G., Oldfield, R. (2016). Delta: Data Reduction for Integrated Application Workflows and Data Storage. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_11
DOI: https://doi.org/10.1007/978-3-319-46079-6_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)