Skip to main content

Delta: Data Reduction for Integrated Application Workflows and Data Storage

  • Conference paper
  • First Online:
High Performance Computing (ISC High Performance 2016)

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9945))

Included in the following conference series:

Abstract

Data sizes are growing far faster than storage bandwidth. To address this growing gap, Integrated Application Workflows (IAWs) are being investigated as a potential to replace using a centralized storage array for storing intermediate data. IAWs run multiple simulation workflow components concurrently on an HPC resource connecting these components using compute area resources. These IAWs require high frequency and high volume data transfers between compute nodes and staging area nodes during the lifetime of a large parallel computation. The available network bandwidth between the two areas may not be enough to efficiently support the data movement. As the processing power available to compute resources increases, the requirements for this data transfer will become more difficult to satisfy and perhaps will not be satisfiable at all since network capabilities are not expanding at a comparable rate. It is necessary to reduce the volume of data without reducing the quality of data when it is being processed and analyzed. Delta resolves the issue by addressing the lifetime data transfer operations. Delta removes subsequent identical copies of already transmitted data prior to transfer and restores those pieces once the data has reached the destination using previously transmitted data. Delta is able to identify duplicated information and determine the most space efficient way to represent it. Initial tests show about 50 % reduction in data movement while maintaining the same data quality and transmission frequency. Given the simplicity of the approach and the log-based format employed by ADIOS, the approach can also be used to write less data to the storage array outside of IAW considerations.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Baker, A.H., Xu, H., Dennis, J.M., Levy, M.N., Nychka, D., Mickelson, S.A., Edwards, J., Vertenstein, M., Wegener, A.: A methodology for evaluating the impact of data compression on climate simulation data. In: The 23rd International Symposium on High-Performance Parallel, pp. 203–214 (2014)

    Google Scholar 

  2. Bangerth, W., Hartmann, R., Kanschat, G.: deal.II - a general purpose object oriented finite element library. ACM Trans. Math. Softw. 33(4), 24/1–24/27 (2007)

    Article  MathSciNet  Google Scholar 

  3. Burns, R.C., Long, D.D.E.: Efficient distributed backup with delta compression. In: Proceedings of the Fifth Workshop on I/O in Parallel and Distributed Systems, IOPADS 1997, New York, NY, USA, pp. 27–36. ACM (1997)

    Google Scholar 

  4. Housel, B.C., Lindquist, D.B.: Webexpress: a system for optimizing web browsing in a wireless environment. In: Proceedings of the 2nd Annual International Conference on Mobile Computing and Networking, MobiCom 1996, New York, NY, USA, pp. 108–116. ACM (1996)

    Google Scholar 

  5. Klappenecker, A., May, F.U.: Evolving better wavelet compression schemes. In: Proceedings of Wavelet Applications in Signal and Image Processing III, vol. 1214, pp. 614–622 (1995)

    Google Scholar 

  6. Lakshminarasimhan, S., Shah, N., Ethier, S., Klasky, S., Latham, R., Ross, R., Samatova, N.F.: Compressing the incompressible with ISABELA: in-situ reduction of spatio-temporal data. In: Jeannot, E., Namyst, R., Roman, J. (eds.) Euro-Par 2011. LNCS, vol. 6852, pp. 366–379. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23400-2_34

    Chapter  Google Scholar 

  7. Laros III, J.H., Pedretti, K.T., Kelly, S.M., Shu, W., Vaughan, C.T.: Energy based performance tuning for large scale high performance computing systems. In: Proceedings of the 2012 Symposium on High Performance Computing. Society for Computer Simulation International, p. 6 (2012)

    Google Scholar 

  8. Lofstead, J., Zheng, F., Klasky, S., Schwan, K.: Adaptable, metadata rich IO methods for portable high performance IO. In: IPDPS, Rome, Italy (2009)

    Google Scholar 

  9. Ludäscher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A., Tao, J., Zhao, Y.: Scientific workflow management and the Kepler system: research articles. Concurr. Comput. Pract. Exper. 18(10), 1039–1065 (2006)

    Article  Google Scholar 

  10. Malewicz, G., Foster, I., Rosenberg, A., Wilde, M.: A tool for prioritizing DAGMan jobs and its evaluation. In: 2006 15th IEEE International Symposium on High Performance Distributed Computing, pp. 156–168 (2006)

    Google Scholar 

  11. Manber, U., Manber, U.: Finding similar files in a large file system. In: Proceedings of the USENIX Winter 1994 Technical Conference, pp. 1–10 (1994)

    Google Scholar 

  12. Mullender, S.J., Leslie, I.M., McAuley, D.: Operating-system support for distributed multimedia. In: Proceedings of the USENIX Summer 1994 Technical Conference on USENIX Summer 1994 Technical Conference, vol. 1, pp. 209–219 (1994)

    Google Scholar 

  13. Nicolae, B., Cappello, F.: Ai-ckpt: leveraging memory access patterns for adaptive asynchronous incremental checkpointing. In: Proceedings of the 22nd International Symposium on High-Performance Parallel and Distributed Computing, pp. 155–166. ACM (2013)

    Google Scholar 

  14. Plimpton, S.: Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117(1), 1–19 (1995)

    Article  MATH  Google Scholar 

  15. Spring, N.T., Wetherall, D.: A protocol-independent technique for eliminating redundant network traffic. ACM SIGCOMM Comput. Commun. Rev. 30(4), 87–95 (2000)

    Article  Google Scholar 

  16. Xia, L., Hale, K.C., Dinda, P.A.: Concord: easily exploiting memory content redundancy through the content-aware service command. In: The 23rd International Symposium on High-Performance Parallel, pp. 25–36 (2014)

    Google Scholar 

Download references

Acknowledgments

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND2014-17090 C.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jay Lofstead .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Lofstead, J., Jean-Baptiste, G., Oldfield, R. (2016). Delta: Data Reduction for Integrated Application Workflows and Data Storage. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science(), vol 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_11

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-46079-6_11

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46078-9

  • Online ISBN: 978-3-319-46079-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics