Abstract
Many scientific applications manipulate large amounts of data and are therefore parallelized on high-performance computing systems to take advantage of their computational power and memory space. The size of the data processed by these large-scale applications can easily overwhelm the disk capacity of most systems, so tertiary storage devices are used to store the data. Parallelizing this type of application requires an understanding of not only the data partition pattern among multiple processors but also the underlying storage architecture and the data storage pattern. In this paper, we present a meta-data management system that uses a database to record information about datasets and manages these meta-data to provide a suitable I/O interface. As a result, users specify dataset names instead of physical data locations and access data using optimal I/O calls without knowing the underlying storage structure. We use an astrophysics application to demonstrate that the management system can provide a convenient programming environment with negligible database access overhead.
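The name-based access described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the column names, the function names (`open_catalog`, `register`, `plan_access`), the I/O method labels, and the example path are all hypothetical, and SQLite stands in for whatever database the system actually uses. The sketch only shows the general idea of looking up a dataset by name and letting the recorded meta-data decide how it should be read.

```python
import sqlite3

# Hypothetical schema: none of these table or column names come from the paper.
SCHEMA = """
CREATE TABLE dataset (
    name     TEXT PRIMARY KEY,  -- logical name supplied by the user
    location TEXT NOT NULL,     -- physical path on the storage system
    storage  TEXT NOT NULL,     -- storage tier, e.g. 'disk' or 'tape'
    pattern  TEXT NOT NULL      -- partition pattern, e.g. 'block' or 'cyclic'
)
"""


def open_catalog():
    """Create an in-memory catalog holding the dataset meta-data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def register(conn, name, location, storage, pattern):
    """Record where a dataset lives and how it is partitioned."""
    conn.execute(
        "INSERT INTO dataset (name, location, storage, pattern) VALUES (?, ?, ?, ?)",
        (name, location, storage, pattern),
    )
    conn.commit()


def plan_access(conn, name):
    """Resolve a dataset name to a location and a suggested I/O method."""
    row = conn.execute(
        "SELECT location, storage, pattern FROM dataset WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown dataset: {name}")
    location, storage, pattern = row
    # Illustrative selection rule only; the real system's policy is not
    # described in the abstract.
    if storage == "tape":
        method = "stage_then_read"
    elif pattern == "block":
        method = "collective_read"
    else:
        method = "independent_read"
    return {"location": location, "method": method}


if __name__ == "__main__":
    catalog = open_catalog()
    register(catalog, "temperature", "/archive/run1/temp.dat", "tape", "block")
    print(plan_access(catalog, "temperature"))
```

In this sketch the caller never handles the physical path directly; it passes only the dataset name, which is the property the abstract emphasizes.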
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liao, Wk., Shen, X., Choudhary, A. (2000). Meta-data Management System for High-Performance Large-Scale Scientific Data Access. In: Valero, M., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2000. HiPC 2000. Lecture Notes in Computer Science, vol 1970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44467-X_26
DOI: https://doi.org/10.1007/3-540-44467-X_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41429-2
Online ISBN: 978-3-540-44467-1
eBook Packages: Springer Book Archive