A Power-Aware Autonomic Approach for Performance Management of Scientific Applications in a Data Center Environment

Mehrotra, Rajat; Banicescu, Ioana; Srivastava, Srishti; Abdelwahed, Sherif

doi:10.1007/978-1-4939-2092-1_5

Abstract

In recent years, computer servers and data center facilities that provide high performance computing (HPC) for scientific applications have grown substantially in number and have become major consumers of electrical power. Supercomputers often run at peak performance to execute scientific applications efficiently, and therefore consume an enormous amount of power, which increases operational cost. Furthermore, higher power consumption raises the temperature of the physical HPC systems, which in turn translates into increased failure rates and decreased reliability. Slowing down these HPC systems by reducing the speed of individual processors degrades the execution performance of the scientific application because of the resulting variation in processing speed. Another cause of degraded execution performance is variation in the availability of computational resources, which arises when other applications use the same computing node in a space-shared manner. Variations in processor availability can cause severe performance degradation through load imbalance and can violate performance objectives such as meeting a deadline, which may result in high penalties in terms of revenue loss to service providers. This chapter presents a utility-based, power-aware approach that uses a model-based control-theoretic framework for executing scientific applications. The approach and related simulations indicate that the performance and power requirements of the system can be adjusted dynamically while maintaining predefined quality of service (QoS) goals, in terms of execution deadline and power consumption of the HPC system, even in the presence of perturbations related to computational resources. The approach is autonomic, performance directed, dynamically controlled, and independent of the execution of the application (it does not interfere with it).
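
The abstract does not give the chapter's models or utility function, so the following is only a minimal sketch of the general idea of utility-based, power-aware frequency selection. Everything in it is an assumption made for illustration: the frequency levels in `FREQS_GHZ`, the cubic power model in `power_watts`, the linear performance model in `predicted_finish`, and the lateness penalty `penalty_per_s` are hypothetical and are not taken from the chapter.

```python
# Hypothetical processor frequency levels in GHz (illustrative only).
FREQS_GHZ = (1.0, 1.5, 2.0, 2.5)


def power_watts(freq_ghz: float) -> float:
    """Assumed power model: a static part plus a cubic dynamic part."""
    return 20.0 + 5.0 * freq_ghz ** 3


def predicted_finish(remaining_work: float, freq_ghz: float,
                     availability: float, now: float) -> float:
    """Assumed performance model: progress rate = frequency * availability.

    `availability` is the fraction of the processor (0 < a <= 1) left to the
    application after other space-shared workloads take their share.
    """
    return now + remaining_work / (freq_ghz * availability)


def choose_frequency(remaining_work: float, availability: float,
                     now: float, deadline: float,
                     penalty_per_s: float = 1000.0) -> float:
    """Return the frequency with the lowest cost (energy + lateness penalty)."""
    def cost(freq_ghz: float) -> float:
        finish = predicted_finish(remaining_work, freq_ghz, availability, now)
        energy = power_watts(freq_ghz) * (finish - now)
        lateness = max(0.0, finish - deadline)
        return energy + penalty_per_s * lateness

    return min(FREQS_GHZ, key=cost)


if __name__ == "__main__":
    # Re-evaluate the decision as the observed availability changes.
    for availability in (1.0, 0.5):
        freq = choose_frequency(remaining_work=100.0,
                                availability=availability,
                                now=0.0, deadline=60.0)
        print(f"availability={availability}: run at {freq} GHz")
```

Under these assumed models, a loose deadline lets the controller settle on an energy-efficient frequency, while a tight deadline or reduced availability pushes it toward faster and more power-hungry settings. A controller of this kind would repeat the decision periodically with fresh measurements of remaining work and availability.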

Acknowledgment

The authors would like to thank the National Science Foundation (NSF) for its support of this work through the grant NSF IIP-1034897.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, NSF Center for Cloud and Autonomic Computing, Mississippi State University, MS, USA
Rajat Mehrotra & Sherif Abdelwahed
Department of Computer Science and Engineering, NSF Center for Cloud and Autonomic Computing, Mississippi State University, MS, USA
Ioana Banicescu & Srishti Srivastava

Corresponding author

Correspondence to Rajat Mehrotra.

Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, North Dakota State University, Fargo, North Dakota, USA
Samee U. Khan
School of Information Technologies, The University of Sydney, Sydney, New South Wales, Australia
Albert Y. Zomaya

About this chapter

Cite this chapter

Mehrotra, R., Banicescu, I., Srivastava, S., Abdelwahed, S. (2015). A Power-Aware Autonomic Approach for Performance Management of Scientific Applications in a Data Center Environment. In: Khan, S., Zomaya, A. (eds) Handbook on Data Centers. Springer, New York, NY. https://doi.org/10.1007/978-1-4939-2092-1_5

DOI: https://doi.org/10.1007/978-1-4939-2092-1_5
Published: 17 March 2015
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4939-2091-4
Online ISBN: 978-1-4939-2092-1
eBook Packages: Computer Science, Computer Science (R0)