HCOpt: An Automatic Optimizer for Configuration Parameters of Hadoop

Zhang, Xiong; Zeng, Linxi; Shi, Xuanhua; Wu, Song; Xie, Xia; Jin, Hai

doi:10.1007/978-3-319-31854-7_54

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9567)

Included in the following conference series:

International Conference on Human Centered Computing

Abstract

MapReduce is an efficient tool for data-intensive applications. Hadoop, an open-source implementation of MapReduce, has been widely adopted by enterprises and scientific computing communities. However, when users run a MapReduce program in Hadoop, they have to set a number of configuration parameters to make sure the program runs efficiently, and they often run into performance problems because they do not know how to set these parameters. To address these problems, we focus on the optimization opportunities presented by the high configurability of Hadoop and propose HCOpt, an automated tool for optimizing Hadoop configuration parameters. HCOpt uses a Profile Engine to collect monitoring information from running MapReduce programs, a Prediction Engine to estimate the performance of a given Hadoop configuration, and a genetic-based search algorithm to find an optimized configuration in the large search space. Our evaluation shows that HCOpt reduces the job completion time of Hadoop applications by up to 20% compared with the same applications run with the configuration parameters suggested by rule-based optimization.
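The abstract names a genetic-based search guided by a performance predictor but gives no algorithmic detail. The following is only a minimal sketch of that general idea, not HCOpt's actual implementation. The parameter names are real Hadoop 2.x properties chosen for illustration, while the value ranges, the `predict_runtime` cost function (a toy stand-in for a Prediction Engine), and all population settings are assumptions.

```python
import random

# Hypothetical search space: parameter name -> (low, high) integer bounds.
# The property names exist in Hadoop 2.x; the ranges are assumptions.
SEARCH_SPACE = {
    "mapreduce.task.io.sort.mb": (50, 500),
    "mapreduce.task.io.sort.factor": (10, 100),
    "mapreduce.job.reduces": (1, 64),
}


def predict_runtime(config):
    """Toy stand-in for a Prediction Engine: lower is better.

    A real predictor would be built from profiles of running jobs;
    this quadratic bowl only gives the search something to minimize.
    """
    targets = {
        "mapreduce.task.io.sort.mb": 200,
        "mapreduce.task.io.sort.factor": 50,
        "mapreduce.job.reduces": 16,
    }
    return 100.0 + sum((config[k] - targets[k]) ** 2 for k in targets)


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.randint(lo, hi) for k, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is taken from one parent at random."""
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """With probability `rate`, resample a parameter within its bounds."""
    child = dict(config)
    for k, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[k] = rng.randint(lo, hi)
    return child


def genetic_search(cost, generations=40, pop_size=30, elite=5, seed=0):
    """Return the lowest-cost configuration found by a simple elitist GA."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]  # elitism: best configs survive unchanged
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_search(predict_runtime)
    print(best, predict_runtime(best))
```

Because the best configurations are carried over unchanged in every generation, the predicted cost of the returned configuration can never be worse than that of the best member of the initial random population. In a real optimizer the expensive part is the cost function, which is why a predictor is used in place of actually running each candidate job.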

Acknowledgments

This paper is partly supported by the NSFC under grants No. 61433019 and No. 61370104, the International Science & Technology Cooperation Program of China under grant No. 2015DFE12860, and the Chinese Universities Scientific Fund under grant No. 2015MS077.

Author information

Authors and Affiliations

Services Computing Technology and System Laboratory, Big Data Technology and System Laboratory, Cluster and Grid Computing Laboratory, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Xiong Zhang, Linxi Zeng, Xuanhua Shi, Song Wu, Xia Xie & Hai Jin

Corresponding author

Correspondence to Xuanhua Shi.

Editor information

Editors and Affiliations

Wuhan, Hubei, China
Qiaohong Zu
Fujitsu Laboratories of Europe Ltd., Middlesex, United Kingdom
Bo Hu

About this paper

Cite this paper

Zhang, X., Zeng, L., Shi, X., Wu, S., Xie, X., Jin, H. (2016). HCOpt: An Automatic Optimizer for Configuration Parameters of Hadoop. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2016. Lecture Notes in Computer Science, vol 9567. Springer, Cham. https://doi.org/10.1007/978-3-319-31854-7_54

DOI: https://doi.org/10.1007/978-3-319-31854-7_54
Published: 01 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31853-0
Online ISBN: 978-3-319-31854-7
eBook Packages: Computer Science, Computer Science (R0)
