HCOpt: An Automatic Optimizer for Configuration Parameters of Hadoop

  • Conference paper
Human Centered Computing (HCC 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9567)

Abstract

MapReduce is an efficient tool for data-intensive applications. Hadoop, an open-source implementation of MapReduce, has been widely adopted by enterprises and scientific computing communities. However, to run a MapReduce program efficiently in Hadoop, users must set a number of configuration parameters, and they often run into performance problems because they do not know how to set them. To address these problems, we focus on the optimization opportunities presented by Hadoop's high configurability and propose HCOpt, an automatic tool for optimizing Hadoop configuration parameters. HCOpt uses a Profile Engine to collect monitoring information from running MapReduce programs, a Prediction Engine to estimate the performance of a given Hadoop configuration, and a genetic-based search algorithm to find an optimized configuration in the large search space. Our evaluation shows that HCOpt reduces the job completion time of Hadoop applications by up to 20% compared with running them under the configuration parameters suggested by rule-based optimization.
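
The search step described in the abstract can be illustrated with a minimal sketch. This is not HCOpt's implementation: the abstract does not specify the genetic operators, the parameters tuned, or the prediction model, so everything below is an assumption made for illustration. The parameter names are Hadoop 1.x settings, but their ranges are made up, and `predicted_runtime` is a toy stand-in for a Prediction Engine rather than a real performance model.

```python
import random

# Illustrative search space: parameter name -> (low, high) integer range.
# The ranges are assumptions for this sketch, not recommended values.
SEARCH_SPACE = {
    "io.sort.mb": (50, 500),
    "io.sort.factor": (10, 100),
    "mapred.reduce.tasks": (1, 64),
}


def predicted_runtime(config):
    """Toy stand-in for a Prediction Engine (hypothetical): lower is better."""
    targets = {"io.sort.mb": 300, "io.sort.factor": 60, "mapred.reduce.tasks": 24}
    penalty = 0.0
    for name, (lo, hi) in SEARCH_SPACE.items():
        # Normalised distance from an arbitrary "sweet spot" for each parameter.
        penalty += abs(config[name] - targets[name]) / (hi - lo)
    return 100.0 + 50.0 * penalty


def random_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SEARCH_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each parameter is inherited from one parent at random."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SEARCH_SPACE}


def mutate(config, rng, rate=0.2):
    """Resample each parameter with probability `rate`."""
    child = dict(config)
    for name, (lo, hi) in SEARCH_SPACE.items():
        if rng.random() < rate:
            child[name] = rng.randint(lo, hi)
    return child


def genetic_search(cost, generations=40, pop_size=30, elite=5, seed=0):
    """Elitist genetic search that minimises `cost` over SEARCH_SPACE."""
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        parents = population[:elite]  # the best configurations survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = parents + children
    return min(population, key=cost)


if __name__ == "__main__":
    best = genetic_search(predicted_runtime)
    print(best, predicted_runtime(best))
```

Because the elite configurations are carried over unchanged, the best predicted cost never gets worse from one generation to the next. In a real optimizer the cost function would come from a model fitted to profiled job data, which is the role the abstract assigns to the Profile Engine and Prediction Engine.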

Acknowledgments

This paper is partly supported by the NSFC under grants No. 61433019 and No. 61370104, the International Science & Technology Cooperation Program of China under grant No. 2015DFE12860, and the Chinese Universities Scientific Fund under grant No. 2015MS077.

Author information

Corresponding author

Correspondence to Xuanhua Shi.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, X., Zeng, L., Shi, X., Wu, S., Xie, X., Jin, H. (2016). HCOpt: An Automatic Optimizer for Configuration Parameters of Hadoop. In: Zu, Q., Hu, B. (eds) Human Centered Computing. HCC 2016. Lecture Notes in Computer Science, vol 9567. Springer, Cham. https://doi.org/10.1007/978-3-319-31854-7_54

  • DOI: https://doi.org/10.1007/978-3-319-31854-7_54

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31853-0

  • Online ISBN: 978-3-319-31854-7

  • eBook Packages: Computer Science, Computer Science (R0)
