Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment

  • Conference paper
Cloud Computing and Big Data (CloudCom-Asia 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9106)

Abstract

MapReduce is a programming model widely used in big data processing. Reduce task scheduling in MapReduce is a key issue that significantly affects performance. Unfortunately, because reduce task scheduling is complicated, there is no widely accepted solution to this issue. The main approaches to optimizing reduce task scheduling emphasize either computation features or data locality. Although a few studies have explored solutions based on theoretical modeling, their models are oversimplified. Aiming to optimize reduce task scheduling, we propose a method that models a node’s computation and communication capability uniformly, based on a theoretical analysis of the reduce phase procedure. The analysis integrates the costs that reduce tasks incur in fetching and processing intermediate data. With the proposed model, the optimal load balance of the reduce phase is derived and proved. Evaluations under different environments show that the load balance of the reduce phase improves significantly when scheduling is guided by this optimality principle.
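The abstract does not give the model itself, so the following is only an illustrative sketch of the general idea of folding fetch (communication) and processing (computation) cost into one per-node capability and balancing load against it. It assumes a simple linear cost model that is not taken from the paper: node i needs `fetch_cost[i] + compute_cost[i]` time units per unit of intermediate data. The function names and sample numbers are hypothetical.

```python
from fractions import Fraction


def balanced_shares(fetch_cost, compute_cost):
    """Split one unit of reduce input so every node finishes at the same time.

    Assumption (not from the paper): node i spends
    fetch_cost[i] + compute_cost[i] time units per unit of data.
    """
    # Combined per-unit cost of fetching and processing on each node.
    per_unit = [Fraction(f) + Fraction(c) for f, c in zip(fetch_cost, compute_cost)]
    # Treat the inverse of the combined cost as the node's capability.
    capability = [1 / t for t in per_unit]
    total = sum(capability)
    # Shares proportional to capability equalize the finish times.
    return [cap / total for cap in capability]


def finish_times(shares, fetch_cost, compute_cost):
    """Finish time of each node under the same assumed linear model."""
    return [
        s * (Fraction(f) + Fraction(c))
        for s, f, c in zip(shares, fetch_cost, compute_cost)
    ]


# Hypothetical three-node cluster with made-up costs.
fetch = [1, 2, 1]
compute = [1, 2, 5]
shares = balanced_shares(fetch, compute)
times = finish_times(shares, fetch, compute)
print(shares)  # share of the reduce input assigned to each node
print(times)   # identical finish times under the assumed model
```

Under this assumed model, a node that is slow to fetch or slow to process receives proportionally less data, and no node sits idle waiting for a straggler. The paper's own model and its optimality proof may differ from this simplification.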

Author information

Corresponding author

Correspondence to Qun Liao.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Zuo, C., Liao, Q., Gu, T., Li, T., Yang, Y. (2015). Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_17

  • DOI: https://doi.org/10.1007/978-3-319-28430-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28429-3

  • Online ISBN: 978-3-319-28430-9

  • eBook Packages: Computer Science, Computer Science (R0)
