Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment

Zuo, Chuang; Liao, Qun; Gu, Tao; Li, Tao; Yang, Yulu

doi:10.1007/978-3-319-28430-9_17

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9106)

Included in the following conference series:

Second International Conference on Cloud Computing and Big Data in Asia

Abstract

MapReduce is a programming model widely used in big data processing. The scheduling of reduce tasks in MapReduce is a key issue that significantly affects performance. Unfortunately, because reduce task scheduling is complicated, there is no widely accepted solution to it. Existing approaches to optimizing reduce task scheduling mainly emphasize either computation characteristics or data locality. Although a few studies have explored solutions based on theoretical modeling, their models are oversimplified. To optimize reduce task scheduling, we propose a method that models a node’s computation and communication capability uniformly, based on a theoretical analysis of the reduce phase. The analysis integrates the costs that reduce tasks incur in fetching and processing intermediate data. With the proposed model, the optimal load balance of the reduce phase is derived and proved. Evaluations in different environments show that the load balance of the reduce phase improves significantly under a scheduling method guided by this optimality principle.
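
The abstract does not give the model itself, so the following is only an illustrative sketch of the general idea of folding fetch and processing costs into one per-node capability and then balancing load against it. It is not the authors' formulation. It assumes that each node fetches and then processes its share sequentially, that both costs are linear in data size, and that the names `fetch_rate`, `process_rate`, `combined_capability`, and `balanced_shares` are hypothetical.

```python
from typing import List


def combined_capability(fetch_rate: float, process_rate: float) -> float:
    """Data units per second a node completes when it fetches and then
    processes each unit sequentially (an assumption of this sketch)."""
    return 1.0 / (1.0 / fetch_rate + 1.0 / process_rate)


def balanced_shares(fetch_rates: List[float],
                    process_rates: List[float],
                    total_data: float) -> List[float]:
    """Split total_data in proportion to each node's combined capability,
    so that under the linear cost assumption all nodes finish together."""
    caps = [combined_capability(f, p)
            for f, p in zip(fetch_rates, process_rates)]
    total_cap = sum(caps)
    return [total_data * c / total_cap for c in caps]


def finish_times(shares: List[float],
                 fetch_rates: List[float],
                 process_rates: List[float]) -> List[float]:
    """Time each node needs to fetch and then process its share."""
    return [s / f + s / p
            for s, f, p in zip(shares, fetch_rates, process_rates)]


# Hypothetical two-node cluster: node 0 is twice as fast as node 1.
fetch = [100.0, 50.0]
process = [100.0, 50.0]
shares = balanced_shares(fetch, process, total_data=600.0)
times = finish_times(shares, fetch, process)
```

Under these assumptions the faster node receives twice the data and both nodes finish at the same time, which is the usual sense in which such a split is called load balanced.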

Author information

Authors and Affiliations

College of Computer and Control Engineering, Nankai University, Tianjin, China
Chuang Zuo, Qun Liao, Tao Gu, Tao Li & Yulu Yang

Corresponding author

Correspondence to Qun Liao.

Editor information

Editors and Affiliations

School of Computer Science and Tech., Huazhong Univ. of Science and Technology, Wuhan, China
Weizhong Qiang
College of Mathematics and Computer Sci., Fuzhou University, Fuzhou, China
Xianghan Zheng
Dept. of Computer Scie and Informat. Eng, Chung Hua University, Hsinchu, Taiwan
Ching-Hsien Hsu

About this paper

Cite this paper

Zuo, C., Liao, Q., Gu, T., Li, T., Yang, Y. (2015). Node Capability Modeling for Reduce Phase’s Scheduling in MapReduce Environment. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_17

DOI: https://doi.org/10.1007/978-3-319-28430-9_17
Published: 10 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science, Computer Science (R0)
