Joint deadline-constrained and influence-aware design for allocating MapReduce jobs in cloud computing systems

Cluster Computing

Abstract

MapReduce can speed up the execution of jobs that operate over big data. A MapReduce job is divided into a number of map and reduce tasks by partitioning its input data in a well-defined manner. In a cloud computing system, multiple MapReduce jobs may be submitted together and compete for the system's computing resources. When a job has a particular performance requirement (e.g., an execution deadline), appropriate computing resources must be reserved for executing its map and reduce tasks; otherwise, the requirement cannot be satisfied. Several deadline-constrained MapReduce schedulers have been proposed, but most of them are not aware of the performance influence that newly allocated tasks have on existing tasks. We propose a deadline-constrained and influence-aware MapReduce scheduler that combines the following three factors: (1) relaxed data locality, (2) performance influence on existing tasks, and (3) coordination of allocation contention. We first adopt the data-locality criterion to make a tentative allocation plan. This plan is then verified: if some new tasks severely affect existing tasks, or if the deadline requirements of some new tasks are not satisfied, the plan is modified by re-allocating some of the new tasks. To optimize computing resource usage, the solution of a well-known network flow problem, minimum-cost maximum-flow (MCMF), is applied to modify the data-locality allocation plan. A heuristic algorithm is also presented to reduce the complexity of solving the MCMF problem. In addition to meeting the deadline requirements of new jobs, the final allocation plan also accounts for the performance influence on existing jobs. Finally, we conduct a performance analysis to evaluate the proposed MapReduce scheduler using various performance metrics.
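
To make the MCMF step concrete, the following is a minimal sketch of how re-allocating tasks to nodes can be posed as a minimum-cost maximum-flow problem. It is not the authors' graph construction, which the abstract does not describe. The function name `allocate`, the task and node names, the slot counts, and the integer costs are all hypothetical. Here a cost is assumed to combine a data-locality penalty with the estimated influence on existing tasks, and a (task, node) pair that would miss the deadline is simply left out of the cost table. The solver uses the textbook successive-shortest-path method with a Bellman-Ford style search.

```python
from collections import deque

class MinCostMaxFlow:
    """Successive shortest paths on a residual graph (illustrative only)."""

    def __init__(self, n):
        self.n = n
        self.graph = [[] for _ in range(n)]  # per-vertex lists of edge indices
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Forward edge gets an even index, its residual twin the next odd one,
        # so the twin of edge e is always e ^ 1.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.graph[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        flow = total_cost = 0
        inf = float("inf")
        while True:
            # Queue-based Bellman-Ford: cheapest augmenting path from s.
            dist = [inf] * self.n
            prev_edge = [-1] * self.n
            queued = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                queued[u] = False
                for e in self.graph[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not queued[v]:
                            queued[v] = True
                            queue.append(v)
            if dist[t] == inf:
                return flow, total_cost  # no augmenting path is left
            # Bottleneck capacity along the path, then push that much flow.
            push, v = inf, t
            while v != s:
                e = prev_edge[v]
                push = min(push, self.cap[e])
                v = self.to[e ^ 1]
            v = t
            while v != s:
                e = prev_edge[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total_cost += push * dist[t]

def allocate(tasks, slots, cost):
    """Assign tasks to nodes. slots: node -> free slots; cost: (task, node) -> int."""
    nodes = list(slots)
    source, sink = 0, 1
    task_id = {task: 2 + i for i, task in enumerate(tasks)}
    node_id = {node: 2 + len(tasks) + j for j, node in enumerate(nodes)}
    net = MinCostMaxFlow(2 + len(tasks) + len(nodes))
    for task in tasks:
        net.add_edge(source, task_id[task], 1, 0)    # each task is placed once
    edge_of = {}
    for (task, node), c in cost.items():
        edge_of[(task, node)] = len(net.to)          # index of the forward edge
        net.add_edge(task_id[task], node_id[node], 1, c)
    for node in nodes:
        net.add_edge(node_id[node], sink, slots[node], 0)  # slot capacity
    flow, total = net.solve(source, sink)
    # A saturated task->node edge (remaining capacity 0) means "assigned".
    plan = {task: node for (task, node), e in edge_of.items() if net.cap[e] == 0}
    return flow, total, plan

# Hypothetical instance: three new tasks, node A has 1 free slot, node B has 2.
example_cost = {("t1", "A"): 1, ("t1", "B"): 4,
                ("t2", "A"): 1, ("t2", "B"): 2,
                ("t3", "B"): 1}
flow, total, plan = allocate(["t1", "t2", "t3"], {"A": 1, "B": 2}, example_cost)
print(flow, total, plan)
```

In this toy instance both t1 and t2 prefer node A, which has only one free slot. The flow solution resolves the contention by sending t2 to B, because moving t2 is cheaper than moving t1, so all three tasks are placed at a total cost of 4.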

Acknowledgements

This research was supported by the Ministry of Science and Technology, Taiwan, R.O.C., under Grant MOST 105-2221-E-030-004-MY3.

Author information

Corresponding author

Correspondence to Joseph M. Arul.

About this article

Cite this article

Lin, JW., Arul, J.M. & Lin, CY. Joint deadline-constrained and influence-aware design for allocating MapReduce jobs in cloud computing systems. Cluster Comput 22 (Suppl 3), 6963–6976 (2019). https://doi.org/10.1007/s10586-018-1981-x

  • DOI: https://doi.org/10.1007/s10586-018-1981-x
