A Study of Factors Affecting MapReduce Scheduling

  • Conference paper
Big Data Analytics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 654)

Abstract

MapReduce is a programming model for parallel, distributed processing of large-scale data, and the Hadoop framework is one implementation of it. Because MapReduce processes data in parallel across clusters of nodes, a good scheduling technique is needed to optimize performance. The performance of MapReduce scheduling depends on factors such as execution time, resource utilization across the cluster, data locality, compute capacity, energy efficiency, heterogeneity, and scaling. Researchers have developed various algorithms that each address one or more of these problems and reach a near-optimal solution. This paper summarizes most of the research work done in this regard.

Author information

Correspondence to Manisha Gaur.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gaur, M., Minocha, B., Muttoo, S.K. (2018). A Study of Factors Affecting MapReduce Scheduling. In: Aggarwal, V., Bhatnagar, V., Mishra, D. (eds) Big Data Analytics. Advances in Intelligent Systems and Computing, vol 654. Springer, Singapore. https://doi.org/10.1007/978-981-10-6620-7_27

  • DOI: https://doi.org/10.1007/978-981-10-6620-7_27

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6619-1

  • Online ISBN: 978-981-10-6620-7

  • eBook Packages: Engineering, Engineering (R0)
