Achieving Dynamic Resource Allocation in the Hadoop Cloud System

  • Conference paper
Internet of Vehicles. Technologies and Services Toward Smart Cities (IOV 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11894)

Abstract

Cloud computing has been extensively adopted in recent years to handle the enormous amount of data from the Internet of Things, Big Data, and many other cutting-edge research areas. As cloud systems serve more and more jobs, it becomes increasingly difficult for time-critical or urgent high-priority jobs in a busy cloud environment to finish as quickly as users would like. To facilitate the prompt execution of those jobs, it is imperative for cloud systems to provide schemes that expedite them. Apache Hadoop is one of the most popular platforms in cloud computing. Unfortunately, it is not equipped with flexible mechanisms to hasten the execution of prioritized jobs. Various approaches have been proposed to accelerate the execution of prioritized jobs from different aspects. However, those approaches not only target certain existing Hadoop job schedulers but also require modifications to those schedulers. Thus, they cannot be applied directly to other job schedulers without major porting efforts, much less to job schedulers developed in the future. We designed and implemented a new scheme that enables dynamic resource allocation to jobs selected by job schedulers. As a result, without any changes to the job schedulers themselves, our scheme could help some current and future Hadoop job schedulers speed up the execution of high-priority jobs. Experimental results demonstrate that jobs executed with high priority can reduce their execution time by up to 68.28%.

Author information

Corresponding author

Correspondence to Tsozen Yeh.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Yeh, T., Yu, S. (2020). Achieving Dynamic Resource Allocation in the Hadoop Cloud System. In: Hsu, CH., Kallel, S., Lan, KC., Zheng, Z. (eds) Internet of Vehicles. Technologies and Services Toward Smart Cities. IOV 2019. Lecture Notes in Computer Science, vol 11894. Springer, Cham. https://doi.org/10.1007/978-3-030-38651-1_22

  • DOI: https://doi.org/10.1007/978-3-030-38651-1_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-38650-4

  • Online ISBN: 978-3-030-38651-1

  • eBook Packages: Computer Science, Computer Science (R0)