Achieving Dynamic Resource Allocation in the Hadoop Cloud System

Yeh, Tsozen; Yu, Shengchieh

doi:10.1007/978-3-030-38651-1_22

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11894)

Included in the following conference series:

International Conference on Internet of Vehicles

Abstract

Cloud computing has been widely adopted in recent years to handle the enormous amounts of data generated by the Internet of Things, Big Data, and many other cutting-edge research areas. As cloud systems serve more and more jobs, it becomes increasingly difficult for time-critical or urgent high-priority jobs in a busy cloud environment to complete as quickly as users would like. To facilitate the prompt execution of such jobs, cloud systems must provide schemes that expedite them. Apache Hadoop is one of the most popular cloud computing platforms. Unfortunately, it is not equipped with flexible mechanisms for hastening prioritized jobs. Various approaches have been proposed to accelerate the execution of prioritized jobs from different angles. However, those approaches not only target specific existing Hadoop job schedulers but also require modifications to those schedulers. Thus, they cannot be applied directly to other job schedulers without major porting effort, much less to job schedulers developed in the future. We designed and implemented a new scheme that enables dynamic resource allocation to the jobs selected by job schedulers. As a result, without any changes to the job schedulers, our scheme can help some current and future Hadoop job schedulers speed up the execution of high-priority jobs. Experimental results demonstrate that jobs executed with high priority can reduce their execution time by up to 68.28%.
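To put the reported figure in perspective, the short sketch below converts a percentage reduction in execution time into the equivalent speedup factor. It is only an arithmetic illustration: the 100-second baseline and the helper name `speedup_from_reduction` are assumptions made for this example and do not come from the paper's experiments.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert a percentage reduction in execution time into a speedup factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Hypothetical baseline runtime, chosen only for illustration (not from the paper).
baseline_s = 100.0

# Best-case reduction reported in the abstract.
reduction_pct = 68.28

# Runtime of the same job when it is executed with high priority.
prioritized_s = baseline_s * (1.0 - reduction_pct / 100.0)
speedup = speedup_from_reduction(reduction_pct)

print(f"prioritized runtime: {prioritized_s:.2f} s, speedup: {speedup:.2f}x")
```

Under these assumptions, a 68.28% reduction means the prioritized job finishes in a little under a third of its original time, which corresponds to a speedup of roughly 3.15 times in the best reported case.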

Author information

Authors and Affiliations

Department of Computer Science and Information Engineering, Fu Jen Catholic University, New Taipei City, Taiwan
Tsozen Yeh & Shengchieh Yu

Corresponding author

Correspondence to Tsozen Yeh.

Editor information

Editors and Affiliations

Chung Hua University, Hsinchu, Taiwan
Ching-Hsien Hsu
Saint-Quentin-en-Yvelines, Université de Versailles, Versailles Cedex, France
Sondès Kallel
China Medical University, Tainan, Taiwan
Kun-Chan Lan
Sun Yat-sen University, Guangzhou, China
Zibin Zheng

About this paper

Cite this paper

Yeh, T., Yu, S. (2020). Achieving Dynamic Resource Allocation in the Hadoop Cloud System. In: Hsu, CH., Kallel, S., Lan, KC., Zheng, Z. (eds) Internet of Vehicles. Technologies and Services Toward Smart Cities. IOV 2019. Lecture Notes in Computer Science, vol 11894. Springer, Cham. https://doi.org/10.1007/978-3-030-38651-1_22

DOI: https://doi.org/10.1007/978-3-030-38651-1_22
Published: 19 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-38650-4
Online ISBN: 978-3-030-38651-1
eBook Packages: Computer Science, Computer Science (R0)