Abstract
We present a resource-aware scheduling technique for MapReduce multi-job workloads that aims to improve resource utilization across machines while respecting completion time goals. Existing MapReduce schedulers represent the capacity of a cluster as a static number of slots, with a fixed number of execution slots per machine. This abstraction works for homogeneous workloads but fails to capture the differing resource requirements of individual jobs in multi-user environments. Our technique uses job profiling information to dynamically adjust the number of slots on each machine, as well as the placement of workloads across machines, to maximize the resource utilization of the cluster. In addition, the technique is guided by user-provided completion time goals for each job. The source code of our prototype is available at [1].
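To make the contrast with static slots concrete, the following is a minimal sketch, not the paper's algorithm. It assumes each task carries a profiled CPU and memory demand and that a machine's effective slot count is simply however many such tasks fit within its capacity. The names `TaskProfile`, `Machine`, and `place_tasks` are hypothetical, and the completion time goals described in the abstract are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical per-task demand, as job profiling might report it."""
    job: str
    cpu_cores: float
    mem_gb: float


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine capacity."""
    cpu_cores: float
    mem_gb: float


def place_tasks(machine: Machine, pending: List[TaskProfile]) -> List[TaskProfile]:
    """Greedily place pending tasks while CPU and memory remain.

    The length of the returned list plays the role of the machine's
    slot count: it depends on the tasks' demands rather than being a
    fixed, preconfigured number.
    """
    placed: List[TaskProfile] = []
    cpu_left, mem_left = machine.cpu_cores, machine.mem_gb
    for task in pending:
        if task.cpu_cores <= cpu_left and task.mem_gb <= mem_left:
            placed.append(task)
            cpu_left -= task.cpu_cores
            mem_left -= task.mem_gb
    return placed


if __name__ == "__main__":
    node = Machine(cpu_cores=4.0, mem_gb=8.0)
    cpu_heavy = [TaskProfile("sort", 2.0, 1.0)] * 5
    cpu_light = [TaskProfile("grep", 0.5, 1.0)] * 10
    print(len(place_tasks(node, cpu_heavy)), "CPU-heavy tasks fit")
    print(len(place_tasks(node, cpu_light)), "CPU-light tasks fit")
```

Under these assumed numbers the same machine hosts a different number of concurrent tasks depending on the job mix, which is the behaviour a fixed per-machine slot count cannot express.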
References
[1] Adaptive Scheduler, https://issues.apache.org/jira/browse/MAPREDUCE-1380
[2] Dean, J., Ghemawat, S.: MapReduce: Simplified data processing on large clusters. In: OSDI 2004, San Francisco, CA, pp. 137–150 (December 2004)
[3] Hadoop MapReduce, http://hadoop.apache.org/mapreduce/
[4] Thusoo, A., Shao, Z., Anthony, S., Borthakur, D., Jain, N., Sen Sarma, J., Murthy, R., Liu, H.: Data warehousing and analytics infrastructure at Facebook. In: Proceedings of the 2010 International Conference on Management of Data, SIGMOD 2010, pp. 1013–1020. ACM, New York (2010)
[5] Ananthanarayanan, G., Kandula, S., Greenberg, A., Stoica, I., Lu, Y., Saha, B., Harris, E.: Reining in the outliers in map-reduce clusters using Mantri. In: OSDI 2010, pp. 1–16. USENIX Association, Berkeley (2010)
[6] Polo, J., Carrera, D., Becerra, Y., Steinder, M., Whalley, I.: Performance-driven task co-scheduling for MapReduce environments. In: Network Operations and Management Symposium, NOMS, pp. 373–380. IEEE, Osaka (2010)
[7] Wolf, J., Rajan, D., Hildrum, K., Khandekar, R., Kumar, V., Parekh, S., Wu, K.-L., Balmin, A.: FLEX: A Slot Allocation Scheduling Optimizer for MapReduce Workloads. In: Gupta, I., Mascolo, C. (eds.) Middleware 2010. LNCS, vol. 6452, pp. 1–20. Springer, Heidelberg (2010)
[8] Verma, A., Cherkasova, L., Campbell, R.H.: ARIA: Automatic Resource Inference and Allocation for MapReduce Environments. In: 8th IEEE International Conference on Autonomic Computing, Karlsruhe, Germany (June 2011)
[9] Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving MapReduce performance in heterogeneous environments. In: OSDI 2008, pp. 29–42. USENIX Association, Berkeley (2008)
[10] Tang, C., Steinder, M., Spreitzer, M., Pacifici, G.: A scalable application placement controller for enterprise data centers. In: Procs. of the 16th Intl. Conference on World Wide Web, pp. 331–340. ACM, NY (2007)
[11] Herodotou, H., Babu, S.: Profiling, What-if Analysis, and Cost-based Optimization of MapReduce Programs. In: VLDB (2010)
[12] Pacifici, G., Segmuller, W., Spreitzer, M., Tantawi, A.N.: Dynamic estimation of CPU demand of web traffic. In: Lenzini, L., Cruz, R.L. (eds.) VALUETOOLS. ACM International Conference Proceeding Series, vol. 180, p. 26. ACM (2006)
[13] Tesauro, G., Jong, N.K., Das, R., Bennani, M.N.: A hybrid reinforcement learning approach to autonomic resource allocation. In: Proceedings of the 2006 IEEE International Conference on Autonomic Computing, pp. 65–73. IEEE Computer Society, Washington, DC (2006)
[14] Yahoo! Inc.: Capacity scheduler, http://developer.yahoo.com/blogs/hadoop/posts/2011/02/capacity-scheduler/
[15] Isard, M., Prabhakaran, V., Currey, J., Wieder, U., Talwar, K., Goldberg, A.: Quincy: fair scheduling for distributed computing clusters. In: SOSP 2009 (2009)
[16] Isard, M., Budiu, M., Yu, Y., Birrell, A., Fetterly, D.: Dryad: distributed data-parallel programs from sequential building blocks. In: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, EuroSys 2007, pp. 59–72. ACM, New York (2007)
[17] Dhok, J., Varma, V.: Using pattern classification for task assignment, http://researchweb.iiit.ac.in/~jaideep/jd-thesis.pdf
[18] Murthy, A.: Next Generation Hadoop, http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/
Copyright information
© 2011 IFIP International Federation for Information Processing
About this paper
Cite this paper
Polo, J. et al. (2011). Resource-Aware Adaptive Scheduling for MapReduce Clusters. In: Kon, F., Kermarrec, A.-M. (eds) Middleware 2011. Lecture Notes in Computer Science, vol 7049. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25821-3_10
DOI: https://doi.org/10.1007/978-3-642-25821-3_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25820-6
Online ISBN: 978-3-642-25821-3
eBook Packages: Computer Science (R0)