Abstract
In recent years, the long-standing need for fast and reliable processing solutions has been exacerbated by the growth of data volumes produced by mobile devices, sensors, and almost ubiquitous internet availability. These big data must be analyzed to extract further knowledge.
Distributed programming models, such as Map Reduce, provide a technical answer to this challenge. Furthermore, when relying on cloud infrastructures, Map Reduce platforms can easily be provisioned with additional computing nodes at runtime (e.g., the system administrator can scale the infrastructure to meet temporal deadlines). Nevertheless, the execution of distributed programming models on the cloud still lacks automated mechanisms to guarantee the Quality of Service (i.e., autonomous scale-up/-down behavior).
In this paper, we focus on two steps: monitoring Map Reduce applications (to detect situations where the temporal deadline will be exceeded) and performing recovery actions on the cluster (by automatically providing additional resources to boost the computation). To this end, we exploit techniques and tools developed in the research field of Business Process Management: in particular, we focus on declarative languages and tools for monitoring the execution of business processes. We introduce a distributed architecture in which a logic-based monitor detects possible delays and triggers recovery actions, such as the dynamic provisioning of an appropriate number of resources.
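The monitor-then-recover loop described above can be illustrated with a minimal sketch. It is not the logic-based monitor of the paper: it replaces the declarative monitoring with a simple linear extrapolation of task progress, and it assumes that throughput scales linearly with the number of worker nodes. All names (`JobStatus`, `predicted_finish_s`, `extra_nodes_needed`) are hypothetical and introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class JobStatus:
    """Hypothetical snapshot of a running Map Reduce job."""
    elapsed_s: float   # seconds since the job started
    tasks_done: int    # tasks completed so far
    tasks_total: int   # total tasks in the job
    nodes: int         # worker nodes currently allocated


def predicted_finish_s(status: JobStatus) -> float:
    """Linearly extrapolate the total run time from progress so far."""
    if status.tasks_done == 0:
        return math.inf  # no progress yet, so no finite estimate
    return status.elapsed_s * status.tasks_total / status.tasks_done


def extra_nodes_needed(status: JobStatus, deadline_s: float) -> int:
    """Nodes to add so the remaining tasks fit before the deadline.

    Assumption: each node processes tasks at the average rate
    observed so far, and adding nodes scales throughput linearly.
    """
    remaining = status.tasks_total - status.tasks_done
    if remaining == 0 or predicted_finish_s(status) <= deadline_s:
        return 0  # finished, or on track: no recovery action
    time_left = deadline_s - status.elapsed_s
    if time_left <= 0 or status.tasks_done == 0:
        raise ValueError("deadline already passed or no progress observed")
    # Observed throughput in tasks per second per node.
    rate_per_node = status.tasks_done / (status.elapsed_s * status.nodes)
    required = math.ceil(remaining / (rate_per_node * time_left))
    return max(0, required - status.nodes)


if __name__ == "__main__":
    late = JobStatus(elapsed_s=600, tasks_done=30, tasks_total=100, nodes=4)
    print(predicted_finish_s(late), extra_nodes_needed(late, deadline_s=1500))
```

In a real deployment the estimate would come from the monitor's reasoning over the event stream of the job, and the returned count would be passed to the cloud provider's provisioning interface; both are outside the scope of this sketch.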
Notes
1. Available for download at https://www.inf.unibz.it/~montali/tools.html#MOBUCON.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Chesani, F., Ciampolini, A., Loreti, D., Mello, P. (2017). Map Reduce Autoscaling over the Cloud with Process Mining Monitoring. In: Helfert, M., Ferguson, D., Méndez Muñoz, V., Cardoso, J. (eds) Cloud Computing and Services Science. CLOSER 2016. Communications in Computer and Information Science, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-62594-2_6
DOI: https://doi.org/10.1007/978-3-319-62594-2_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62593-5
Online ISBN: 978-3-319-62594-2
eBook Packages: Computer Science, Computer Science (R0)