Abstract
Container-based virtualization offered as infrastructure-as-a-service is gaining interest as a platform for running distributed applications. As cloud architectures grow in scale, faults become frequent, which makes availability a real challenge. Replication, whether of checkpoints, containers, or data, is a way to survive failures and increase availability. Following a node failure, fault-tolerant cloud systems restart the failed containers on a new node from distributed images of the containers (or checkpoints). With a high failure rate, some replicas can be lost. In some cases it is therefore worthwhile to increase the replication factor and to find a trade-off between being able to restart all failed containers and the storage overhead. This paper addresses the issue of adapting the replication factor and contributes a novel replication factor modeling approach that predicts the right replication factor. The prediction techniques are based on experimental modeling, which analyzes data collected from different executions. We use a regression technique to find the relation between availability and the number of replicas. Experiments on the Grid’5000 testbed, using a real fault-tolerant cloud system, demonstrate the benefits of our proposal in satisfying the availability requirement.
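The idea of using regression to relate availability to the number of replicas can be illustrated with a minimal sketch. It is not the model from the paper: it assumes, purely for illustration, that unavailability shrinks geometrically with the replication factor, so that log(1 - availability) is linear in the number of replicas. The function names, the measurements, and the `max_factor` bound are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def min_replication_factor(observations, target, max_factor=10):
    """Smallest replication factor whose predicted availability meets target.

    observations: list of (replication_factor, measured_availability) pairs,
    with every availability strictly below 1 so the logarithm is defined.
    Returns None if no factor up to max_factor reaches the target.
    """
    xs = [r for r, _ in observations]
    # Assumed model: log-unavailability is linear in the replication factor.
    ys = [math.log(1.0 - avail) for _, avail in observations]
    a, b = fit_line(xs, ys)
    for r in range(1, max_factor + 1):
        predicted = 1.0 - math.exp(a + b * r)
        if predicted >= target:
            return r
    return None


# Made-up measurements, not results from the paper.
observations = [(1, 0.80), (2, 0.96), (3, 0.992), (4, 0.9984)]
print(min_replication_factor(observations, target=0.999))
```

Returning the smallest factor that meets the target reflects the trade-off described above: each extra replica adds storage overhead, so the sketch stops at the first factor whose predicted availability is sufficient.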
About this article
Cite this article
Abbes, H., Louati, T. & Cérin, C. Dynamic replication factor model for Linux containers-based cloud systems. J Supercomput 76, 7219–7241 (2020). https://doi.org/10.1007/s11227-020-03158-5
DOI: https://doi.org/10.1007/s11227-020-03158-5