MLASP: Machine learning assisted capacity planning

Vitui, Arthur; Chen, Tse-Hsun (Peter)

doi:10.1007/s10664-021-09994-0

An industrial experience report

Experience Report
Published: 24 June 2021

Empirical Software Engineering, Volume 26, article number 87 (2021)

A Correction to this article was published on 16 August 2021

This article has been updated

Abstract

In industrial environments, it is critical to determine the capacity of a system and to plan a deployment layout that meets production traffic demands. System capacity is influenced by both the performance of the system's constituent components and the setup of the physical environment. In a large system, the configuration parameters of individual components give developers and load test engineers the flexibility to tune system performance without changing the source code. However, because the search space is large, estimating the capacity of the system for different configuration values is a challenging and costly process. In this paper, we propose an approach, called MLASP, that uses machine learning models to predict the system's key performance indicators (KPIs), such as throughput, from a set of features made of configuration parameter values, including the server cluster setup, to help engineers with capacity planning for production environments. Under the same load, we evaluate MLASP on two large-scale, mission-critical enterprise systems developed by Ericsson and on one open-source system. We find that: 1) MLASP can predict the system throughput with very high accuracy: the difference between the predicted and the actual throughput is less than 1%; and 2) by using only a small subset of the training data (e.g., 3% of the entire data for the open-source system), MLASP can still predict the throughput accurately. We also document our experience of successfully integrating the approach into an industrial setting. In summary, this paper highlights the benefits and potential of using machine learning models to assist load test engineers in capacity planning.
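
The general idea of predicting throughput from configuration parameter values can be illustrated with a minimal sketch. This is not the MLASP implementation: it swaps in ordinary least squares, written with the Python standard library only, in place of the machine learning models evaluated in the paper. The parameter names (`nodes`, `threads`, `batch_size`), the synthetic throughput formula, the noise level, and the 20% training fraction are all hypothetical and chosen purely for illustration.

```python
import itertools
import random


def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_least_squares(rows, targets):
    """Fit intercept + linear weights via the normal equations."""
    xs = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * y for x, y in zip(xs, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, row):
    """Predict throughput for one configuration."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], row))


def synthetic_throughput(nodes, threads, batch_size, rng):
    """Hypothetical stand-in for a measured load-test result (msgs/s)."""
    noise = rng.uniform(-10.0, 10.0)
    return 500.0 + 900.0 * nodes + 120.0 * threads + 0.8 * batch_size + noise


# Hypothetical configuration space: cluster size and two tuning parameters.
NODES = [1, 2, 3, 4, 5]
THREADS = [2, 4, 8, 16]
BATCH_SIZES = list(range(100, 1001, 100))

rng = random.Random(0)
train, test = [], []
grid = itertools.product(enumerate(NODES), enumerate(THREADS), enumerate(BATCH_SIZES))
for (i, nodes), (j, threads), (k, batch_size) in grid:
    config = (nodes, threads, batch_size)
    sample = (config, synthetic_throughput(nodes, threads, batch_size, rng))
    # Keep a balanced 20% of the grid for training; hold out the rest.
    (train if (i + j + k) % 5 == 0 else test).append(sample)

weights = fit_least_squares([c for c, _ in train], [y for _, y in train])

# Mean absolute percentage error on configurations never seen in training.
errors = [abs(predict(weights, c) - y) / y for c, y in test]
mape = 100.0 * sum(errors) / len(errors)

print(f"trained on {len(train)} of {len(train) + len(test)} configurations")
print(f"held-out mean absolute percentage error: {mape:.2f}%")
```

Because the synthetic throughput is linear by construction, a linear model fits it well; real systems typically show interactions between parameters, which is one reason to consider more expressive models than the one used in this sketch.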

Change history

16 August 2021
A Correction to this paper has been published: https://doi.org/10.1007/s10664-021-10011-7

Notes

https://www.ericsson.com/en/press-releases/2017/9/ericsson-offers-continuous-software-updates

Acknowledgements

We thank Ericsson for providing access to the enterprise systems that we used in our case study. The findings and opinions expressed in this paper are those of the authors and do not necessarily represent or reflect those of Ericsson and/or its subsidiaries and affiliates. Our results do not in any way reflect the quality of Ericsson's products.

Author information

Authors and Affiliations

Software PErformance, Analysis, and Reliability (SPEAR) Lab, Concordia University, Montreal, Canada
Arthur Vitui & Tse-Hsun (Peter) Chen
Red Hat Inc, Toronto, Canada
Arthur Vitui

Corresponding author

Correspondence to Arthur Vitui.

Additional information

Communicated by: Sven Apel

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The original online version of this article was revised: Modifications have been made to the affiliation section and to Figure 6. Full information regarding the corrections made can be found in the erratum/correction for this article.

About this article

Cite this article

Vitui, A., Chen, TH.(P.) MLASP: Machine learning assisted capacity planning. Empir Software Eng 26, 87 (2021). https://doi.org/10.1007/s10664-021-09994-0

Accepted: 28 May 2021
Published: 24 June 2021
DOI: https://doi.org/10.1007/s10664-021-09994-0
