A hybrid approach to software fault prediction using genetic programming and ensemble learning methods

  • Original article
International Journal of System Assurance Engineering and Management

Abstract

Software fault prediction techniques use software metrics and fault data from previous releases to predict fault-prone modules in the next release of the software. In this article, we review the literature that uses machine-learning techniques to detect defects, faults, ambiguous code, inappropriate branching, and prospective runtime errors in order to establish a level of software quality. We also propose a hybrid technique for software fault prediction based on genetic programming and ensemble learning. Multiple machine-learning techniques are available for predicting the occurrence of faults. Our experiments compare the performance achieved by simple ensemble methods, simple genetic-programming-based classification, and the hybrid approach. We find that machine-learning techniques have different learning abilities that software professionals and researchers can exploit for software fault prediction, and that the proposed approach outperforms the simple statistical and ensemble techniques used in the automated fault prediction system. However, more studies should be performed on less commonly used machine-learning techniques.
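
To make the notion of a simple ensemble method concrete, the following is a minimal sketch of bagging with majority voting over one-feature threshold rules (decision stumps). It is not the hybrid approach proposed in the article, and it involves no genetic programming. The module metrics, labels, and all function names below are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy data: ((lines_of_code, cyclomatic_complexity), is_faulty).
MODULES = [
    ((40, 2), 0), ((55, 3), 0), ((70, 4), 0),
    ((90, 5), 0), ((120, 6), 0), ((150, 7), 0),
    ((300, 12), 1), ((350, 15), 1), ((420, 18), 1),
    ((500, 22), 1), ((640, 27), 1), ((800, 35), 1),
]


def predict_stump(stump, x):
    """Apply a one-feature threshold rule to a metrics tuple."""
    feature, threshold, polarity = stump
    above = x[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def train_stump(sample):
    """Pick the (feature, threshold, polarity) rule with the fewest errors."""
    best_stump, best_errors = None, len(sample) + 1
    n_features = len(sample[0][0])
    for feature in range(n_features):
        for threshold in sorted({x[feature] for x, _ in sample}):
            for polarity in (1, -1):
                stump = (feature, threshold, polarity)
                errors = sum(predict_stump(stump, x) != y for x, y in sample)
                if errors < best_errors:
                    best_stump, best_errors = stump, errors
    return best_stump


def train_bagged_stumps(data, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    return [
        train_stump([rng.choice(data) for _ in data])
        for _ in range(n_estimators)
    ]


def predict_ensemble(stumps, x):
    """Combine the stump predictions by majority vote."""
    votes = Counter(predict_stump(stump, x) for stump in stumps)
    return votes.most_common(1)[0][0]


model = train_bagged_stumps(MODULES)
for metrics in [(60, 3), (450, 20)]:
    print(metrics, "->", predict_ensemble(model, metrics))
```

An odd number of estimators is used so that a two-class majority vote cannot tie. A real study would replace the toy data with measured software metrics and evaluate on held-out releases.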

Funding

There was no funding support from any agencies.

Author information

Corresponding authors

Correspondence to Satya Prakash Sahu or B. Ramachandra Reddy.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sahu, S.P., Reddy, B.R., Mukherjee, D. et al. A hybrid approach to software fault prediction using genetic programming and ensemble learning methods. Int J Syst Assur Eng Manag 13, 1746–1760 (2022). https://doi.org/10.1007/s13198-021-01532-x

  • DOI: https://doi.org/10.1007/s13198-021-01532-x
