Abstract
Software defect prediction (SDP) is a promising way to save time and cost in the software testing phase and thereby improve software quality. Numerous machine learning approaches have proven effective in SDP. However, the unbalanced class distribution of SDP datasets can be a problem for some conventional learning methods. In addition, class overlap makes it harder for predictors to learn the defective class accurately. In this study, we propose a new SDP model that combines class overlap reduction with ensemble imbalance learning to improve defect prediction. First, the neighbor cleaning method is applied to remove overlapping non-defective samples. The whole dataset is then randomly under-sampled several times to generate balanced subsets on which multiple classifiers are trained. Finally, these individual classifiers are combined with the AdaBoost mechanism to build the final prediction model. In the experiments, we investigated nine highly unbalanced datasets selected from a public software repository and confirmed that a high rate of class overlap exists in SDP data. We assessed the proposed model by comparing it with other state-of-the-art methods, including conventional SDP models, imbalance learning methods, and data cleaning methods. Test results and statistical analysis show that the proposed model provides more reasonable defect prediction results and performs best among all tested models in terms of G-mean and AUC.
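The three steps named in the abstract (overlap cleaning, balanced under-sampling, AdaBoost-style combination) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the base learner or how sampling and boosting interact, so the sketch assumes decision stumps, draws a fresh balanced subset in every boosting round, and uses a simplified neighbor cleaning step that only drops non-defective samples whose nearest neighbors are mostly defective. All function names and parameters (`neighbor_clean`, `balanced_subset`, `fit_ensemble`, `k`, `rounds`) are hypothetical. Labels are 1 for defective and 0 for non-defective.

```python
import math
import random

def neighbor_clean(X, y, k=3):
    """Simplified cleaning (assumption): drop non-defective samples
    whose k nearest neighbours are mostly defective."""
    keep = []
    for i, xi in enumerate(X):
        if y[i] == 0:
            nearest = sorted((j for j in range(len(X)) if j != i),
                             key=lambda j: math.dist(xi, X[j]))[:k]
            if sum(y[j] for j in nearest) * 2 > k:
                continue  # overlapping non-defective sample: remove it
        keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def balanced_subset(y, rng):
    """Indices of all defective samples plus an equal-sized random
    draw of non-defective ones (random under-sampling)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    return pos + rng.sample(neg, min(len(pos), len(neg)))

def stump_predict(stump, x):
    feature, cut, sign = stump
    return 1 if sign * (x[feature] - cut) > 0 else 0

def fit_stump(X, y, w, idx):
    """Weighted decision stump trained only on the rows in idx."""
    best = None
    for f in range(len(X[0])):
        vals = sorted({X[i][f] for i in idx})
        # Candidate cuts are midpoints between consecutive values.
        cuts = [(a + b) / 2 for a, b in zip(vals, vals[1:])] or vals
        for cut in cuts:
            for sign in (1, -1):
                stump = (f, cut, sign)
                err = sum(w[i] for i in idx
                          if stump_predict(stump, X[i]) != y[i])
                if best is None or err < best[0]:
                    best = (err, stump)
    return best[1]

def fit_ensemble(X, y, rounds=10, k=3, seed=0):
    """Clean overlap, then boost stumps trained on balanced subsets."""
    rng = random.Random(seed)
    X, y = neighbor_clean(X, y, k)
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        idx = balanced_subset(y, rng)
        stump = fit_stump(X, y, w, idx)
        miss = [stump_predict(stump, X[i]) != y[i] for i in range(n)]
        # Weighted error on the full cleaned set, clipped away from 0 and 1.
        err = sum(wi for wi, m in zip(w, miss) if m)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # AdaBoost update: raise weights of misclassified samples.
        w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Weighted vote of the stumps; returns 1 for defective."""
    score = sum(a * (1 if stump_predict(s, x) == 1 else -1) for a, s in model)
    return 1 if score > 0 else 0

def g_mean(y_true, y_pred):
    """Geometric mean of the true positive and true negative rates."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return math.sqrt((tp / pos) * (tn / neg))
```

Calling `fit_ensemble(X, y)` on a list of feature vectors and 0/1 labels returns a list of `(alpha, stump)` pairs, and `predict(model, x)` classifies a single module. G-mean is used here because plain accuracy is dominated by the non-defective majority class.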
Acknowledgments
This work is supported by the National Key Basic Research Program of China (973 program 2013CB329103 of 2013CB329100), the Program for Natural Science Foundation of China (No. 61672120, No. 61472053, No. 91118005), the Doctoral Program of Higher Education (20120191110027), and the Natural Science Foundation of Chongqing (No. CSTC2010BB2217, No. cstc2012jjA40017).
Cite this article
Chen, L., Fang, B., Shang, Z. et al. Tackling class overlap and imbalance problems in software defect prediction. Software Qual J 26, 97–125 (2018). https://doi.org/10.1007/s11219-016-9342-6
DOI: https://doi.org/10.1007/s11219-016-9342-6