Towards Benchmarking Feature Subset Selection Methods for Software Fault Prediction

Computational Intelligence and Quantitative Software Engineering

Part of the book series: Studies in Computational Intelligence (SCI, volume 617)

Abstract

Despite the general acceptance that software engineering datasets often contain noisy, irrelevant, or redundant variables, very few benchmark studies of feature subset selection (FSS) methods on real-life data from software projects have been conducted. This paper provides an empirical comparison of state-of-the-art FSS methods: information gain attribute ranking (IG); Relief (RLF); principal component analysis (PCA); correlation-based feature selection (CFS); consistency-based subset evaluation (CNS); wrapper subset evaluation (WRP); and an evolutionary computation method, genetic programming (GP), on five fault prediction datasets from the PROMISE data repository. For each FSS method-dataset combination, the area under the receiver operating characteristic curve (AUC), averaged over 10-fold cross-validation runs, was calculated before and after FSS. Two diverse learning algorithms, C4.5 and naïve Bayes (NB), were used to test the attribute sets given by each FSS method. Although the differences in AUC between the FSS methods are not statistically significant for either C4.5 or NB, the results show that FSS is generally beneficial, as it helps improve the classification accuracy of both learners. No single FSS method is best for all datasets, but a smaller group of methods (IG, RLF, and GP) consistently selects fewer attributes without degrading classification accuracy within statistically significant boundaries.
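
The evaluation procedure the abstract describes (apply an FSS method, then measure a learner's AUC over 10-fold cross-validation before and after selection) can be sketched as follows. This is a hedged illustration, not the authors' setup: it substitutes scikit-learn's `mutual_info_classif` for WEKA's information gain ranking, `GaussianNB` for NB, and a synthetic dataset for the PROMISE data.

```python
# Illustrative sketch only: scikit-learn stand-ins for the chapter's WEKA
# setup (mutual information approximates IG ranking; the data is synthetic,
# not a PROMISE fault prediction dataset).
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

# Binary "faulty / fault-free" stand-in data: 5 informative attributes
# hidden among redundant and noisy ones.
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_redundant=10, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# AUC of naive Bayes on the full attribute set, averaged over 10 folds.
auc_all = cross_val_score(GaussianNB(), X, y, cv=cv, scoring="roc_auc").mean()

# AUC after FSS: keep the 5 attributes ranked highest by mutual information.
score_fn = partial(mutual_info_classif, random_state=0)
fss_nb = make_pipeline(SelectKBest(score_fn, k=5), GaussianNB())
auc_fss = cross_val_score(fss_nb, X, y, cv=cv, scoring="roc_auc").mean()

print(f"AUC, all 20 attributes: {auc_all:.3f}")
print(f"AUC, top-5 attributes:  {auc_fss:.3f}")
```

In the chapter, the analogous comparison is run for seven FSS methods and two learners on each PROMISE dataset; selection occurring inside each training fold (as a pipeline step here) is what keeps the AUC estimate honest.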


Notes

  1. The requirement that the number of training data points be an exponential function of the feature dimension.

  2. Section 4 provides more details about AUC.
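
As a concrete illustration of the AUC measure referred to here: the statistic equals the probability that a randomly chosen faulty module receives a higher predicted score than a randomly chosen fault-free one. The labels and scores below are invented for illustration, not taken from the chapter's datasets.

```python
# Hedged illustration of AUC with made-up labels and classifier scores.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1]                 # 1 = faulty module, 0 = fault-free
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]   # classifier's predicted scores

# 8 of the 9 (faulty, fault-free) pairs are ranked correctly, so AUC = 8/9.
auc = roc_auc_score(y_true, y_score)
print(round(auc, 3))  # 0.889
```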

Author information

Correspondence to Wasif Afzal.


Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Afzal, W., Torkar, R. (2016). Towards Benchmarking Feature Subset Selection Methods for Software Fault Prediction. In: Pedrycz, W., Succi, G., Sillitti, A. (eds) Computational Intelligence and Quantitative Software Engineering. Studies in Computational Intelligence, vol 617. Springer, Cham. https://doi.org/10.1007/978-3-319-25964-2_3

  • DOI: https://doi.org/10.1007/978-3-319-25964-2_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25962-8

  • Online ISBN: 978-3-319-25964-2

  • eBook Packages: Engineering, Engineering (R0)
