How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

Shu, Rui; Xia, Tianpei; Chen, Jianfeng; Williams, Laurie; Menzies, Tim

doi:10.1007/s10664-020-09906-8

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)

Published: 05 April 2021

Volume 26, article number 53, (2021)
Cite this article

Empirical Software Engineering Aims and scope Submit manuscript

Rui Shu¹,
Tianpei Xia¹,
Jianfeng Chen¹,
Laurie Williams¹ &
…
Tim Menzies ORCID: orcid.org/0000-0002-5040-3196¹

594 Accesses
11 Citations
Explore all metrics

Abstract

Background

In order that the general public is not vulnerable to hackers, security bug reports need to be handled by small groups of engineers before being widely discussed. But learning how to distinguish the security bug reports from other bug reports is challenging since they may occur rarely. Data mining methods that can find such scarce targets require extensive optimization effort.

Goal

The goal of this research is to aid practitioners as they struggle to optimize methods that try to distinguish between rare security bug reports and other bug reports.

Method

Our proposed method, called SWIFT, is a dual optimizer that optimizes both learner and pre-processor options. Since this is a large space of options, SWIFT uses a technique called 𝜖-dominance that learns how to avoid operations that do not significantly improve performance.

Result

When compared to recent state-of-the-art results (from FARSEC which is published in TSE’18), we find that the SWIFT’s dual optimization of both pre-processor and learner is more useful than optimizing each of them individually. For example, in a study of security bug reports from the Chromium dataset, the median recalls of FARSEC and SWIFT were 15.7% and 77.4%, respectively. For another example, in experiments with data from the Ambari project, the median recalls improved from 21.5% to 85.7% (FARSEC to SWIFT).

Conclusion

Overall, our approach can quickly optimize models that achieve better recalls than the prior state-of-the-art. These increases in recall are associated with moderate increases in false positive rates (from 8% to 24%, median). For future work, these results suggest that dual optimization is both practical and useful.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

E-APR: Mapping the effectiveness of automated program repair techniques

Article 13 July 2021

Inferring test models from user bug reports using multi-objective search

Article Open access 20 June 2023

Provable training set debugging for linear regression

Article 16 August 2021

Notes

e.g. Fig. 12 of Menzies et al. (2007c) lists nine SE data mining applications with median false positive rates of 25%.

References

Agrawal A, Menzies T (2018) Is “Better Data” Better than “Better Data Miner”? (on the benefits of tuning SMOTE for defect prediction). In: Proceedings of the 40th international conference on software engineering, ACM, pp 1050–1061
Agrawal A, Fu W, Menzies T (2018) What is wrong with topic modeling? and how to fix it using search-based software engineering. Inf Softw Technol 98:74–88
Article Google Scholar
Agrawal A, Fu W, Chen D, Shen X, Menzies T (2019) How to “DODGE” complex software analytics. IEEE Trans Softw Eng
Arcuri A, Briand L (2011) A practical guide for using statistical tests to assess randomized algorithms in software engineering. In: Proceedings of the 33rd international conference on software engineering ICSE ’11. https://doi.org/10.1145/1985793.1985795. ACM, New York, pp 1–10
Bennin KE, Keung JW, Monden A (2019) On the relative value of data resampling approaches for software defect prediction. Empir Softw Eng 24 (2):602–636
Article Google Scholar
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(Feb):281–305
MathSciNet MATH Google Scholar
Bergstra JS, Bardenet R, Bengio Y, Kégl B (2011) Algorithms for hyper-parameter optimization. In: Advances in neural information processing systems, pp 2546–2554
Biedenkapp A, Eggensperger K, Elsken T, Falkner S, Feurer M, Gargiani M, Hutter F, Klein A, Lindauer M, Loshchilov I et al (2018) Hyperparameter optimization. Artif Intell 1:35
Google Scholar
Binkley D, Lawrie D, Morrell C (2018) The need for software specific natural language techniques. Empir Softw Eng 23(4):2398–2425
Article Google Scholar
Black PE, Badger L, Guttman B, Fong E (2016) Dramatically reducing software vulnerabilities. Report to the White House Office of Science and Technology Policy, Information Technology Laboratory
Chan S, Treleaven P, Capra L (2013) Continuous hyperparameter optimization for large-scale recommender systems. In: 2013 IEEE international conference on big data, IEEE, pp 350–358
Chen L et al (2013) R2fix: automatically generating bug fixes from bug reports. Proceedings of the 2013 IEEE 6th ICST
Deb K, Mohan M, Mishra S (2005) Evaluating the ε-domination based multi-objective evolutionary algorithm for a quick computation of pareto-optimal solutions. Evol Comput 13(4):501–525
Article Google Scholar
Deshmukh J, Podder S, Sengupta S, Dubash N, et al. (2017) Towards accurate duplicate bug retrieval using deep learning techniques. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 115–124
Di Francescomarino C, Dumas M, Federici M, Ghidini C, Maggi F M, Rizzi W, Simonetto L (2018) Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Inf Syst 74:67–83
Article Google Scholar
Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
Book Google Scholar
Feurer M, Springenberg JT, Hutter F (2015) Initializing bayesian hyperparameter optimization via meta-learning. In: Twenty-Ninth AAAI conference on artificial intelligence
Fu W, Menzies T (2017) Easy over hard: A case study on deep learning. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering. ACM, pp 49–60
Fu W, Menzies T, Shen X (2016) Tuning for software analytics: is it really necessary? Inf Softw Technol 76:135–146
Article Google Scholar
Gegick M, Rotella P, Xie T (2010) Identifying security bug reports via text mining: An industrial case study. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 11–20
Goldberg DE (2006) Genetic algorithms. Pearson Education India
Goseva-Popstojanova K, Tyo J (2018) Identification of security related bug reports via text mining using supervised and unsupervised classification. In: 2018 IEEE international conference on software quality, reliability and security (QRS). IEEE, pp 344–355
Graham P (2004) Hackers & painters: big ideas from the computer age. O’Reilly Media, Inc
Google Scholar
Han X, Yu T, Lo D (2018) Perflearner: learning from bug reports to understand and generate performance test frames. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering. ACM, pp 17–28
Herodotou H, Lim H, Luo G, Borisov N, Dong L, Cetin FB, Babu S (2011) Starfish: a self-tuning system for big data analytics. In: Cidr, vol 11, pp 261–272
Hindle A, Alipour A, Stroulia E (2016) A contextual approach towards more accurate duplicate bug report detection and ranking. Empir Softw Eng 21 (2):368–410
Article Google Scholar
Holland JH (1992) Genetic algorithms. Sci Am 267(1):66–73
Article Google Scholar
Huang Q, Xia X, Lo D (2017) Supervised vs unsupervised models: A holistic look at effort-aware just-in-time defect prediction. In: 2017 IEEE international conference on software maintenance and evolution (ICSME). IEEE, pp 159–170
Huang Q, Xia X, Lo D (2019) Revisiting supervised and unsupervised models for effort-aware just-in-time defect prediction. Empir Softw Eng 24 (5):2823–2862
Article Google Scholar
Jalali O, Menzies T, Feather M (2008) Optimizing requirements decisions with keys. In: Proceedings of the 4th international workshop on predictor models in software engineering. ACM, pp 79–86
Kampenes VB, Dybå T, Hannay JE, Sjøberg DIK (2007) A systematic review of effect size in software engineering experiments. Inf Softw Technol 49(11–12):1073–1086
Article Google Scholar
Keller JM, Gray MR, Givens JA (1985) A fuzzy k-nearest neighbor algorithm. IEEE Trans Sys Man Cybern (4)580–585
Kim S, Zhang H, Wu R, Gong L (2011) Dealing with noise in defect prediction. In: 2011 33rd international conference on software engineering (ICSE). IEEE, pp 481–490
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220(4598):671–680
Article MathSciNet Google Scholar
Kochhar PS, Xia X, Lo D, Li S (2016) Practitioners’ expectations on automated fault localization. In: Proceedings of the 25th international symposium on software testing and analysis. ACM, pp 165–176
Lamkanfi A, Demeyer S, Giger E, Goethals B (2010) Predicting the severity of a reported bug. In: 2010 7th IEEE working conference on mining software repositories (MSR). IEEE, pp 1–10
Lazar A, Ritchey S, Sharif B (2014) Improving the accuracy of duplicate bug report detection using textual similarity measures. In: Proceedings of the 11th working conference on mining software repositories. ACM, pp 308–311
Lessmann S, Baesens B, Mues C, Pietsch S (2008) Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans Softw Eng 34(4):485–496
Article Google Scholar
Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816
MathSciNet MATH Google Scholar
Menzies T, Shepperd M (2019) “Bad smells” in software analytics papers. Inf Softw Technol 112:35–47
Article Google Scholar
Menzies T, Greenwald J, Frank A (2006) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Eng 33(1):2–13
Article Google Scholar
Menzies T, Dekhtyar A, Distefano J, Greenwald J (2007a) Problems with precision: a response to” comments on’data mining static code attributes to learn defect predictors’”. IEEE Trans Softw Eng 33(9):637–640
Article Google Scholar
Menzies T, Elrawas O, Hihn J, Feather M, Madachy R, Boehm B (2007b) The business case for automated software engineering. In: Proceedings of the Twenty-second IEEE/ACM international conference on automated software engineering ASE ’07. https://doi.org/10.1145/1321631.1321676. ACM, New York, pp 303–312
Menzies T, Greenwald J, Frank A (2007c) Data mining static code attributes to learn defect predictors. IEEE Trans Softw Engineering (1) 2–13
Menzies T, Majumder S, Balaji N, Brey K, Fu W (2018) 500+ times faster than deep learning:(a case study exploring faster methods for text mining stackoverflow). In: 2018 IEEE/ACM 15Th international conference on mining software repositories (MSR). IEEE, pp 554–563
MITRE (2017) Common Vulnerabilities and Exposures (CVE). https://cve.mitre.org/about/terminology.html#vulnerability
Mittas N, Angelis L (2013) Ranking and clustering software cost estimation models through a multiple comparisons algorithm. IEEE Trans Softw Eng 39(4):537–551
Article Google Scholar
Nair V, Yu Z, Menzies T, Siegmund N, Apel S (2018) Finding faster configurations using flash. IEEE Trans Softw Eng
Neuhaus S, Zimmermann T (2009) The beauty and the beast: vulnerabilities in red hat’s packages. In: USENIX annual technical conference
Neuhaus S, Zimmermann T, Holler C, Zeller A (2007) Predicting vulnerable software components. In: Proceedings of the 14th ACM conference on computer and communications security. ACM, pp 529–540
Nguyen VH, Tran LMS (2010) Predicting vulnerable software components with dependency graphs. In: Proceedings of the 6th international workshop on security measurements and metrics. ACM, p 3
Novielli N, Girardi D, Lanubile F (2018) A benchmark study on sentiment analysis for software engineering research. In: 2018 IEEE/ACM 15Th international conference on mining software repositories (MSR). IEEE, pp 364–375
Ohira M, Kashiwa Y, Yamatani Y, Yoshiyuki H, Maeda Y, Limsettho N, Fujino K, Hata H, Ihara A, Matsumoto K (2015) A dataset of high impact bugs: manually-classified issue reports. In: 2015 IEEE/ACM 12th working conference on mining software repositories (MSR). IEEE, pp 518–521
Onan A, Korukoğlu S, Bulut H (2016) A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification. Expert Syst Appl 62:1–16
Article Google Scholar
Osman H, Ghafari M, Nierstrasz O (2017) Hyperparameter optimization to improve bug prediction accuracy. In: IEEE workshop on machine learning techniques for software quality evaluation (maLTeSQue). IEEE, pp 33–38
Panichella A, Dit B, Oliveto R, Di Penta M, Poshyvanyk D, De Lucia A (2013) How to effectively use topic models for software engineering tasks? An approach based on genetic algorithms. In: International conference on software engineering
Parnin C, Orso A (2011) Are automated debugging techniques actually helping programmers?. In: Proceedings of the 2011 international symposium on software testing and analysis. ACM, pp 199–209
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in python. J Mach Learn Res 12:2825–2830
MathSciNet MATH Google Scholar
Peters F, Tun T, Yu Y, Nuseibeh B (2018) Text filtering and ranking for security bug report prediction. IEEE Trans Softw Eng:Early–Access
Scandariato R, Walden J, Hovsepyan A, Joosen W (2014) Predicting vulnerable software components via text mining. IEEE Trans Softw Eng 40(10):993–1006
Article Google Scholar
Storn R, Price K (1997) Differential evolution–a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11(4):341–359
Article MathSciNet Google Scholar
Sun C, Lo D, Khoo SC, Jiang J (2011) Towards more accurate retrieval of duplicate bug reports. In: Proceedings of the 2011 26th IEEE/ACM international conference on automated software engineering. IEEE Computer Society, pp 253–262
Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: 2016 IEEE/ACM 38th international conference on software engineering (ICSE). IEEE, pp 321–332
Tantithamthavorn C, Hassan AE, Matsumoto K (2018) The impact of class rebalancing techniques on the performance and interpretation of defect prediction models. IEEE Trans Softw Eng
The Equifax Data Breach (2019) https://epic.org/privacy/data-breach/equifax/
Thornton C, Hutter F, Hoos HH, Leyton-Brown K (2013) Auto-weka: combined selection and hyperparameter optimization of classification algorithms. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 847–855
Tian Y, Lo D, Sun C (2012) Information retrieval based nearest neighbor classification for fine-grained bug severity prediction. In: 2012 19th working conference on reverse engineering. IEEE, pp 215–224
Tian Y, Lo D, Xia X, Sun C (2015) Automated prediction of bug report priority using multi-factor analysis. Empir Softw Eng 20(5):1354–1383
Article Google Scholar
Van Aken D, Pavlo A, Gordon GJ, Zhang B (2017) Automatic database management system tuning through large-scale machine learning. In: Proceedings of the 2017 ACM international conference on management of data. ACM, pp 1009–1024
Vesterstrøm J, Thomsen R (2004) A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In: Congress on evolutionary computation. IEEE
Wang L, Zeng Y, Chen T (2015) Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst Appl 42(2):855–863
Article Google Scholar
Wang Y, Xu W (2018) Leveraging deep learning with lda-based text analytics to detect automobile insurance fraud. Decis Support Syst 105:87–95
Article Google Scholar
WannaCry Ransomware Attack (2017) https://en.wikipedia.org/wiki/WannaCry_ransomware_attack
Wijayasekara D, Manic M, McQueen M (2014) Vulnerability identification and classification via text mining bug databases. In: IECON 2014-40th annual conference of the IEEE industrial electronics society. IEEE, pp 3612–3618
Wolpert DH, Macready WG (1997) No free lunch theorems for optimization. IEEE Trans Evol Comput 1(1):67–82
Article Google Scholar
Xia X, Lo D, Qiu W, Wang X, Zhou B (2014) Automated configuration bug report prediction using text mining. In: 2014 IEEE 38Th annual computer software and applications conference (COMPSAC). IEEE, pp 107–116
Xia X, Lo D, Shihab E, Wang X (2016) Automated bug report field reassignment and refinement prediction. IEEE Trans Reliab 65 (3):1094–1113
Article Google Scholar
Xia Y, Liu C, Li Y, Liu N (2017) A boosted decision tree approach using bayesian hyper-parameter optimization for credit scoring. Expert Syst Appl 78:225–241
Article Google Scholar
Yang X, Lo D, Huang Q, Xia X, Sun J (2016) Automated identification of high impact bug reports leveraging imbalanced learning strategies. In: 2016 IEEE 40Th annual computer software and applications conference (COMPSAC), vol 1. IEEE, pp 227–232
Yang XL, Lo D, Xia X, Huang Q, Sun JL (2017) High-impact bug report identification with imbalanced learning strategies. J Comput Sci Technol 32(1):181–198
Article Google Scholar
Yildizdan G, Baykan ÖK (2020) A novel modified bat algorithm hybridizing by differential evolution algorithm. Expert Syst Appl 141:112949
Article Google Scholar
Zaman S, Adams B, Hassan AE (2011) Security versus performance bugs: a case study on firefox. In: Proceedings of the 8th working conference on mining software repositories. ACM, pp 93–102
Zhang T, Yang G, Lee B, Chan AT (2015) Predicting severity of bug report by mining bug repository with concept profile. In: Proceedings of the 30th annual ACM symposium on applied computing. ACM, pp 1553–1558
Zhou Y, Sharma A (2017) Automated identification of security issues from commit messages and bug reports. In: Proceedings of the 2017 11th joint meeting on foundations of software engineering, pp 914–919
Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evol Process 28(3):150–176
Article Google Scholar

Download references

Acknowledgments

This work was partially funded via an NSF-CISE grant #1909516.

Author information

Authors and Affiliations

Department of Computer Science, North Carolina State University, Raleigh, NC, 27695, USA
Rui Shu, Tianpei Xia, Jianfeng Chen, Laurie Williams & Tim Menzies

Authors

Rui Shu
View author publications
You can also search for this author in PubMed Google Scholar
Tianpei Xia
View author publications
You can also search for this author in PubMed Google Scholar
Jianfeng Chen
View author publications
You can also search for this author in PubMed Google Scholar
Laurie Williams
View author publications
You can also search for this author in PubMed Google Scholar
Tim Menzies
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tim Menzies.

Additional information

Communicated by: Bram Adams

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Shu, R., Xia, T., Chen, J. et al. How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization). Empir Software Eng 26, 53 (2021). https://doi.org/10.1007/s10664-020-09906-8

Download citation

Accepted: 06 November 2020
Published: 05 April 2021
DOI: https://doi.org/10.1007/s10664-020-09906-8

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

How to Better Distinguish Security Bug Reports (Using Dual Hyperparameter Optimization)