Predicting the objective and priority of issue reports in software repositories

Izadi, Maliheh; Akbari, Kiana; Heydarnoori, Abbas

doi:10.1007/s10664-021-10085-3

Published: 01 February 2022

Volume 27, article number 50 (2022)

Empirical Software Engineering


Abstract

Software repositories such as GitHub host a large number of software entities. Developers collaboratively discuss, implement, use, and share these entities. Proper documentation plays an important role in successful software management and maintenance. Users rely on Issue Tracking Systems, a facility of software repositories, to keep track of issue reports, manage workload and processes, and document the highlights of their team’s effort. An issue report is a rich source of collaboratively curated software knowledge and can contain a reported problem, a request for new features, or merely a question about the software product. As the number of issues grows, managing them manually becomes harder. GitHub provides labels for tagging issues as a means of issue management. However, about half of the issues in GitHub’s top 1000 repositories do not have any labels. In this work, we aim to automate the management of issue reports for software teams. We propose a two-stage approach that predicts both the objective behind opening an issue and its priority level, using feature engineering methods and state-of-the-art text classifiers. To the best of our knowledge, we are the first to fine-tune a Transformer for issue classification. We train and evaluate our models in both project-based and cross-project settings. The latter provides a generic prediction model applicable to unseen software projects or to projects with little historical data. Our approach predicts the objective and the priority level of issue reports with \(82\%\) (fine-tuned RoBERTa) and \(75\%\) (Random Forest) accuracy, respectively. Moreover, we conduct human labeling and evaluation on unlabeled issues from six unseen GitHub projects to assess the performance of the cross-project model on new data. The model achieves \(90\%\) accuracy on this sample set.
We measure inter-rater reliability and obtain an average Percent Agreement of \(85.3\%\) and a Randolph’s free-marginal Kappa of 0.71, which indicate substantial agreement among labelers.
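The abstract reports two agreement statistics. As a minimal sketch, not the authors' code, Percent Agreement can be read as the mean pairwise observed agreement P_o, and Randolph's free-marginal kappa fixes chance agreement at 1/k for k categories, giving kappa_free = (P_o − 1/k) / (1 − 1/k). The function names and the sample ratings below are hypothetical, and how the study averaged its figures across projects is an assumption left open here.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def observed_agreement(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Mean pairwise agreement P_o.

    Each inner sequence holds the labels that n raters gave to one item.
    For every item, count the ordered rater pairs that chose the same label,
    divide by all ordered pairs n * (n - 1), then average over items.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    total = 0.0
    for item in ratings:
        n = len(item)
        if n < 2:
            raise ValueError("each item needs at least two ratings")
        counts = Counter(item)
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        total += agreeing_pairs / (n * (n - 1))
    return total / len(ratings)


def free_marginal_kappa(p_o: float, k: int) -> float:
    """Randolph's free-marginal kappa: chance agreement is fixed at 1/k."""
    if k < 2:
        raise ValueError("need at least two categories")
    p_e = 1.0 / k
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical example: three raters label four issues by objective.
ratings: List[List[str]] = [
    ["bug", "bug", "bug"],
    ["feature", "feature", "question"],
    ["question", "question", "question"],
    ["bug", "feature", "bug"],
]
p_o = observed_agreement(ratings)
print(f"P_o = {p_o:.3f}, kappa_free = {free_marginal_kappa(p_o, k=3):.3f}")
# prints: P_o = 0.667, kappa_free = 0.500
```

Because kappa_free is linear in P_o for a fixed k, it can also be applied to an averaged agreement: with k = 2, for instance, an agreement of 0.853 gives (0.853 − 0.5) / 0.5 ≈ 0.71. The number of categories used in the study's human evaluation is not stated in this abstract, so that figure is only an arithmetic illustration.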


Notes

https://github.com/MalihehIzadi/IssueReportsManagement
https://zenodo.org/record/4925855#.YNME2r4zbtQ
https://developer.github.com/v3/
https://api.github.com/search/repositories?q=stars:>500&sort=stars
https://docs.github.com/en/issues/using-labels-and-milestones-to-track-work/managing-labels
https://github.com/casics/spiral
https://www.nltk.org/
http://sentistrength.wlv.ac.uk/
https://textblob.readthedocs.io/en/dev/
A complete list of these 66 clusters is available in our repository.
https://github.com/MalihehIzadi/IssueReportsManagement

References

Aghamohammadi A, Izadi M, Heydarnoori A (2020) Generating summaries for methods of event-driven programs: an Android case study. J Syst Softw 170:110800
Al Shalabi L, Shaaban Z, Kasasbeh B (2006) Data mining: a preprocessing engine. J Comp Sci 2(9):735–739
Alenezi M, Banitaan S (2013) Bug reports prioritization: which features and classifier to use? In: 2013 12th international conference on machine learning and applications. IEEE, Miami, FL, USA, pp 112–116. https://doi.org/10.1109/ICMLA.2013.114
Alonso O, Marshall C, Najork M (2014) Crowdsourcing a subjective labeling task: a human-centered framework to ensure reliable results. Microsoft Res., Redmond, WA, USA, Tech. Rep. MSR-TR-2014-91
Antoniol G, Ayari K, Di Penta M, Khomh F, Guéhéneuc YG (2008) Is it a bug or an enhancement? A text-based approach to classify change requests. In: Proceedings of the 2008 conference of the center for advanced studies on collaborative research meeting of minds - CASCON ’08. ACM Press, Ontario, Canada, pp 304. https://doi.org/10.1145/1463788.1463819
Baltes S, Diehl S (2019) Usage and attribution of Stack Overflow code snippets in GitHub projects. Emp Softw Eng 24(3):1259–1295
Baltes S, Treude C, Diehl S (2019) SOTorrent: studying the origin, evolution, and usage of Stack Overflow code snippets. In: 2019 IEEE/ACM 16th international conference on mining software repositories (MSR). IEEE, pp 191–194
Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13(1):281–305
Bissyandé TF, Lo D, Jiang L, Réveillère L, Klein J, Le Traon Y (2013) Got issues? Who cares about it? A large scale investigation of issue trackers from GitHub. In: 2013 IEEE 24th international symposium on software reliability engineering (ISSRE). IEEE, pp 188–197
Brennan RL, Prediger DJ (1981) Coefficient kappa: some uses, misuses, and alternatives. Educ Psychol Measure 41(3):687–699
Cabot J, Izquierdo JLC, Cosentino V, Rolandi B (2015) Exploring the use of labels to categorize issues in open-source software projects. In: 2015 IEEE 22nd international conference on software analysis, evolution, and reengineering (SANER). IEEE, pp 550–554
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
Chen AR, Chen THP, Wang S (2021) Demystifying the challenges and benefits of analyzing user-reported logs in bug reports. Emp Softw Eng 26(1):1–30
da Costa DA, McIntosh S, Treude C, Kulesza U, Hassan AE (2018) The impact of rapid release cycles on the integration delay of fixed issues. Emp Softw Eng 23(2):835–904
Dash M, Liu H (1997) Feature selection for classification. Intell Data Anal 1(3):131–156
Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805
Dhasade AB, Venigalla ASM, Chimalakonda S (2020) Towards prioritizing GitHub issues. In: Proceedings of the 13th innovations in software engineering conference (formerly known as India software engineering conference), pp 1–5
Di Sorbo A, Grano G, Aaron Visaggio C, Panichella S (2020) Investigating the criticality of user-reported issues through their relations with app rating. J Softw Evol Process, pp e2316
Fan Q, Yu Y, Yin G, Wang T, Wang H (2017) Where is the road for issue reports classification based on text mining? In: 2017 ACM/IEEE international symposium on empirical software engineering and measurement (ESEM). IEEE, pp 121–130
Fleiss JL (1971) Measuring nominal scale agreement among many raters. Psychological Bulletin 76(5):378
Fleiss JL, Cohen J (1973) The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educ Psychol Measure 33(3):613–619
Gao C, Wang B, He P, Zhu J, Zhou Y, Lyu MR (2015) PAID: prioritizing app issues for developers by tracking user reviews over versions. In: 2015 IEEE 26th international symposium on software reliability engineering (ISSRE). IEEE, Gaithersburg, MD, USA, pp 35–45. https://doi.org/10.1109/ISSRE.2015.7381797
Gousios G, Zaidman A, Storey MA, van Deursen A (2015) Work practices and challenges in pull-based development: the integrator’s perspective. In: 2015 IEEE/ACM 37th IEEE international conference on software engineering. IEEE, vol 1, pp 358–368
Gwet KL (2008) Computing inter-rater reliability and its variance in the presence of high agreement. Brit J Mathemat Stat Psychol 61(1):29–48
Herzig K, Just S, Zeller A (2013) It’s not a bug, it’s a feature: how misclassification impacts bug prediction. In: 2013 35th international conference on software engineering (ICSE). IEEE, pp 392–401
Hu H, Wang S, Bezemer CP, Hassan AE (2019) Studying the consistency of star ratings and reviews of popular free hybrid Android and iOS apps. Emp Softw Eng 24(1):7–32
Huang Q, Xia X, Lo D, Murphy GC (2018) Automating intention mining. IEEE Trans Softw Eng, pp 1–1. https://doi.org/10.1109/TSE.2018.2876340
Izadi M, Heydarnoori A, Gousios G (2021) Topic recommendation for software repositories using multi-label classification algorithms. Emp Softw Eng 26(5):1–33
Kalliamvakou E, Gousios G, Blincoe K, Singer L, German DM, Damian D (2014) The promises and perils of mining GitHub. In: Proceedings of the 11th working conference on mining software repositories, pp 92–101
Kallis R, Di Sorbo A, Canfora G, Panichella S (2019) Ticket Tagger: machine learning driven issue classification. In: 2019 IEEE international conference on software maintenance and evolution (ICSME). IEEE, Cleveland, OH, USA, pp 406–409. https://doi.org/10.1109/ICSME.2019.00070
Kanwal J, Maqbool O (2012) Bug prioritization to facilitate bug report triage. J Comput Sci Technol 27(2):397–412. https://doi.org/10.1007/s11390-012-1230-3
Khandkar SH (2009) University of Calgary. Open coding 23:2009
Kikas R, Dumas M, Pfahl D (2016) Using dynamic and contextual features to predict issue lifetime in GitHub projects. In: 2016 IEEE/ACM 13th working conference on mining software repositories (MSR). IEEE, pp 291–302
Kitchenham BA, Mendes E, Travassos GH (2007) Cross versus within-company cost estimation studies: a systematic review. IEEE Trans Softw Eng 33(5):316–329
Landis JR, Koch GG (1977) An application of hierarchical kappa-type statistics in the assessment of majority agreement among multiple observers. Biometrics, pp 363–374
Li C, Xu L, Yan M, He J, Zhang Z (2019) TagDeepRec: tag recommendation for software information sites using attention-based Bi-LSTM. In: International conference on knowledge science, engineering and management. Springer, pp 11–24
Liao Z, He D, Chen Z, Fan X, Zhang Y, Liu S (2018) Exploring the characteristics of issue-related behaviors in GitHub using visualization techniques. IEEE Access 6:24003–24015
Limsettho N, Hata H, Monden A, Matsumoto K (2016) Unsupervised bug report categorization using clustering and labeling algorithm. Int J Softw Eng Knowledge Eng 26(07):1027–1053
Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V (2019) RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692
McHugh ML (2012) Interrater reliability: the kappa statistic. Biochemia medica 22(3):276–282
Merten T, Falis M, Hübner P, Quirchmayr T, Bürsner S, Paech B (2016) Software feature request detection in issue tracking systems. In: 2016 IEEE 24th international requirements engineering conference (RE). IEEE, pp 166–175
Noei E, Zhang F, Wang S, Zou Y (2019) Towards prioritizing user-related issue reports of mobile applications. Emp Softw Eng 24(4):1964–1996. https://doi.org/10.1007/s10664-019-09684-y
Noei E, Zhang F, Zou Y (2019) Too many user-reviews, what should app developers look at first? IEEE Trans Softw Eng, pp 1–1. https://doi.org/10.1109/TSE.2019.2893171
Pandey N, Sanyal DK, Hudait A, Sen A (2017) Automated classification of software issue reports using machine learning techniques: an empirical study. Innov Syst Softw Eng 13(4):279–297
Peters F, Menzies T, Marcus A (2013) Better cross company defect prediction. In: 2013 10th working conference on mining software repositories (MSR). IEEE, pp 409–418
Pingclasai N, Hata H, Matsumoto KI (2013) Classifying bug reports to bugs and other requests using topic modeling. In: 2013 20th Asia-Pacific software engineering conference (APSEC). IEEE, vol 2, pp 13–18
Randolph JJ (2005) Free-marginal multirater kappa (multirater κfree): an alternative to Fleiss’ fixed-marginal multirater kappa. Online submission
Sharma M, Bedi P, Chaturvedi K, Singh V (2012) Predicting the priority of a reported bug using machine learning techniques and cross project validation. In: 2012 12th International Conference on Intelligent Systems Design and Applications (ISDA). IEEE, pp 539–545
Sohrawardi SJ, Azam I, Hosain S (2014) A comparative study of text classification algorithms on user submitted bug reports. In: Ninth International Conference on Digital Information Management (ICDIM 2014). IEEE, Phitsanulok, Thailand, pp 242–247. https://doi.org/10.1109/ICDIM.2014.6991434
Song Y, Chaparro O (2020) BEE: a tool for structuring and analyzing bug reports. In: Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp 1551–1555
Svyatkovskiy A, Deng SK, Fu S, Sundaresan N (2020) IntelliCode Compose: code generation using transformer. arXiv preprint arXiv:2005.08025
Tavakoli M, Izadi M, Heydarnoori A (2020) Improving quality of a post’s set of answers in Stack Overflow. In: 46th Euromicro conference on software engineering and advanced applications, SEAA 2020, Portoroz, Slovenia, August 26-28, 2020. IEEE, pp 504–512. https://doi.org/10.1109/SEAA51224.2020.00084
Terdchanakul P, Hata H, Phannachitta P, Matsumoto K (2017) Bug or not? Bug report classification using n-gram IDF. In: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). IEEE, pp 534–538
Tian Y, Lo D, Sun C (2013) DRONE: predicting priority of reported bugs by multi-factor analysis. In: 2013 IEEE International Conference on Software Maintenance. IEEE, Eindhoven, Netherlands, pp 200–209. https://doi.org/10.1109/ICSM.2013.31
Uddin J, Ghazali R, Deris MM, Naseem R, Shah H (2017) A survey on bug prioritization. Artif Intell Rev 47(2):145–180. https://doi.org/10.1007/s10462-016-9478-6
Vasilescu B, Filkov V, Serebrenik A (2013) StackOverflow and GitHub: associations between software development and crowdsourced knowledge. In: 2013 international conference on social computing. IEEE, pp 188–195
Vasilescu B, Serebrenik A, Devanbu P, Filkov V (2014) How social Q&A sites are changing knowledge sharing in open source software communities. In: Proceedings of the 17th ACM conference on computer supported cooperative work & social computing, pp 342–354
van der Veen E, Gousios G, Zaidman A (2015) Automatically prioritizing pull requests. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, Florence, Italy, pp 357–361. https://doi.org/10.1109/MSR.2015.40
Wan Y, Zhao Z, Yang M, Xu G, Ying H, Wu J, Yu PS (2018) Improving automatic source code summarization via deep reinforcement learning. In: Proceedings of the 33rd ACM/IEEE international conference on automated software engineering, pp 397–407
Wang S, Lo D, Vasilescu B, Serebrenik A (2018) EnTagRec++: an enhanced tag recommendation system for software information sites. Emp Softw Eng 23(2):800–832
Weiss GM, Provost F (2001) The effect of class distribution on classifier learning: an empirical study
Wu Y, Wang S, Bezemer CP, Inoue K (2019) How do developers utilize source code from Stack Overflow? Emp Softw Eng 24(2):637–673
Yu Y, Wang H, Filkov V, Devanbu P, Vasilescu B (2015) Wait for it: determinants of pull request evaluation latency on GitHub. In: 2015 IEEE/ACM 12th working conference on mining software repositories. IEEE, pp 367–371
Yu Y, Zeng Y, Fan Q, Wang H (2018) Transferring well-trained models for cross-project issue classification: a large-scale empirical study. In: Proceedings of the Tenth Asia-Pacific Symposium on Internetware, pp 1–6
Zeng Y, Chen J, Shang W, Chen THP (2019) Studying the characteristics of logging practices in mobile apps: a case study on F-Droid. Emp Softw Eng 24(6):3394–3434
Zhang J, Wang X, Hao D, Xie B, Zhang L, Mei H (2015) A survey on bug-report analysis. Sci China Inform Sci 58(2):1–24
Zhou J, Wang S, Bezemer CP, Zou Y, Hassan AE (2020) Studying the association between Bountysource bounties and the issue-addressing likelihood of GitHub issue reports. IEEE Trans Softw Eng
Zhou Y, Tong Y, Gu R, Gall H (2016) Combining text mining and data mining for bug report classification. J Softw Evolut Process 28(3):150–176


Author information

Authors and Affiliations

Intelligent Software Engineering Lab, Sharif University of Technology, Tehran, Iran
Maliheh Izadi, Kiana Akbari & Abbas Heydarnoori


Corresponding author

Correspondence to Maliheh Izadi.

Additional information

Communicated by: Shaowei Wang, Tse-Hsun (Peter) Chen, Sebastian Baltes, Ivano Malavolta, Christoph Treude, and Alexander Serebrenik

Dr. Abbas Heydarnoori is also a corresponding author for this work.

Appendix: Priority Labels

Table 8 lists the labels, manually extracted from the top (most-starred) GitHub repositories, that were selected for the high-priority and low-priority issue categories.

Table 8 Selected labels for each category of issue priority

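The label lists of Table 8 are not reproduced in this preview. As a hedged sketch of how such lists could turn GitHub labels into priority targets, the snippet below uses hypothetical label sets (HIGH_PRIORITY_LABELS, LOW_PRIORITY_LABELS, not the article's actual entries) and reads label names from issue objects shaped like those of the GitHub REST API, where `labels` is an array of objects carrying a `name` field. The helper names are also hypothetical.

```python
from typing import Iterable, List, Optional

# Hypothetical label sets; the article's real ones are listed in Table 8.
HIGH_PRIORITY_LABELS = {"priority: high", "critical", "urgent", "p0"}
LOW_PRIORITY_LABELS = {"priority: low", "minor", "nice to have", "p3"}


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def label_names(issue: dict) -> List[str]:
    """Extract label names from a GitHub-REST-API-style issue object."""
    return [label["name"] for label in issue.get("labels", [])]


def priority_from_labels(labels: Iterable[str]) -> Optional[str]:
    """Map an issue's labels to "high", "low", or None (unknown or conflicting)."""
    names = {normalize(label) for label in labels}
    is_high = bool(names & HIGH_PRIORITY_LABELS)
    is_low = bool(names & LOW_PRIORITY_LABELS)
    if is_high and not is_low:
        return "high"
    if is_low and not is_high:
        return "low"
    return None  # no priority label at all, or labels from both sets


# Hypothetical issues in the REST API shape.
issues = [
    {"number": 1, "labels": [{"name": "bug"}, {"name": "Priority: High"}]},
    {"number": 2, "labels": [{"name": "P3"}]},
    {"number": 3, "labels": []},
]
for issue in issues:
    print(issue["number"], priority_from_labels(label_names(issue)))
# prints: 1 high / 2 low / 3 None (one per line)
```

Returning None for contradictory labels, instead of picking one side, keeps ambiguous examples out of a training set; whether the study handled conflicts this way is not stated here.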

About this article

Cite this article

Izadi, M., Akbari, K. & Heydarnoori, A. Predicting the objective and priority of issue reports in software repositories. Empir Software Eng 27, 50 (2022). https://doi.org/10.1007/s10664-021-10085-3

Accepted: 02 November 2021
Published: 01 February 2022
DOI: https://doi.org/10.1007/s10664-021-10085-3
