E-APR: Mapping the effectiveness of automated program repair techniques

Aleti, Aldeida; Martinez, Matias

doi:10.1007/s10664-021-09989-x

Published: 13 July 2021

Volume 26, article number 99, (2021)
Empirical Software Engineering

Abstract

Automated Program Repair (APR) is a fast-growing area, with numerous new techniques being developed to tackle one of the most challenging problems in software engineering. APR techniques have shown promising results, raising the hope that software will one day be able to repair itself. In this paper, we focus on the objective performance evaluation of APR techniques. We introduce a new approach, Explaining Automated Program Repair (E-APR), which identifies the features of buggy programs that explain why a particular instance is difficult for an APR technique. E-APR is used to examine the diversity and quality of the buggy programs used by most researchers and to analyse the strengths and weaknesses of existing APR techniques. E-APR visualises an instance space of buggy programs, in which each buggy program is represented as a point. The instance space is constructed to reveal regions of hard and easy buggy programs, which enables the strengths and weaknesses of APR techniques to be identified.
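
To make the idea of an instance space concrete, the following is a minimal sketch in which each buggy program is described by a feature vector and projected onto two axes, so that repaired and unrepaired instances can be compared by position. It is not the E-APR implementation: the projection used here is plain principal component analysis computed by power iteration, swapped in as a simple stand-in, and the bug names, the three features (lines of code, failing tests, a complexity score) and the repair outcomes are all hypothetical values chosen for illustration.

```python
import math
import random


def standardise(rows):
    """Scale every feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # constant column: avoid dividing by zero
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Covariance matrix of rows that are already centred."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(matrix, v):
    return [dot(row, v) for row in matrix]


def dominant_axis(matrix, iterations=500, seed=0):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:  # zero matrix: no direction to find
            break
        v = [x / norm for x in w]
    return v


def instance_space(features):
    """Project feature vectors onto their two leading principal axes."""
    z = standardise(features)
    cov = covariance(z)
    first = dominant_axis(cov)
    eigenvalue = dot(first, mat_vec(cov, first))
    d = len(cov)
    # Deflation: remove the first axis so the next power iteration finds the second.
    deflated = [[cov[i][j] - eigenvalue * first[i] * first[j] for j in range(d)]
                for i in range(d)]
    second = dominant_axis(deflated, seed=1)
    return [(dot(row, first), dot(row, second)) for row in z]


# Hypothetical data: (bug id, [lines of code, failing tests, complexity], repaired?)
BUGS = [
    ("bug-1", [120.0, 1.0, 4.0], True),
    ("bug-2", [90.0, 1.0, 3.0], True),
    ("bug-3", [450.0, 3.0, 12.0], False),
    ("bug-4", [800.0, 2.0, 20.0], False),
    ("bug-5", [60.0, 1.0, 2.0], True),
    ("bug-6", [300.0, 4.0, 9.0], False),
]

POINTS = instance_space([features for _, features, _ in BUGS])

for (name, _, repaired), (x, y) in zip(BUGS, POINTS):
    label = "easy (repaired)" if repaired else "hard (not repaired)"
    print(f"{name}: ({x:+.2f}, {y:+.2f}) {label}")
```

Plotting the resulting coordinates and colouring each point by repair outcome gives a rough picture of where a technique succeeds or fails. A projection chosen specifically to separate hard from easy instances would generally be more informative than plain PCA.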

Acknowledgements

The authors would like to acknowledge Prof. Kate Smith-Miles and her team working on matilda.unimelb.edu.au. Their methodology for Instance Space Analysis forms the foundation of this work. Matilda was used to create the instance spaces presented in Figures 2 and 3.

Author information

Authors and Affiliations

Faculty of Information Technology, Monash University, Melbourne, Australia
Aldeida Aleti
Université Polytechnique Hauts-de-France, Valenciennes, France
Matias Martinez

Corresponding author

Correspondence to Aldeida Aleti.

Additional information

Communicated by: Christoph Treude

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Aleti, A., Martinez, M. E-APR: Mapping the effectiveness of automated program repair techniques. Empir Software Eng 26, 99 (2021). https://doi.org/10.1007/s10664-021-09989-x

Accepted: 27 May 2021
Published: 13 July 2021
DOI: https://doi.org/10.1007/s10664-021-09989-x