E-APR: Mapping the effectiveness of automated program repair techniques

Empirical Software Engineering

Abstract

Automated Program Repair (APR) is a fast-growing area, with numerous new techniques being developed to tackle one of the most challenging problems in software engineering. APR techniques have shown promising results, giving us hope that one day software will be able to repair itself. In this paper, we focus on the problem of objectively evaluating the performance of APR techniques. We introduce a new approach, Explaining Automated Program Repair (E-APR), which identifies the features of buggy programs that explain why a particular instance is difficult for an APR technique. E-APR is used to examine the diversity and quality of the buggy programs used by most researchers, and to analyse the strengths and weaknesses of existing APR techniques. E-APR visualises an instance space of buggy programs, in which each buggy program is represented as a point. The instance space is constructed so that regions of hard and easy buggy programs become visible, which in turn allows the strengths and weaknesses of each APR technique to be identified.
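
As a rough illustration of the instance-space idea, the sketch below describes a handful of buggy programs by feature vectors, projects them to two dimensions, and lists the points that a technique repaired. Everything in it is an assumption made for illustration: the bug names, the three features, their values, and the repaired flags are invented, and plain principal component analysis (via power iteration) stands in for the projection, which is not necessarily the one used in the paper.

```python
import math
import random

# Hypothetical buggy programs: (name, [lines of code, failing tests, complexity],
# repaired by some APR technique?). All values are invented for illustration.
instances = [
    ("bug-A", [120.0, 1.0, 4.0], True),
    ("bug-B", [150.0, 1.0, 5.0], True),
    ("bug-C", [900.0, 3.0, 30.0], False),
    ("bug-D", [1100.0, 4.0, 35.0], False),
    ("bug-E", [400.0, 2.0, 12.0], True),
    ("bug-F", [300.0, 5.0, 9.0], False),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def standardize(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def covariance(rows):
    """Covariance matrix of already-centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=500, seed=0):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = dot(v, [dot(row, v) for row in matrix])
    return v, eigenvalue


def project_2d(features):
    """Project feature vectors onto their first two principal components."""
    z = standardize(features)
    cov = covariance(z)
    v1, l1 = top_component(cov)
    d = len(cov)
    # Deflation: remove the first component so power iteration finds the second.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    v2, _ = top_component(deflated, seed=1)
    return [(dot(row, v1), dot(row, v2)) for row in z]


# Each buggy program becomes one point in the 2-D instance space.
points = project_2d([features for _, features, _ in instances])

# Points where the (hypothetical) technique succeeded.
repaired_points = [(name, p) for (name, _, ok), p in zip(instances, points) if ok]

for (name, _, ok), (x, y) in zip(instances, points):
    print(f"{name}: ({x:+.2f}, {y:+.2f}) {'repaired' if ok else 'not repaired'}")
```

Plotting `points` coloured by the repaired flag would give a picture in the spirit of an instance space: clusters of repaired points suggest where a technique is strong, and the features that drive each axis hint at why other regions are hard.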




Acknowledgements

The authors would like to acknowledge Prof. Kate Smith-Miles and her team working on matilda.unimelb.edu.au. Their methodology for Instance Space Analysis is the foundation of this work. Matilda was used to create the instance spaces presented in Figures 2 and 3.

Author information

Corresponding author

Correspondence to Aldeida Aleti.

Additional information

Communicated by: Christoph Treude

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Aleti, A., Martinez, M. E-APR: Mapping the effectiveness of automated program repair techniques. Empir Software Eng 26, 99 (2021). https://doi.org/10.1007/s10664-021-09989-x


  • DOI: https://doi.org/10.1007/s10664-021-09989-x
