Abstract
Automated Program Repair (APR) is a fast-growing area, with numerous new techniques being developed to tackle one of the most challenging problems in software engineering. APR techniques have shown promising results, giving us hope that one day it will be possible for software to repair itself. In this paper, we focus on the problem of objectively evaluating the performance of APR techniques. We introduce a new approach, Explaining Automated Program Repair (E-APR), which identifies features of buggy programs that explain why a particular instance is difficult for an APR technique. We use E-APR to examine the diversity and quality of the buggy programs used by most researchers, and to analyse the strengths and weaknesses of existing APR techniques. E-APR visualises an instance space of buggy programs, in which each buggy program is represented as a point. The instance space is constructed to reveal regions of hard and easy buggy programs, which enables the strengths and weaknesses of APR techniques to be identified.
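To make the idea of "each buggy program as a point" concrete, the sketch below maps per-program feature vectors to 2D coordinates and tags each point as easy or hard according to whether a tool repaired it. It is a minimal illustration under stated assumptions, not E-APR itself: the three features, the bug names, and the repair outcomes are all hypothetical, and plain principal component analysis (computed with power iteration so only the standard library is needed) is swapped in for the projection. The Instance Space Analysis methodology that E-APR builds on uses its own projection, chosen so that trends in algorithm performance become visible, and that projection is not reproduced here.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def standardise(rows: Sequence[Sequence[float]]) -> Matrix:
    """Z-score each feature column so features measured on different scales are comparable."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    # A constant column would have zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z: Matrix) -> Matrix:
    """Covariance matrix of column-centred data (population form, divides by n)."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(d)] for i in range(d)]


def top_components(cov: Matrix, k: int = 2, iters: int = 500) -> Matrix:
    """Leading k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    d = len(cov)
    m = [row[:] for row in cov]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    comps = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue associated with v.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        # Deflate: subtract this component so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        comps.append(v)
    return comps


def instance_space(features: Sequence[Sequence[float]], k: int = 2) -> List[Tuple[float, ...]]:
    """Project each buggy program's feature vector to a point in a k-dimensional space."""
    z = standardise(features)
    comps = top_components(covariance(z), k)
    return [tuple(sum(c[j] * r[j] for j in range(len(r))) for c in comps) for r in z]


if __name__ == "__main__":
    # Hypothetical features per buggy program: [lines in the human patch, failing tests, cyclomatic complexity]
    bugs = {
        "bug-A": [1, 1, 3],
        "bug-B": [2, 1, 4],
        "bug-C": [15, 4, 20],
        "bug-D": [30, 6, 25],
        "bug-E": [3, 2, 5],
    }
    # Hypothetical outcome for one APR tool: True if it produced a correct patch.
    repaired = {"bug-A": True, "bug-B": True, "bug-C": False, "bug-D": False, "bug-E": True}
    for name, (x, y) in zip(bugs, instance_space(list(bugs.values()))):
        print(f"{name}: ({x:+.2f}, {y:+.2f}) {'easy' if repaired[name] else 'hard'}")
```

With these made-up numbers, the bugs the tool failed on fall on the opposite side of the first axis from those it fixed, which is the kind of hard/easy separation the abstract describes. PCA is unsupervised, so it only exposes such separation when the chosen features happen to correlate with repair difficulty; a projection that also takes performance into account is designed to make those regions easier to see.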
Acknowledgements
The authors would like to thank Prof. Kate Smith-Miles and her team behind Matilda (matilda.unimelb.edu.au). The Instance Space Analysis methodology forms the foundation of this work. Matilda was used to create the instance spaces presented in Figures 2 and 3.
Additional information
Communicated by: Christoph Treude
About this article
Cite this article
Aleti, A., Martinez, M. E-APR: Mapping the effectiveness of automated program repair techniques. Empir Software Eng 26, 99 (2021). https://doi.org/10.1007/s10664-021-09989-x
DOI: https://doi.org/10.1007/s10664-021-09989-x