Evaluating Three Approaches to Extracting Fault Data from Software Change Repositories

  • Conference paper
Product-Focused Software Process Improvement (PROFES 2010)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6156)

Abstract

Software products can only be improved if we have a good understanding of the faults they typically contain. Code faults are a significant source of software product problems that we do not currently understand sufficiently. Open source change repositories are potentially a rich and valuable source of fault data for both researchers and practitioners. Such fault data can be used to better understand current product problems so that we can predict and address future ones. However, extracting fault data from change repositories is difficult. In this paper we compare the performance of three approaches to extracting fault data from the change repository of the Barcode Open Source System. Our main finding is that we have most confidence in our manual evaluation of diffs to identify fault-fixing changes. We had less confidence in the ability of the two automatic approaches to separate fault-fixing from non-fault-fixing changes. We conclude that it is very difficult to reliably extract fault-fixing data from change repositories, especially using automatic tools, and that we need to be cautious when reporting or using such data.
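The abstract does not say how the two automatic approaches work. As a minimal sketch of the general idea, assuming a simple keyword heuristic over change log messages (one common way of flagging fault-fixing changes, not necessarily either approach evaluated in the paper), the Python below classifies changes and measures how well it agrees with manually labelled diffs, treating the manual labels as ground truth. The regular expression `FIX_PATTERN`, the `Change` record, the helper functions and the sample messages are all hypothetical illustrations, not data from the Barcode repository.

```python
import re
from dataclasses import dataclass

# Hypothetical keyword pattern; NOT the pattern used in the paper.
FIX_PATTERN = re.compile(
    r"\b(fix(e[sd])?|bugs?|defects?|faults?|patch(ed)?)\b", re.IGNORECASE
)


@dataclass
class Change:
    revision: str
    message: str          # change log message
    manual_is_fix: bool   # hypothetical label from manually inspecting the diff


def keyword_is_fix(message: str) -> bool:
    """Flag a change as fault fixing if its log message matches FIX_PATTERN."""
    return FIX_PATTERN.search(message) is not None


def agreement(changes):
    """Score the keyword heuristic against manual labels (manual = ground truth)."""
    tp = fp = fn = tn = 0
    for c in changes:
        auto = keyword_is_fix(c.message)
        if auto and c.manual_is_fix:
            tp += 1   # both say fault fixing
        elif auto:
            fp += 1   # keyword hit, but the diff is not a fault fix
        elif c.manual_is_fix:
            fn += 1   # fault fix with no keyword in the message
        else:
            tn += 1   # both say non-fault-fixing
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Hypothetical sample changes, invented for illustration only.
sample = [
    Change("r1", "Fix null check in barcode renderer", True),
    Change("r2", "Add new symbology option", False),
    Change("r3", "Refactor; fixes formatting of comments", False),
    Change("r4", "Correct checksum calculation", True),
]

result = agreement(sample)
print(result)
# {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1, 'precision': 0.5, 'recall': 0.5}
```

Even on this toy sample the heuristic errs in both directions: r3 mentions "fixes" without fixing a fault, while r4 fixes a fault without using any keyword. This is the kind of disagreement with manual diff inspection that would limit confidence in automatic extraction.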

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hall, T., Bowes, D., Liebchen, G., Wernick, P. (2010). Evaluating Three Approaches to Extracting Fault Data from Software Change Repositories. In: Ali Babar, M., Vierimaa, M., Oivo, M. (eds) Product-Focused Software Process Improvement. PROFES 2010. Lecture Notes in Computer Science, vol 6156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13792-1_10

  • DOI: https://doi.org/10.1007/978-3-642-13792-1_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13791-4

  • Online ISBN: 978-3-642-13792-1

  • eBook Packages: Computer Science, Computer Science (R0)