
Quality assessment of individual classifications in machine learning and data mining

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

Although machine learning algorithms have been successfully applied to many problems in the past, their serious practical use is hampered by the fact that they often cannot produce reliable and unbiased assessments of their predictions' quality. In the last few years, several approaches for estimating the reliability or confidence of individual classifiers have emerged, many of them building upon the algorithmic theory of randomness, such as (in historical order) transduction-based confidence estimation, typicalness-based confidence estimation, and transductive reliability estimation. Unfortunately, they all have weaknesses: either they are tightly bound to particular learning algorithms, or the interpretation of their reliability estimations is not always consistent with statistical confidence levels. In this paper we describe the typicalness and transductive reliability estimation frameworks and propose a joint approach that compensates for the above-mentioned weaknesses by integrating typicalness-based confidence estimation and transductive reliability estimation into a joint confidence machine. The resulting confidence machine produces confidence values in the statistical sense. We perform a series of tests with several different machine learning algorithms in several problem domains. We compare our results with those of a proprietary method as well as with kernel density estimation. We show that the proposed method performs as well as proprietary methods and significantly outperforms density estimation methods.
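To make the typicalness idea concrete, the following is a minimal, illustrative sketch of a conformal-prediction-style p-value computation of the kind the typicalness framework builds on. The 1-nearest-neighbour nonconformity measure, the function names, and the toy data are our own assumptions for illustration; this is not the paper's joint confidence machine.

```python
import numpy as np

def nonconformity(X, y, i):
    # Nearest-neighbour nonconformity score of example i: distance to the
    # nearest example with the same label divided by the distance to the
    # nearest example with a different label (larger = more "strange").
    d = np.linalg.norm(X - X[i], axis=1)
    d[i] = np.inf                              # ignore the example itself
    d_same = d[y == y[i]].min()
    d_other = d[y != y[i]].min()
    return d_same / d_other

def typicalness_p_values(X_train, y_train, x_new, labels):
    # For every tentative label, append (x_new, label) to the training set
    # and return the fraction of examples at least as nonconforming as x_new.
    p = {}
    for label in labels:
        X = np.vstack([X_train, x_new])
        y = np.append(y_train, label)
        scores = np.array([nonconformity(X, y, i) for i in range(len(y))])
        p[label] = float(np.mean(scores >= scores[-1]))
    return p

# Toy usage: predict the label with the largest p-value; report
# credibility (largest p-value) and confidence (1 - second largest p-value).
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
p = typicalness_p_values(X_train, y_train, np.array([0.95, 1.05]), labels=[0, 1])
ranked = sorted(p.values(), reverse=True)
print(p, "credibility:", ranked[0], "confidence:", 1.0 - ranked[1])
```

In this reading, a high confidence means all alternative labels look atypical, while a low credibility warns that the example fits none of the classes well; these are the statistical-sense quantities the abstract refers to.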


Author information

Matjaž Kukar is currently Assistant Professor in the Faculty of Computer and Information Science at the University of Ljubljana. His research interests include machine learning, data mining and intelligent data analysis, ROC analysis, cost-sensitive learning, reliability estimation, and latent structure analysis, as well as applications of data mining to medical and business problems.

About this article

Cite this article

Kukar, M. Quality assessment of individual classifications in machine learning and data mining. Knowl Inf Syst 9, 364–384 (2006). https://doi.org/10.1007/s10115-005-0203-z
