
Improving Classification by Removing or Relabeling Mislabeled Instances

  • Conference paper
In: Foundations of Intelligent Systems (ISMIS 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2366)


Abstract

It is common for a database to contain noisy data, and an important source of noise is mislabeled training instances. We present a new approach that improves classification accuracy in this situation by means of a preliminary filtering procedure. An example is considered suspect when, in its neighborhood defined by a geometrical graph, the proportion of examples of the same class is not significantly greater than in the whole database. Such suspect examples in the training data can be removed or relabeled, and the filtered training set is then given as input to the learning algorithm. Our experiments on ten benchmarks from the UCI Machine Learning Repository, with 1-NN as the final algorithm, show that removal gives better results than relabeling. Removal maintains the generalization error rate when from 0 to 20% class noise is introduced, especially when the classes are well separable.
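To make the filtering criterion concrete, the sketch below illustrates it in Python under stated assumptions: a k-nearest-neighbor graph stands in for the geometrical graph used in the paper (such as a relative neighborhood graph), and a one-sided binomial test stands in for the paper's significance test. The function name filter_suspects and the parameters k and alpha are illustrative choices, not the authors' exact procedure.

    import numpy as np
    from scipy.stats import binomtest
    from sklearn.neighbors import NearestNeighbors, KNeighborsClassifier

    def filter_suspects(X, y, k=5, alpha=0.05):
        """Return a boolean mask of training examples to keep (True = not suspect)."""
        n = len(y)
        # Assumption: k-NN graph as a stand-in for the paper's geometrical graph.
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)          # column 0 is the point itself
        keep = np.ones(n, dtype=bool)
        for i in range(n):
            neighbors = idx[i, 1:]         # drop the point itself
            same = int(np.sum(y[neighbors] == y[i]))
            p_global = float(np.mean(y == y[i]))  # class proportion in the whole data
            # Suspect when the local same-class proportion is NOT significantly
            # greater than the global proportion (one-sided binomial test).
            if binomtest(same, k, p_global, alternative="greater").pvalue > alpha:
                keep[i] = False
        return keep

    # Usage: filter the training set, then fit the final 1-NN classifier.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] > 0).astype(int)
    flip = rng.choice(200, size=20, replace=False)
    y[flip] ^= 1                           # inject 10% class noise
    keep = filter_suspects(X, y)
    clf = KNeighborsClassifier(n_neighbors=1).fit(X[keep], y[keep])

Relabeling would instead assign a suspect example the majority class of its neighborhood rather than discarding it; the experiments reported in the paper favor removal.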






Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lallich, S., Muhlenbach, F., Zighed, D.A. (2002). Improving Classification by Removing or Relabeling Mislabeled Instances. In: Hacid, MS., Raś, Z.W., Zighed, D.A., Kodratoff, Y. (eds) Foundations of Intelligent Systems. ISMIS 2002. Lecture Notes in Computer Science (LNAI), vol 2366. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48050-1_3


  • DOI: https://doi.org/10.1007/3-540-48050-1_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43785-7

  • Online ISBN: 978-3-540-48050-1

  • eBook Packages: Springer Book Archive
