A Study on the Effect of Class Distribution Using Cost-Sensitive Learning

  • Conference paper
Discovery Science (DS 2002)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2534)

Abstract

This paper investigates the effect of class distribution on the predictive performance of classification models using cost-sensitive learning, rather than the sampling approach employed previously by a similar study. The predictive performance is measured using the cost space representation, which is a dual to the ROC representation. This study shows that distributions which range between the natural distribution and the balanced distribution can also produce the best models, contrary to the finding of the previous study. In addition, we find that the best models are larger in size than those trained using the natural distribution. We also show two different ways to achieve the same effect of the corrected probability estimates proposed by the previous study.
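
The cost space representation mentioned in the abstract can be illustrated with a small sketch. It assumes the formulation usually attributed to Drummond and Holte [3]: a classifier with false positive rate FPR and true positive rate TPR, a single point in ROC space, becomes a straight line in cost space whose height is the normalized expected cost. The function names and the example numbers below are hypothetical and are not taken from the paper.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis of cost space: the positive prior weighted by misclassification costs.

    p_pos   : prior probability of the positive class
    cost_fn : cost of a false negative (assumed value supplied by the user)
    cost_fp : cost of a false positive (assumed value supplied by the user)
    """
    pos = p_pos * cost_fn
    neg = (1.0 - p_pos) * cost_fp
    return pos / (pos + neg)


def normalized_expected_cost(fpr, tpr, pcf):
    """Height of the cost line for ROC point (fpr, tpr) at probability-cost pcf."""
    fnr = 1.0 - tpr
    return fnr * pcf + fpr * (1.0 - pcf)


# Hypothetical ROC point, for illustration only.
fpr, tpr = 0.2, 0.7

# The line runs from (0, FPR) to (1, 1 - TPR), which is the point/line duality.
left_end = normalized_expected_cost(fpr, tpr, 0.0)
right_end = normalized_expected_cost(fpr, tpr, 1.0)

# A skewed prior with unequal costs maps to a single x-position in cost space.
pcf = probability_cost(p_pos=0.1, cost_fn=5.0, cost_fp=1.0)
cost_at_pcf = normalized_expected_cost(fpr, tpr, pcf)
print(left_end, right_end, pcf, cost_at_pcf)
```

Because class priors and misclassification costs enter only through the single quantity on the x-axis, a change in class distribution and a change in cost ratio move a classifier along the same line, which is one way to read the abstract's link between class distribution and cost-sensitive learning.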

References

  1. Blake, C. & Merz, C.J. UCI Repository of machine learning databases. [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California (1998).

  2. Bradford, J., Kunz, C., Kohavi, R., Brunk, C., & Brodley, C. Pruning decision trees with misclassification costs. Proceedings of the European Conference on Machine Learning. (1998) 131–136.

  3. Drummond, C. & Holte, R. Explicitly Representing Expected Cost: An Alternative to ROC Representation. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (2000) 198–207.

  4. Drummond, C. & Holte, R. Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann. (2000) 239–246.

  5. Michie, D., Spiegelhalter, D.J., & Taylor, C.C. Machine Learning, Neural and Statistical Classification. Ellis Horwood Limited. (1994).

  6. Provost, F. & Fawcett, T. Robust Classification for Imprecise Environments. Machine Learning 42 (2001) 203–231.

  7. Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann. (1993).

  8. Ting, K.M. An Instance-Weighting Method to Induce Cost-Sensitive Trees. IEEE Transactions on Knowledge and Data Engineering. Vol. 14, No. 3. (2002) 659–665.

  9. Ting, K.M. Issues in Classifier Evaluation using Optimal Cost Curves. Proceedings of The Nineteenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann. (2002) 642–649.

  10. Webb, G. Decision tree grafting from the all-tests-but-one partition. Proceedings of the 16th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann. (1999) 702–707.

  11. Weiss, G. & Provost, F. The Effect of Class Distribution on Classifier Learning: An Empirical Study. Technical Report ML-TR-44, Department of Computer Science, Rutgers University. (2001).

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ting, K.M. (2002). A Study on the Effect of Class Distribution Using Cost-Sensitive Learning. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_11

  • DOI: https://doi.org/10.1007/3-540-36182-0_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-00188-1

  • Online ISBN: 978-3-540-36182-4

  • eBook Packages: Springer Book Archive
