A Study on the Effect of Class Distribution Using Cost-Sensitive Learning

Ting, Kai Ming

doi:10.1007/3-540-36182-0_11

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2534)

Included in the following conference series:

International Conference on Discovery Science

Abstract

This paper investigates the effect of class distribution on the predictive performance of classification models using cost-sensitive learning, rather than the sampling approach employed previously by a similar study. The predictive performance is measured using the cost space representation, which is a dual to the ROC representation. This study shows that distributions which range between the natural distribution and the balanced distribution can also produce the best models, contrary to the finding of the previous study. In addition, we find that the best models are larger in size than those trained using the natural distribution. We also show two different ways to achieve the same effect of the corrected probability estimates proposed by the previous study.
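As an illustration of the cost space representation mentioned in the abstract, the sketch below follows the standard cost-curve formulation of Drummond and Holte, in which a single ROC point (FPR, TPR) becomes a straight line over the probability-cost axis. It is not taken from the paper itself; the function names and the example numbers are hypothetical.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis of cost space: the share of expected cost due to positives.

    p_pos   -- prior probability of the positive class
    cost_fn -- cost of misclassifying a positive (false negative)
    cost_fp -- cost of misclassifying a negative (false positive)
    """
    pos_term = p_pos * cost_fn
    neg_term = (1.0 - p_pos) * cost_fp
    return pos_term / (pos_term + neg_term)


def normalized_expected_cost(tpr, fpr, pcf):
    """y-axis of cost space for a classifier with the given ROC point.

    The ROC point (fpr, tpr) maps to the line joining (0, fpr) and
    (1, 1 - tpr) in cost space; this evaluates that line at pcf.
    """
    fnr = 1.0 - tpr
    return fnr * pcf + fpr * (1.0 - pcf)


# Hypothetical example: 10% positives, a false negative costs 9x a false positive.
pcf = probability_cost(p_pos=0.1, cost_fn=9.0, cost_fp=1.0)
cost = normalized_expected_cost(tpr=0.8, fpr=0.1, pcf=pcf)
print(f"PCF = {pcf:.3f}, normalized expected cost = {cost:.3f}")
```

Because class prior and misclassification costs enter only through the single probability-cost value, changing the class distribution and changing the cost ratio move a classifier along the same axis, which is what allows a cost-sensitive learner to stand in for resampling in this kind of comparison.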

Author information

Authors and Affiliations

Gippsland School of Computing and Information Technology, Monash University, 3842, Victoria, Australia
Kai Ming Ting

Editor information

Editors and Affiliations

Deutsches Forschungszentrum für Künstliche Intelligenz, Stuhlsatzenhausweg 3, 66123, Saarbrücken, Germany
Steffen Lange
National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, 101-8430, Tokyo, Japan
Ken Satoh
Department of Computer Science, University of Maryland, College Park, 20742, Maryland, MD, USA
Carl H. Smith

About this paper

Cite this paper

Ting, K.M. (2002). A Study on the Effect of Class Distribution Using Cost-Sensitive Learning. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_11

DOI: https://doi.org/10.1007/3-540-36182-0_11
Published: 08 November 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4
eBook Packages: Springer Book Archive
