Abstract
This paper investigates the effect of class distribution on the predictive performance of classification models using cost-sensitive learning, rather than the sampling approach employed by a similar previous study. Predictive performance is measured in the cost space representation, which is a dual of the ROC representation. The study shows that distributions lying between the natural distribution and the balanced distribution can also produce the best models, contrary to the finding of the previous study. In addition, the best models are larger than those trained using the natural distribution. We also show two different ways to achieve the same effect as the corrected probability estimates proposed by the previous study.
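As a rough illustration of the duality between ROC space and cost space mentioned in the abstract, the sketch below maps a single ROC point to its line in cost space, following the cost curve formulation of Drummond and Holte listed in the references. The function names, the example ROC point, and the example costs are illustrative assumptions and are not taken from the paper.

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """x-axis of cost space: the probability-cost value PCF(+).

    p_pos   -- prior probability of the positive class
    cost_fn -- cost of a false negative (assumed example parameter)
    cost_fp -- cost of a false positive (assumed example parameter)
    """
    pos = p_pos * cost_fn
    neg = (1.0 - p_pos) * cost_fp
    return pos / (pos + neg)


def normalized_expected_cost(fpr, tpr, pcf):
    """y-axis of cost space for a classifier at ROC point (fpr, tpr).

    A single ROC point becomes a straight line over pcf in [0, 1],
    running from fpr (at pcf = 0) to 1 - tpr (at pcf = 1).
    """
    fnr = 1.0 - tpr
    return fnr * pcf + fpr * (1.0 - pcf)


if __name__ == "__main__":
    # Hypothetical classifier at ROC point (FPR = 0.2, TPR = 0.7).
    for pcf in (0.0, 0.5, 1.0):
        print(pcf, normalized_expected_cost(0.2, 0.7, pcf))
```

Under this formulation, changing either the class prior or the misclassification costs only moves the operating condition along the x-axis, which is one way to see why class distribution and cost can be studied with the same representation.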
References
Blake, C. & Merz, C.J. UCI Repository of machine learning databases. [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California (1998).
Bradford, J., Kunz, C., Kohavi, R., Brunk, C., & Brodley, C. Pruning decision trees with misclassification costs. Proceedings of the European Conference on Machine Learning. (1998) 131–136.
Drummond, C. & Holte, R. Explicitly Representing Expected Cost: An Alternative to ROC Representation. Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2000) 198–207.
Drummond, C. & Holte, R. Exploiting the Cost (In)sensitivity of Decision Tree Splitting Criteria. Proceedings of The Seventeenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann. (2000) 239–246.
Michie, D., Spiegelhalter, D.J., & Taylor, C.C. Machine Learning, Neural and Statistical Classification. Ellis Horwood Limited. (1994).
Provost, F. & Fawcett, T. Robust Classification for Imprecise Environments. Machine Learning 42 (2001) 203–231.
Quinlan, J.R. C4.5: Programs for Machine Learning. Morgan Kaufmann. (1993).
Ting, K.M. An Instance-Weighting Method to Induce Cost-Sensitive Trees. IEEE Transactions on Knowledge and Data Engineering. Vol. 14, No. 3. (2002) 659–665.
Ting, K.M. Issues in Classifier Evaluation using Optimal Cost Curves. Proceedings of The Nineteenth International Conference on Machine Learning. San Francisco: Morgan Kaufmann. (2002) 642–649.
Webb, G. Decision tree grafting from the all-tests-but-one partition. Proceedings of the 16th International Joint Conference on Artificial Intelligence. San Francisco: Morgan Kaufmann. (1999) 702–707.
Weiss, G. & Provost, F. The Effect of Class Distribution on Classifier Learning: An Empirical Study. Technical Report ML-TR-44, Department of Computer Science, Rutgers University. (2001).
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ting, K.M. (2002). A Study on the Effect of Class Distribution Using Cost-Sensitive Learning. In: Lange, S., Satoh, K., Smith, C.H. (eds) Discovery Science. DS 2002. Lecture Notes in Computer Science, vol 2534. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36182-0_11
DOI: https://doi.org/10.1007/3-540-36182-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00188-1
Online ISBN: 978-3-540-36182-4
eBook Packages: Springer Book Archive