Abstract
Several real-world prediction problems involve forecasting rare values of a target variable. When this variable is nominal, we have a problem of class imbalance, which has already been studied thoroughly within machine learning. For regression tasks, where the target variable is continuous, few works address this type of problem. Still, important application areas involve forecasting rare extreme values of a continuous target variable. This paper describes a contribution to this type of task. Namely, we propose to address such tasks with sampling approaches. These approaches change the distribution of the given training data set to reduce the imbalance between the rare target cases and the most frequent ones. We present a modification of the well-known Smote algorithm that allows its use on these regression tasks. In an extensive set of experiments, we provide empirical evidence for the superiority of our proposals for these particular regression tasks. The proposed SmoteR method can be used with any existing regression algorithm, turning it into a general tool for addressing problems of forecasting rare extreme values of a continuous target variable.
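The abstract gives no algorithmic detail, so the following is only a rough sketch of the general idea it describes: rebalancing a regression training set by under-sampling the frequent cases and generating synthetic rare cases through interpolation, in the spirit of Smote. It is not the authors' implementation. All names and parameters (`smoter_sketch`, `rare_mask`, `over_factor`, `under_ratio`, `k`) are hypothetical, the rare cases are assumed to be flagged by the caller, and the rule used for the synthetic target value (a distance-weighted average of the two seed targets) is an assumption made for illustration.

```python
import numpy as np


def smoter_sketch(X, y, rare_mask, over_factor=2, under_ratio=1.0, k=3, seed=0):
    """Illustrative Smote-style resampling for regression (hypothetical sketch).

    X is a 2-D feature array, y the continuous target, and rare_mask a
    boolean array flagging the rare cases (how to flag them is left open).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rare_mask = np.asarray(rare_mask, dtype=bool)
    X_rare, y_rare = X[rare_mask], y[rare_mask]
    X_norm, y_norm = X[~rare_mask], y[~rare_mask]

    # Over-sampling: interpolate between each rare case and a randomly
    # chosen one of its k nearest rare neighbours.
    new_X, new_y = [], []
    k_eff = min(k, len(X_rare) - 1)
    if k_eff >= 1:
        for i in range(len(X_rare)):
            dists = np.linalg.norm(X_rare - X_rare[i], axis=1)
            dists[i] = np.inf  # never pick the case itself
            neighbours = np.argsort(dists)[:k_eff]
            for _ in range(over_factor):
                j = rng.choice(neighbours)
                t = rng.random()
                x_new = X_rare[i] + t * (X_rare[j] - X_rare[i])
                # Assumed target rule: the closer seed gets the larger weight.
                d_i = np.linalg.norm(x_new - X_rare[i])
                d_j = np.linalg.norm(x_new - X_rare[j])
                total = d_i + d_j
                if total > 0:
                    y_new = (d_j * y_rare[i] + d_i * y_rare[j]) / total
                else:
                    y_new = 0.5 * (y_rare[i] + y_rare[j])
                new_X.append(x_new)
                new_y.append(y_new)

    # Under-sampling: keep a random subset of the frequent cases, sized
    # relative to the number of rare cases after over-sampling.
    n_rare_total = len(X_rare) + len(new_X)
    n_keep = min(len(X_norm), int(round(under_ratio * n_rare_total)))
    keep = rng.choice(len(X_norm), size=n_keep, replace=False)

    X_parts = [X_norm[keep], X_rare]
    y_parts = [y_norm[keep], y_rare]
    if new_X:
        X_parts.append(np.vstack(new_X))
        y_parts.append(np.asarray(new_y))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy usage: 30 points with y = x^2; the 5 largest targets are flagged as rare.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = X[:, 0] ** 2
X_res, y_res = smoter_sketch(X, y, rare_mask=y > 600)
```

Because the resampled set is just a new `(X, y)` pair, it can be passed to any regression learner, which matches the abstract's point that the approach is independent of the learning algorithm.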
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Torgo, L., Ribeiro, R.P., Pfahringer, B., Branco, P. (2013). SMOTE for Regression. In: Correia, L., Reis, L.P., Cascalho, J. (eds) Progress in Artificial Intelligence. EPIA 2013. Lecture Notes in Computer Science, vol 8154. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40669-0_33
DOI: https://doi.org/10.1007/978-3-642-40669-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40668-3
Online ISBN: 978-3-642-40669-0
eBook Packages: Computer Science (R0)