Abstract
As one of the main components of haze, PM2.5 has recently attracted wide public attention in China. In this paper, we attempt to predict PM2.5 concentrations in Dalian, China, using symbolic regression (SR) based on genetic programming (GP). The key problem in prediction is how to select accurate models using proper interestingness measures. In addition to commonly used measures such as the R-squared value, mean squared error, and number of parameters, we study the effectiveness of a set of potentially useful measures: the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the Hannan–Quinn criterion (HQC), the corrected AIC (AICc), and the efficient determination criterion (EDC). We also propose a new interestingness measure, Interestingness Elasticity (IE). The experimental results show that the new measure performs best at selecting candidate models and shows promising extrapolative capability.
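To make the listed measures concrete, the sketch below computes them for one candidate model from its observed values, its predictions, and its parameter count k, then picks the candidate with the smallest criterion value. It assumes Gaussian residuals, so each information criterion takes the common least-squares form n·ln(RSS/n) plus a complexity penalty, with additive constants dropped; the paper's exact definitions may differ. The EDC penalty factor c_n = sqrt(n) is an assumed choice (EDC only requires c_n/n → 0 and c_n/ln(ln n) → ∞). The function names `selection_criteria` and `select_model` are hypothetical. IE is not defined in this abstract, so it is not sketched here.

```python
import math
from typing import Dict, Sequence, Tuple


def selection_criteria(y: Sequence[float], y_hat: Sequence[float], k: int) -> Dict[str, float]:
    """Compute common model-selection measures for one candidate model.

    Assumes Gaussian residuals: each information criterion uses the
    least-squares form n*ln(RSS/n) + penalty, with constants dropped.
    """
    n = len(y)
    if n != len(y_hat) or n <= k + 1:
        raise ValueError("need matching lengths and n > k + 1")
    mean_y = sum(y) / n
    rss = sum((a - b) ** 2 for a, b in zip(y, y_hat))  # residual sum of squares
    if rss <= 0:
        raise ValueError("perfect fit: log-likelihood terms are undefined")
    tss = sum((a - mean_y) ** 2 for a in y)             # total sum of squares
    fit = n * math.log(rss / n)  # goodness-of-fit term shared by all criteria
    aic = fit + 2 * k
    return {
        "R2": 1 - rss / tss if tss > 0 else float("nan"),
        "MSE": rss / n,
        "AIC": aic,
        "AICc": aic + 2 * k * (k + 1) / (n - k - 1),  # small-sample correction
        "BIC": fit + k * math.log(n),
        "HQC": fit + 2 * k * math.log(math.log(n)),
        # Assumed EDC penalty c_n = sqrt(n); other valid c_n choices exist.
        "EDC": fit + k * math.sqrt(n),
    }


def select_model(candidates: Dict[str, Tuple[Sequence[float], int]],
                 y: Sequence[float], criterion: str = "BIC") -> str:
    """Return the candidate name with the smallest criterion value.

    `candidates` maps a model name to (predictions, number_of_parameters).
    Only lower-is-better criteria are accepted, so R2 is rejected.
    """
    if criterion == "R2":
        raise ValueError("R2 is higher-is-better; choose an information criterion or MSE")
    scores = {name: selection_criteria(y, y_hat, k)[criterion]
              for name, (y_hat, k) in candidates.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    preds = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9]
    candidates = {"a": (preds, 2), "b": (preds, 4), "c": ([3.5] * 6, 1)}
    print(select_model(candidates, y, "BIC"))
```

Candidates "a" and "b" fit equally well, so the larger penalty on "b" decides in favor of "a"; "c" fits poorly enough that its smaller penalty does not help. This mirrors the trade-off between accuracy and complexity that the measures above are meant to capture.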
Acknowledgements
This work is supported by the National Natural Science Foundation of China (71001016).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Yang, G., Huang, J. (2015). Model Selection of Symbolic Regression to Improve the Accuracy of PM2.5 Concentration Prediction. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_16
DOI: https://doi.org/10.1007/978-3-319-25660-3_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25659-7
Online ISBN: 978-3-319-25660-3
eBook Packages: Computer Science, Computer Science (R0)