Model Selection of Symbolic Regression to Improve the Accuracy of PM2.5 Concentration Prediction

Yang, Guangfei; Huang, Jian

doi:10.1007/978-3-319-25660-3_16


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9441)

Abstract

PM_2.5, one of the main components of haze, has recently attracted wide public attention in China. In this paper, we predict PM_2.5 concentrations in Dalian, China, using symbolic regression (SR) based on genetic programming (GP). The key problem in prediction is how to select accurate models using proper interestingness measures. In addition to commonly used measures, such as the R-squared value, mean squared error, and number of parameters, we study the effectiveness of a set of potentially useful measures: AIC, BIC, HQC, AICc and EDC. We also propose a new interestingness measure, Interestingness Elasticity (IE). The experimental results show that the new measure performs best at selecting candidate models and shows promising extrapolative capability.
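As an illustration of how information criteria of the kind named in the abstract can rank candidate models, the following minimal Python sketch computes AIC, BIC, HQC and AICc in their common least-squares forms, which assume Gaussian errors. These formulas are the standard textbook versions and may differ from the exact definitions used in the chapter. EDC and IE are left out because the abstract does not define them. The function name, the candidate models and their residual sums of squares are hypothetical values chosen only for illustration.

```python
import math


def information_criteria(rss, n, k):
    """Common least-squares forms of four information criteria.

    rss: residual sum of squares of the fitted model
    n:   number of observations
    k:   number of fitted parameters
    Assumes Gaussian errors; lower values indicate a preferred model.
    """
    if rss <= 0 or n <= 0:
        raise ValueError("rss and n must be positive")
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    base = n * math.log(rss / n)  # goodness-of-fit term shared by all criteria
    aic = base + 2 * k
    bic = base + k * math.log(n)
    hqc = base + 2 * k * math.log(math.log(n))
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    return {"AIC": aic, "BIC": bic, "HQC": hqc, "AICc": aicc}


# Hypothetical candidate models: (name, RSS, number of parameters).
candidates = [("small", 120.0, 3), ("medium", 100.0, 6), ("large", 98.0, 15)]
n_obs = 200  # hypothetical sample size

scores = {name: information_criteria(rss, n_obs, k) for name, rss, k in candidates}

# For each criterion, pick the candidate with the lowest score.
best_by = {
    crit: min(scores, key=lambda name: scores[name][crit])
    for crit in ("AIC", "BIC", "HQC", "AICc")
}

for crit, name in best_by.items():
    print(f"{crit}: best model = {name} ({scores[name][crit]:.2f})")
```

In this made-up example the largest model fits only slightly better than the medium one, so the complexity penalty in every criterion outweighs the gain and the medium model is selected.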

Acknowledgements

This work is supported by the National Natural Science Foundation of China (71001016).

Author information

Authors and Affiliations

School of Management Science and Engineering, Dalian University of Technology, Dalian, China
Guangfei Yang & Jian Huang

Authors

Guangfei Yang
Jian Huang

Corresponding author

Correspondence to Guangfei Yang.

Editor information

Editors and Affiliations

Institute of Infocomm Research, Singapore, Singapore
Xiao-Li Li
Ho Chi Minh City University of Technology, Ho Chi Minh City, Vietnam
Tru Cao
School of Information Systems, Singapore Management University, Singapore, Singapore
Ee-Peng Lim
Nanjing University, Nanjing, China
Zhi-Hua Zhou
Japan Advanced Institute of Science and Technology, Nomi-shi, Ishikawa, Japan
Tu-Bao Ho
The University of Hong Kong, Hong Kong, China
David Cheung

About this paper

Cite this paper

Yang, G., Huang, J. (2015). Model Selection of Symbolic Regression to Improve the Accuracy of PM_2.5 Concentration Prediction. In: Li, XL., Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D. (eds) Trends and Applications in Knowledge Discovery and Data Mining. Lecture Notes in Computer Science, vol 9441. Springer, Cham. https://doi.org/10.1007/978-3-319-25660-3_16

DOI: https://doi.org/10.1007/978-3-319-25660-3_16
Published: 26 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25659-7
Online ISBN: 978-3-319-25660-3
eBook Packages: Computer Science, Computer Science (R0)
