Abstract
The need for model selection arises when a data-based choice among competing models has to be made. For example, for fitting parametric regression (linear, non-linear and generalized linear) models with multiple independent variables, one needs to decide which variables to include in the model (Chaps. III.7, III.8 and III.12); for fitting non-parametric regression (spline, kernel, local polynomial) models, one needs to decide the amount of smoothing (Chaps. III.5 and III.10); for unsupervised learning, one needs to decide the number of clusters (Chaps. III.13 and III.16); and for tree-based regression and classification, one needs to decide the size of a tree (Chap. III.14).
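To make the variable-selection case concrete, the following is a minimal sketch (not taken from the chapter) of a data-based choice among competing linear regression models using Akaike's information criterion, AIC = n log(RSS/n) + 2k. All names and the simulated data are illustrative assumptions; an exhaustive search over subsets is shown only because the candidate set here is tiny.

```python
import numpy as np
from itertools import combinations

# Simulated data: 4 candidate predictors, but only x0 and x1 enter the truth.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

def aic(subset):
    """AIC of the OLS fit using an intercept plus the given predictor columns."""
    Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss = np.sum((y - Z @ beta) ** 2)
    k = Z.shape[1] + 1  # regression coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

# Exhaustive search over all non-empty subsets of the 4 candidates.
subsets = [s for r in range(1, 5) for s in combinations(range(4), r)]
best = min(subsets, key=aic)
print("selected predictors:", best)
```

With a clear signal the selected subset retains the truly active predictors; whether spurious predictors also slip in depends on the noise realization, which is exactly the overfitting risk that penalized criteria such as AIC and BIC trade off differently.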
Acknowledgements
This work was supported by NIH Grant R01 GM58533.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this chapter
Wang, Y. (2012). Model Selection. In: Gentle, J., Härdle, W., Mori, Y. (eds) Handbook of Computational Statistics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21551-3_16
Print ISBN: 978-3-642-21550-6
Online ISBN: 978-3-642-21551-3
eBook Packages: Mathematics and Statistics (R0)