
Moving Beyond Linearity

Chapter in An Introduction to Statistical Learning

Part of the book series: Springer Texts in Statistics (STS, volume 103)

Abstract

So far in this book, we have mostly focused on linear models. Linear models are relatively simple to describe and implement, and have advantages over other approaches in terms of interpretation and inference. However, standard linear regression can have significant limitations in terms of predictive power. This is because the linearity assumption is almost always an approximation, and sometimes a poor one. In Chapter 6 we see that we can improve upon least squares using ridge regression, the lasso, principal components regression, and other techniques. In that setting, the improvement is obtained by reducing the complexity of the linear model, and hence the variance of the estimates. But we are still using a linear model, which can only be improved so far! In this chapter we relax the linearity assumption while still attempting to maintain as much interpretability as possible. We do this by examining very simple extensions of linear models like polynomial regression and step functions, as well as more sophisticated approaches such as splines, local regression, and generalized additive models.
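
As a taste of the simplest of these extensions, here is a minimal sketch (my own illustration on simulated data, not an example from the book) of polynomial regression: the model is still linear in its coefficients, so ordinary least squares applies unchanged once the predictor is expanded into polynomial features.

```python
import numpy as np

# Simulated data with a non-linear signal (assumption for illustration)
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Polynomial regression = ordinary least squares on an expanded design matrix
X_lin = np.column_stack([np.ones_like(x), x])        # straight-line fit
X_poly = np.vander(x, N=5, increasing=True)          # columns 1, x, x^2, x^3, x^4

beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
beta_poly, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

mse = lambda X, b: np.mean((y - X @ b) ** 2)
print("linear fit MSE:    ", mse(X_lin, beta_lin))
print("degree-4 poly MSE: ", mse(X_poly, beta_poly))  # lower: it captures the curvature
```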


Notes

  1. If \(\hat{\mathbf{C}}\) is the \(5 \times 5\) covariance matrix of the \(\hat{\beta}_{j}\), and if \(\boldsymbol{\ell}_{0}^{T} = (1, x_{0}, x_{0}^{2}, x_{0}^{3}, x_{0}^{4})\), then \(\mathrm{Var}[\hat{f}(x_{0})] = \boldsymbol{\ell}_{0}^{T}\hat{\mathbf{C}}\boldsymbol{\ell}_{0}\).
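
     To make the formula concrete, here is a minimal numpy sketch (not from the text) that fits a degree-4 polynomial by least squares and evaluates \(\boldsymbol{\ell}_{0}^{T}\hat{\mathbf{C}}\boldsymbol{\ell}_{0}\) at a query point; the simulated data and the plug-in estimate \(\hat{\mathbf{C}} = \hat{\sigma}^{2}(X^{T}X)^{-1}\) are assumptions for the example.

```python
import numpy as np

# Simulated data (assumption, not from the text)
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

# Degree-4 polynomial design matrix: columns 1, x, x^2, x^3, x^4
X = np.vander(x, N=5, increasing=True)

# Least-squares fit and the plug-in covariance C_hat = sigma2_hat * (X^T X)^{-1}
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (len(y) - X.shape[1])
C_hat = sigma2_hat * np.linalg.inv(X.T @ X)

# Pointwise variance of f_hat(x0) = ell0^T beta_hat is ell0^T C_hat ell0
x0 = 1.0
ell0 = np.array([1.0, x0, x0**2, x0**3, x0**4])
var_f_x0 = ell0 @ C_hat @ ell0
print("estimated SE of f_hat(x0):", np.sqrt(var_f_x0))
```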

  2. We exclude \(C_{0}(X)\) as a predictor in (7.5) because it is redundant with the intercept. This is similar to the fact that we need only two dummy variables to code a qualitative variable with three levels, provided that the model will contain an intercept. The decision to exclude \(C_{0}(X)\) instead of some other \(C_{k}(X)\) in (7.5) is arbitrary. Alternatively, we could include \(C_{0}(X), C_{1}(X), \ldots, C_{K}(X)\) and exclude the intercept.
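
     For concreteness, a minimal numpy sketch (mine, not the book's code) of this coding choice; the cutpoints are made up, and the first indicator \(C_{0}(X)\) is dropped so that the design matrix with an intercept keeps full rank.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)

# Hypothetical cutpoints c1 < c2 < c3 defining the step-function bins
cuts = np.array([2.5, 5.0, 7.5])
bins = np.digitize(x, cuts)               # 0, 1, 2, 3: which interval each x falls in

# Full set of indicators C_0(X), ..., C_3(X): exactly one equals 1 per observation
C = (bins[:, None] == np.arange(4)[None, :]).astype(float)

# With an intercept column, keeping all of C_0..C_3 would be rank deficient,
# so we drop C_0 and keep the intercept plus C_1, C_2, C_3.
intercept = np.ones((x.size, 1))
X_design = np.hstack([intercept, C[:, 1:]])
print(X_design.shape, np.linalg.matrix_rank(X_design))   # (100, 4), rank 4
```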

  3. Cubic splines are popular because most human eyes cannot detect the discontinuity at the knots.

  4. There are actually five knots, including the two boundary knots. A cubic spline with five knots would have nine degrees of freedom. But natural cubic splines have two additional natural constraints at each boundary to enforce linearity, resulting in \(9 - 4 = 5\) degrees of freedom. Since this includes a constant, which is absorbed in the intercept, we count it as four degrees of freedom.
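
     The count of nine can be seen from the truncated power basis for a cubic spline: \(1, x, x^{2}, x^{3}\) plus one truncated cubic \((x-\xi_{k})_{+}^{3}\) per knot, i.e. \(K+4\) functions for \(K\) knots. Below is a minimal numpy sketch (the knot locations are made up) that builds this basis and checks its dimension.

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated power basis for a cubic spline: 1, x, x^2, x^3, (x - knot)^3_+."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 1, 200)
knots = np.array([0.1, 0.3, 0.5, 0.7, 0.9])   # five knots, incl. the two boundary knots

B = cubic_spline_basis(x, knots)
print(B.shape[1], np.linalg.matrix_rank(B))   # 9 = K + 4 basis functions, full rank
# A natural cubic spline adds 2 linearity constraints at each boundary: 9 - 4 = 5,
# and one of those 5 is the constant, which the intercept absorbs -> 4 fitted df.
```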

  5. The exact formulas for computing \(\hat{g}(x_{i})\) and \(\mathbf{S}_{\lambda}\) are very technical; however, efficient algorithms are available for computing these quantities.
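
     For reference, one standard way to write the fit (not stated in this chapter's notes, so take it as a sketch): if \(\mathbf{N}\) is a natural cubic spline basis with a knot at each \(x_{i}\), evaluated at the observed points, then
     \[
     \hat{\mathbf{g}} \;=\; \mathbf{N}\bigl(\mathbf{N}^{T}\mathbf{N} + \lambda\,\boldsymbol{\Omega}_{N}\bigr)^{-1}\mathbf{N}^{T}\mathbf{y} \;=\; \mathbf{S}_{\lambda}\,\mathbf{y},
     \qquad
     \{\boldsymbol{\Omega}_{N}\}_{jk} \;=\; \int N_{j}''(t)\,N_{k}''(t)\,dt,
     \]
     so \(\mathbf{S}_{\lambda}\) is a ridge-type hat matrix, and the effective degrees of freedom used in the chapter is \(\mathrm{df}_{\lambda} = \mathrm{tr}(\mathbf{S}_{\lambda})\), the sum of its diagonal elements.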

  6. A partial residual for \(X_{3}\), for example, has the form \(r_{i} = y_{i} - f_{1}(x_{i1}) - f_{2}(x_{i2})\). If we know \(f_{1}\) and \(f_{2}\), then we can fit \(f_{3}\) by treating this residual as a response in a non-linear regression on \(X_{3}\).
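
     This is the idea behind backfitting for an additive model. Below is a minimal sketch (my illustration, not the book's algorithm as printed) that cycles over the predictors, each time smoothing the partial residual against one coordinate; the simulated data and the stand-in cubic-polynomial smoother are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 300, 3
X = rng.uniform(-2, 2, size=(n, p))
y = np.sin(X[:, 0]) + X[:, 1]**2 + 0.5 * X[:, 2] + rng.normal(scale=0.2, size=n)

def smooth(x, r):
    """Stand-in smoother: fit a cubic polynomial of x to the partial residual r."""
    coef = np.polyfit(x, r, deg=3)
    return np.polyval(coef, x)

# Backfitting: cycle over predictors, re-fitting f_j to its partial residual
alpha = y.mean()
f = np.zeros((n, p))                              # current estimates f_j(x_ij)
for _ in range(20):                               # a few sweeps usually suffice
    for j in range(p):
        r = y - alpha - f.sum(axis=1) + f[:, j]   # partial residual, leaving out f_j
        f[:, j] = smooth(X[:, j], r)
        f[:, j] -= f[:, j].mean()                 # center f_j to keep the intercept identifiable

fitted = alpha + f.sum(axis=1)
print("training RSS:", np.sum((y - fitted) ** 2))
```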

Copyright information

© 2013 Springer Science+Business Media New York

About this chapter

Cite this chapter

James, G., Witten, D., Hastie, T., Tibshirani, R. (2013). Moving Beyond Linearity. In: An Introduction to Statistical Learning. Springer Texts in Statistics, vol 103. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7138-7_7
