A Review of Heteroscedasticity Treatment with Gaussian Processes and Quantile Regression Meta-models

Seeing Cities Through Big Data

Part of the book series: Springer Geography (SPRINGERGEOGR)

Abstract

In regression problems, the common practice is to assume that the error term has constant variance across all data. This assumption, known as homoscedasticity, simplifies an often complicated model and relies on the error being independent of the input variables. In the real world, however, it is often naive, as we are rarely able to include all true explanatory variables in a regression. While Big Data is bringing new opportunities for regression applications, ignoring this limitation may lead to biased estimators and inaccurate confidence and prediction intervals.

This paper studies the treatment of non-constant variance in regression models, also known as heteroscedasticity. We apply two methodologies: (1) integrating the conditional variance within the regression model itself, and (2) treating the regression model as a black box and using a meta-model that analyzes the error separately. We compare the performance of both approaches on two heteroscedastic data sets.

Although accounting for heteroscedasticity in data increases the complexity of the models used, we show that it can greatly improve the quality of the predictions, and more importantly, it can provide a proper notion of uncertainty or “confidence” associated with those predictions. We also discuss the feasibility of the solutions in a Big Data context.
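To make the black-box meta-model idea concrete, the following is a minimal sketch under stated assumptions, not the chapter's actual method or data. It simulates a toy data set whose noise standard deviation grows linearly with the input, fits ordinary least squares as the "black box", and then fits a one-parameter quantile regression on the residuals. The synthetic data, the through-origin quantile model q_tau(r | x) = b * x, and all function names (`fit_ols`, `pinball_loss`, `fit_quantile_slope`, `coverage`) are illustrative choices made here.

```python
import random


def fit_ols(xs, ys):
    """Closed-form simple linear regression; stands in for the black-box model."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pinball_loss(u, tau):
    """Quantile (pinball) loss of an error u at quantile level tau."""
    return u * tau if u >= 0 else u * (tau - 1)


def fit_quantile_slope(xs, residuals, tau):
    """Fit q_tau(residual | x) = b * x (all x > 0) by minimising the pinball loss.

    The pinball loss is positively homogeneous, so the minimiser is the
    weighted tau-quantile of residual / x with weights x.
    """
    pairs = sorted((r / x, x) for x, r in zip(xs, residuals))
    target = tau * sum(w for _, w in pairs)
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return ratio
    return pairs[-1][0]


# Assumed toy data: mean 2x, noise standard deviation 0.5x (heteroscedastic).
rng = random.Random(0)
n = 2000
xs = [rng.uniform(1.0, 10.0) for _ in range(n)]
ys = [2.0 * x + rng.gauss(0.0, 0.5 * x) for x in xs]

# Step 1: the black-box regression and its residuals.
intercept, slope = fit_ols(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Step 2: the meta-model, i.e. quantile regressions on the residuals.
b_lo = fit_quantile_slope(xs, residuals, 0.05)
b_hi = fit_quantile_slope(xs, residuals, 0.95)

# Baseline: a constant-width 90% band that assumes homoscedastic Gaussian errors.
resid_sd = (sum(r * r for r in residuals) / n) ** 0.5
half_width = 1.645 * resid_sd


def coverage(lower, upper, keep=lambda x: True):
    """Fraction of residuals inside [lower(x), upper(x)] among points with keep(x)."""
    hits = [lower(x) <= r <= upper(x) for x, r in zip(xs, residuals) if keep(x)]
    return sum(hits) / len(hits)


meta_all = coverage(lambda x: b_lo * x, lambda x: b_hi * x)
meta_high = coverage(lambda x: b_lo * x, lambda x: b_hi * x, keep=lambda x: x > 8)
const_high = coverage(lambda x: -half_width, lambda x: half_width, keep=lambda x: x > 8)

print(f"meta-model coverage, all x:    {meta_all:.3f}")
print(f"meta-model coverage, x > 8:    {meta_high:.3f}")
print(f"constant band coverage, x > 8: {const_high:.3f}")
```

In this toy setting the constant-width band should cover too few points where the noise is large, while the quantile meta-model widens with x and stays near the nominal 90% level. The one-parameter quantile model is only adequate because the simulated noise scale is proportional to x; real data would generally call for a more flexible quantile regressor.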


Author information

Corresponding author

Correspondence to Francisco Antunes.

Copyright information

© 2017 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Antunes, F., O’Sullivan, A., Rodrigues, F., Pereira, F. (2017). A Review of Heteroscedasticity Treatment with Gaussian Processes and Quantile Regression Meta-models. In: Thakuriah, P., Tilahun, N., Zellner, M. (eds) Seeing Cities Through Big Data. Springer Geography. Springer, Cham. https://doi.org/10.1007/978-3-319-40902-3_9
