Abstract
In this article, we apply Bayesian neural networks (BNNs) to time series analysis and propose a Monte Carlo algorithm for BNN training. In addition, we go a step further in BNN model selection by putting a prior on network connections instead of on hidden units, as done by other authors. This allows us to treat the selection of hidden units and the selection of input variables uniformly. The BNN model is compared to a number of competitors, such as the Box-Jenkins model, bilinear model, threshold autoregressive model, and traditional neural network model, on a number of popular and challenging data sets. Numerical results show that the BNN model achieves a consistent improvement over the competitors in forecasting future values. Insights into improving the generalization ability of BNNs emerge from several aspects of our implementation, such as the selection of input variables, the specification of prior distributions, and the treatment of outliers.
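To illustrate the idea of placing a prior on individual network connections, the sketch below implements a deliberately simplified stand-in, not the paper's actual algorithm: a one-hidden-layer network on lagged inputs, where each input-to-hidden weight carries a binary indicator so that setting an indicator to zero prunes that connection (and pruning all connections from a lag removes that input variable). A random-walk Metropolis sampler updates the weights, and occasional indicator flips move between models. All names, hyperparameters, and the simulated series are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def net(x, w1, b1, w2, b2, g1):
    # g1 masks input-to-hidden connections: a zero entry deletes that connection
    h = np.tanh(x @ (w1 * g1) + b1)
    return h @ w2 + b2

def log_post(y, X, params, g1, sigma=0.1, tau=1.0):
    # Gaussian likelihood plus Gaussian prior on the active weights
    w1, b1, w2, b2 = params
    resid = y - net(X, w1, b1, w2, b2, g1)
    loglik = -0.5 * np.sum(resid ** 2) / sigma ** 2
    logprior = -0.5 * (np.sum((w1 * g1) ** 2) + np.sum(b1 ** 2)
                       + np.sum(w2 ** 2) + b2 ** 2) / tau ** 2
    return loglik + logprior

# Simulate a simple nonlinear AR(2) series as stand-in data
T, p, H = 200, 2, 3
y = np.zeros(T)
for t in range(2, T):
    y[t] = (0.6 * y[t-1] - 0.3 * y[t-2] + 0.2 * np.tanh(y[t-1])
            + 0.05 * rng.standard_normal())
# Row t of X holds the lagged inputs (y_{t-1}, ..., y_{t-p})
X = np.column_stack([y[p - 1 - j: T - 1 - j] for j in range(p)])
yt = y[p:]

w1 = 0.1 * rng.standard_normal((p, H)); b1 = np.zeros(H)
w2 = 0.1 * rng.standard_normal(H); b2 = 0.0
g1 = np.ones((p, H))                       # all connections active initially
cur, accept, iters = log_post(yt, X, (w1, b1, w2, b2), g1), 0, 2000

for it in range(iters):
    # Random-walk proposal on all continuous parameters
    prop = (w1 + 0.05 * rng.standard_normal((p, H)),
            b1 + 0.05 * rng.standard_normal(H),
            w2 + 0.05 * rng.standard_normal(H),
            b2 + 0.05 * rng.standard_normal())
    new = log_post(yt, X, prop, g1)
    if np.log(rng.random()) < new - cur:
        (w1, b1, w2, b2), cur, accept = prop, new, accept + 1
    # Occasionally propose flipping one connection indicator (a model move)
    if it % 10 == 0:
        g_prop = g1.copy()
        i, j = rng.integers(p), rng.integers(H)
        g_prop[i, j] = 1 - g_prop[i, j]
        new = log_post(yt, X, (w1, b1, w2, b2), g_prop)
        if np.log(rng.random()) < new - cur:
            g1, cur = g_prop, new

pred = net(X, w1, b1, w2, b2, g1)
print("acceptance rate:", accept / iters)
```

In a faithful implementation the model move would use a dimension-matching acceptance ratio in the spirit of reversible jump MCMC (Green 1995) rather than a plain flip, and averaging predictions over the sampled models would give the Bayesian model-averaged forecast.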
References
Andrieu, C., Freitas, N.D., and Doucet, A. 2000. Reversible jump MCMC simulated annealing for neural networks. In: Proceedings of Uncertainty in Artificial Intelligence (UAI 2000).
Andrieu, C., Freitas, N.D., and Doucet, A. 2001. Robust full Bayesian learning for radial basis networks. Neural Computation 13: 2359–2407.
Auestad, B. and Tjøstheim, D. 1990. Identification of nonlinear time series: First order characterization and order determination. Biometrika 77: 669–687.
Barnett, G., Kohn, R., and Sheather, S.J. 1996. Robust estimation of an autoregressive model using Markov chain Monte Carlo. Journal of Econometrics 74: 237–254.
Bishop, C.M. 1995. Neural Networks for Pattern Recognition. Oxford University Press, Oxford.
Box, G.E.P. and Jenkins, G.M. 1970. Time Series Analysis, Forecast and Control, Holden Day, San Francisco.
Casella, G. and Berger, R.L. 2001. Statistical Inference, 2nd ed. Thomson Learning, Duxbury.
Chatfield, C. 2001. Time-Series Forecasting. Chapman and Hall, London.
Chen, C. and Liu, L.M. 1993. Forecasting time series with outliers. Journal of Forecasting 12: 13–35.
Cybenko, G. 1989. Approximations by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems 2: 303–314.
Denison, D., Holmes, C., Mallick, B., and Smith, A.F.M. 2002. Bayesian Methods for Nonlinear Classification and Regression. Wiley, New York.
Faraway, J. and Chatfield, C. 1998. Time series forecasting with neural networks: A comparative study using the airline data. Appl. Statist. 47: 231–250.
Fernández, G., Ley, E., and Steel, M.F.J. 2001. Benchmark priors for Bayesian model averaging. Journal of Econometrics 100: 381–427.
Freitas, N. and Andrieu, C. 2000. Sequential Monte Carlo for model selection and estimation of neural networks. In: Proceedings of ICASSP 2000.
Freitas, N., Andrieu, C., Højen-Sørensen, P., Niranjan, M., and Gee, A. 2001. Sequential Monte Carlo methods for neural networks. In: Doucet A., de Freitas N., and Gordon N. (Eds.), Sequential Monte Carlo Methods in Practice, Springer-Verlag.
Funahashi, K. 1989. On the approximate realization of continuous mappings by neural networks. Neural Networks 2: 183–192.
Gabr, M.M. and Subba Rao, T. 1981. The estimation and prediction of subset bilinear time series models with applications. Journal of Time Series Analysis 2: 155–171.
Gelman, A., Roberts, G.O., and Gilks, W.R. 1996. Efficient Metropolis jumping rules. In: Bernardo, J.M., Berger, J.O., Dawid, A.P., and Smith, A.F.M. (Eds.), Bayesian Statistics 5. Oxford University Press, New York.
Gelman, A. and Rubin, D.B. 1992. Inference from iterative simulation using multiple sequences (with discussion). Statist. Sci. 7: 457–472.
Gerlach, R., Carter, C.K., and Kohn, R. 1999. Diagnostics for time series analysis. Journal of Time Series Analysis.
Geyer, C.J. 1991. Markov chain Monte Carlo maximum likelihood. In: Keramigas E.M. (Ed.), Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface pp. 153–163. Interface Foundation, Fairfax Station.
Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization, & Machine Learning. Addison Wesley.
Green, P.J. 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82: 711–732.
Härdle, W. and Vieu, P. 1992. Kernel regression smoothing of time series. Journal of Time Series Analysis 13: 209–232.
Hastings, W.K. 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57: 97–109.
Higdon, D., Lee, H., and Bi, Z. 2002. A Bayesian approach to characterizing uncertainty in inverse problems using coarse and fine-scale information. IEEE Transactions on Signal Processing 50: 389–399.
Hill, T., O’Connor, M., and Remus, W. 1996. Neural network models for time series forecasts. Management Science 42: 1082–1092.
Holland, J.H. 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor.
Holmes, C.C. and Denison, D. 2002. A Bayesian MARS classifier. Machine Learning, to appear.
Holmes, C.C. and Mallick, B.K. 1998. Bayesian radial basis functions of variable dimension. Neural Computation 10: 1217–1233.
Hornik, K., Stinchcombe, M., and White, H. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2: 359–366.
Hukushima, K. and Nemoto, K. 1996. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 65: 1604–1608.
Kang, S. 1991. An investigation of the use of feedforward neural networks for forecasting. Ph.D. Dissertation, Kent State University, Kent, Ohio.
Liang, F., Truong, Y.K., and Wong, W.H. 2001. Automatic Bayesian model averaging for linear regression and applications in Bayesian curve fitting. Statistica Sinica 11: 1005–1029.
Liang, F. and Wong, W.H. 2000. Evolutionary Monte Carlo: Applications to Cp model sampling and change point problem. Statistica Sinica 10: 317–342.
Liang, F. and Wong, W.H. 2001. Real parameter evolutionary Monte Carlo with applications in Bayesian mixture models. J. Amer. Statist. Assoc. 96: 653–666.
Lim, K.S. 1987. A comparative study of various univariate time series models for Canadian lynx data. Journal of Time Series Analysis 8: 161–176.
MacKay, D.J.C. 1992. A practical Bayesian framework for backprop networks. Neural Computation 4: 448–472.
Mallows, C.L. 1973. Some comments on Cp. Technometrics 15: 661–676.
Marinari, E. and Parisi, G. 1992. Simulated tempering: A new Monte Carlo scheme. Europhysics Letters 19: 451–458.
Marrs, A.D. 1998. An application of reversible-jump MCMC to multivariate spherical Gaussian mixtures. In Advances in Neural Information Processing Systems 10. Morgan Kaufmann. San Mateo, CA, pp. 577–583.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. 1953. Equation of state calculations by fast computing machines. Journal of Chemical Physics 21: 1087–1091.
Müller, P. and Insua, D.R. 1998. Issues in Bayesian analysis of neural network models. Neural Computation 10: 749–770.
Neal, R.M. 1996. Bayesian Learning for Neural Networks. Springer-Verlag, New York.
Nicholls, D.F. and Quinn, B.G. 1982. Random Coefficient Autoregressive Models: An Introduction. Springer-Verlag, New York.
Park, Y.R., Murray, T.J., and Chen, C. 1996. Predicting sunspots using a layered perceptron neural network. IEEE Trans. Neural Networks 7: 501–505.
Penny, W.D. and Roberts, S.J. 1999. Bayesian neural networks for classification: How useful is the evidence framework? Neural Networks, 12: 877–892.
Penny, W.D. and Roberts, S.J. 2000. Bayesian methods for autoregressive models. In: Proceedings of Neural Networks for Signal Processing, Sydney, Dec. 2000.
Raftery, A.E., Madigan, D., and Hoeting, J.A. 1997. Bayesian model averaging for linear regression models. J. Amer. Statist. Assoc. 92: 179–191.
Rumelhart, D., Hinton, G., and Williams, R.J. 1986. Learning internal representations by error propagation. In: Rumelhart D. and McClelland J. (Eds.), Parallel Distributed Processing. MIT Press, Cambridge, pp. 318–362.
Rumelhart, D. and McClelland, J. 1986. Parallel Distributed Processing. MIT Press, Cambridge.
Smith, A.F.M. and Roberts, G.O. 1993. Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Royal Statist. Soc. B, 55: 3–23.
Subba Rao, T. and Gabr, M.M. 1984. An Introduction to Bispectral Analysis and Bilinear Time Series Models. Springer, New York.
Tong, H. 1990. Non-Linear Time Series: A Dynamical System Approach. Oxford University Press, Oxford.
Tong, H. and Lim, K.S. 1980. Threshold autoregression, limit cycles and cyclical data (with discussion). J. R. Statist. Soc. B 42: 245–292.
Waldmeier, M. 1961. The Sunspot Activity in the Years 1610–1960. Schulthess, Zürich.
Weigend, A.S., Huberman, B.A., and Rumelhart, D.E. 1990. Predicting the future: A connectionist approach. Int. J. Neural Syst. 1: 193–209.
Weigend, A.S., Rumelhart, D.E., and Huberman, B.A. 1991. Generalization by weight-elimination with application to forecasting. In: Advances in Neural Information Processing Systems 3, Morgan Kaufmann, San Mateo, CA, pp. 875–882.
Liang, F. Bayesian neural networks for nonlinear time series forecasting. Stat Comput 15, 13–29 (2005). https://doi.org/10.1007/s11222-005-4786-8