Probabilistic and Bayesian Networks

Chapter in Neural Networks and Statistical Learning

Abstract

This chapter introduces several important probabilistic models. The Bayesian network is a well-known probabilistic model in machine learning, and the hidden Markov model is a special case of the Bayesian network for dynamic systems. Important probabilistic methods are introduced, including sampling methods, the expectation–maximization (EM) method, the variational Bayesian method, and mixture models. Some Bayesian and probabilistic approaches to machine learning are also mentioned.
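
As a brief illustration of the EM method mentioned above, the following is a minimal sketch (not taken from the chapter) of EM for a one-dimensional Gaussian mixture; the function name, the toy data, and all parameter choices are illustrative assumptions.

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=100, seed=0):
    """Fit a 1-D Gaussian mixture with k components by EM (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    # Initialize mixing weights, means, and variances.
    w = np.full(k, 1.0 / k)
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, np.var(x))
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point.
        dens = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        resp = w * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibilities.
        nk = resp.sum(axis=0)
        w = nk / n
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var

# Toy usage: data drawn from two well-separated Gaussian clusters.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2.0, 0.5, 300), rng.normal(3.0, 1.0, 700)])
print(em_gmm_1d(x, k=2))
```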

Author information

Correspondence to Ke-Lin Du.

Copyright information

© 2019 Springer-Verlag London Ltd., part of Springer Nature

About this chapter

Cite this chapter

Du, K.-L., & Swamy, M. N. S. (2019). Probabilistic and Bayesian Networks. In: Neural Networks and Statistical Learning. Springer, London. https://doi.org/10.1007/978-1-4471-7452-3_22
