Abstract
This chapter introduces several important probabilistic models. The Bayesian network is a well-known probabilistic graphical model in machine learning, and the hidden Markov model is a special case of the Bayesian network for modeling dynamic systems. Important probabilistic methods are introduced, including sampling methods, the expectation–maximization (EM) method, the variational Bayesian method, and mixture models. Some Bayesian and probabilistic approaches to machine learning are also discussed.
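To make one of the listed methods concrete, the following is a minimal sketch (not taken from the chapter) of the EM method applied to a two-component one-dimensional Gaussian mixture: the E-step computes each component's posterior responsibility for every data point, and the M-step re-estimates the mixture parameters from those responsibilities. All variable names and the synthetic data are illustrative assumptions; the chapter's own derivations give the general form.

```python
# Minimal EM sketch for a 2-component 1-D Gaussian mixture (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a mixture of N(-2, 1^2) and N(3, 0.5^2).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial guesses for mixing weights, means, and standard deviations.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = w * normal_pdf(x[:, None], mu, sigma)      # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", w.round(3), "means:", mu.round(3), "stds:", sigma.round(3))
```

Run as written, the estimates converge close to the generating parameters; in practice EM is iterated until the log-likelihood change falls below a tolerance rather than for a fixed number of steps.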