Abstract
This chapter introduces several important probabilistic models. The Bayesian network is a well-known probabilistic graphical model in machine learning, and the hidden Markov model is a special case of the Bayesian network for modeling dynamic systems. Important probabilistic methods are introduced, including sampling methods, the expectation–maximization (EM) method, the variational Bayesian method, and mixture models. Some Bayesian and probabilistic approaches to machine learning are also discussed.
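To make one of the listed methods concrete, the following is a minimal sketch (not taken from the chapter) of the EM method applied to a two-component one-dimensional Gaussian mixture: the E-step computes each component's posterior responsibility for every data point, and the M-step re-estimates the mixture parameters from those responsibilities. All variable names and the synthetic data are illustrative assumptions; the chapter's own derivations give the general form.

```python
# Minimal EM sketch for a 2-component 1-D Gaussian mixture (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a mixture of N(-2, 1^2) and N(3, 0.5^2).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial guesses for mixing weights, means, and standard deviations.
w = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):
    # E-step: posterior responsibility of each component for each point.
    dens = w * normal_pdf(x[:, None], mu, sigma)      # shape (n, 2)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the responsibilities.
    nk = resp.sum(axis=0)
    w = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", w.round(3), "means:", mu.round(3), "stds:", sigma.round(3))
```

Run as written, the estimates converge close to the generating parameters; in practice EM is iterated until the log-likelihood change falls below a tolerance rather than for a fixed number of steps.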