Abstract
The investigation of the interplay between genes, proteins, metabolites and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, Bayesian network (BN) is a powerful tool for the ab-initial identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea, relative pros and cons of each technique and discuss possible ways to combine different algorithms towards making BN learning more accurate and much faster.
Similar content being viewed by others
References
Akaike H (1974). A new look at the statistical model identification. IEEE Trans Automat Control, 19(6): 716–723
Chickering D M (1995). A Transformational Characterization of Equivalent Bayesian Network Structures. Proc 11th Ann Conf Uncertainty Artif Intell, 87–98
Cvijovic D, Klinowski J (1995). Taboo search — an approach to the multiple minima problem. Science, 267: 664–666
Efron B, Hastie T, Johnstone I, Tibshirani R (2004). Least angle regression. Ann Statis, 32: 407–499
Friedman N (1997). Learning Belief Networks in the Presence of Missing Values and Hidden Variables. Proc 14th Intl Conf Mach Learn, 125–133
Fu S, Desmarais M (2008). Fast Markov Blanket Discovery Algorithm Via Local Learning within Single Pass. Canadian Conf AI, 96–107
Geiger D, Heckerman D (1995). Learning Gaussian Networks. Proc 10th Ann Conf Uncertainty Artif Intell, 235–243
Geman S, Geman D (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans Automat Control, 6(6): 721–741
Giudici P, Castelo R (2003). Improving Markov chain Monte Carlo Model search for data mining. Mach Learn, 50(1–2): 127–158
Grünwald P (2007). The Minimum Description Length principle. Cambridge, MA: MIT Press
Heckerman D (1999). A Tutorial on Learning with Bayesian Networks. In: Jordan M, ed. Learning in Graphical Models. Cambridge, MA: MIT Press
Heckerman D, Geiger D, Chickering D M (1995). Learning Bayesian Networks: The Combination of Knowledge and Statistical Data. Machine Learning, 20(3): 197–243
Koivisto M (2006). Advances in Exact Bayesian Structure Discovery in Bayesian Networks. Proc 22nd Conf Uncertainty Artif Intell
Koivisto M, Sood K (2004). Exact Bayesian structure discovery in Bayesian networks. J Mach Learn Res, 5: 549–573
Koller D, Friedman N (2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA: MIT Press
Lauritzen S L, Spiegelhalter D J (1988). Local computations with probabilities on graphical structures and their application to expert systems. J Royal Statist Society. Series B (Methodological), 50(2): 157–224
Meek C (1995). Causal inference and causal explanation with background knowledge. Proc 11th Ann Conf Uncertainty Artif Intell: 403–410
Moore A W, Lee M S (1998). Cached sufficient statistics for efficient machine learning with large datasets. J Artif Intell Res (JAIR)8: 67–91
Peña J M, Nilsson R, Björkegren J, Tegnér J (2007). Towards scalable and data efficient learning of Markov boundaries. Intl J Approx Reasoning, 45(2): 211–232
Pearl J (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Fransisco, CA: Morgan Kaufmann Publishers
Pearl J, Verma T (1991). A Theory of Inferred Causation. Proc 2nd Intl Conf Princip Knowledge Representation and Reasoning (KR’91): 441–452
Schwarz G E (1978). Estimating the dimension of a model. Ann Statis, 6(2): 461–464
Silander T, Myllymäki P (2006). A Simple Approach for Finding the Globally Optimal Bayesian Network Structure. Proc 22nd Conf Uncertainty Artif Intell
Spirtes P, Glymour C, Scheines R (2001). Causation, Prediction, and Search, 2nd ed. Cambridge, MA: MIT Press
Tsamardinos I, Brown L E, Aliferis C F (2006). The max-min hillclimbing Bayesian network structure learning algorithm. Mach Learn, 65(1): 31–78
van Steensel B, Braunschweig U, Filion G J, Chen M, van Bemmel J G, Ideker T (2010). Bayesian network analysis of targeting interactions in chromatin. Genome Res, 20: 190–200
Verma T, Pearl J (1991). Equivalence and synthesis of causal models. Proc Sixth Ann Conf Uncertainty Artif Intell, 255-270
Xie X, Geng Z (2008). A recursive method for structural learning of directed acyclic graphs. J Mach Learn Res, 9: 459–483
Yu H, Zhu S S, Zhou B, Xue H L, Han J D J (2008). Inferring causal relationships among different histone modifications and gene expression. Genome Res, 18(8): 1314–1324
Author information
Authors and Affiliations
Corresponding authors
Rights and permissions
About this article
Cite this article
Liu, Y., Han, JD.J. Application of Bayesian networks on large-scale biological data. Front. Biol. 5, 98–104 (2010). https://doi.org/10.1007/s11515-010-0023-8
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11515-010-0023-8