Synonyms
Data mining in bioinformatics; Data mining in computational biology; Data mining in systems biology; Machine learning in bioinformatics; Machine learning in systems biology
Definition
Advances in high-throughput sequencing and “omics” technologies, and the resulting exponential growth in the amount of macromolecular sequence, structure, and gene expression data, have transformed biology from a data-poor science into an increasingly data-rich one. Despite these advances, biology today, much like physics before Newton and Leibniz, remains a largely descriptive science. Machine learning [6] currently offers some of the most cost-effective tools for building predictive models from biological data, e.g., for annotating new genomic sequences, predicting macromolecular function, identifying functionally important sites in proteins, identifying genetic markers of diseases, and discovering the networks of genetic interactions that...
Recommended Reading
Andorf C, Dobbs D, Honavar V. Exploring inconsistencies in genome-wide protein function annotations: a machine learning approach. BMC Bioinform. 2007;8(1):284.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25(1):25–9.
Baldi P, Brunak S. Bioinformatics: the machine learning approach. Cambridge, MA: MIT; 2001.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2007;35(Database issue):D21–5.
Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The protein data bank. Nucleic Acids Res. 2000;28(1):235–42.
Bishop CM. Pattern recognition and machine learning. Berlin: Springer; 2006.
Boutell MR, Luo J, Shen X, Brown CM. Learning multi-label scene classification. Pattern Recogn. 2004;37(9):1757–71.
Bruggeman FJ, Westerhoff HV. The nature of systems biology. Trends Microbiol. 2007;15(1):45–50.
Caragea C, Sinapov J, Dobbs D, Honavar V. Assessing the performance of macromolecular sequence classifiers. In: Proceedings of the IEEE 7th International Symposium on Bioinformatics and Bioengineering; 2007. p. 320–6.
de Jong H. Modeling and simulation of genetic regulatory systems: a literature review. J Comput Biol. 2002;9(1):67–103.
Dietterich TG. Ensemble methods in machine learning. In: Proceedings of the 1st International Workshop on Multiple Classifier Systems. Berlin: Springer; 2000. p. 1–15.
Dietterich TG. Machine learning for sequential data: a review. In: Proceedings of the Joint IAPR International Workshop on Structural, Syntactic, and Statistical Pattern Recognition; 2002. p. 15–30.
El-Manzalawy Y, Dobbs D, Honavar V. On evaluating MHC-II binding peptide prediction methods. PLoS One. 2008;3(9):e3268.
El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit. 2008;21(4):243–55.
Friedman N, Linial M, Nachman I, Pe’er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3–4):601–20.
Galperin MY. The molecular biology database collection: 2008 update. Nucleic Acids Res. 2008;36(Database issue):D2–4.
Guyon I, Elisseeff A. An introduction to variable and feature selection. J Mach Learn Res. 2003;3(7–8):1157–82.
Hecker L, Alcon T, Honavar V, Greenlee H. Querying multiple large-scale gene expression datasets from the developing retina using a seed network to prioritize experimental targets. Bioinform Biol Insights. 2008;2:91–102.
Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási A-L. The large-scale organization of metabolic networks. Nature. 2000;407(6804):651–4.
Lähdesmäki H, Shmulevich I, Yli-Harja O. On learning gene regulatory networks under the Boolean network model. Mach Learn. 2003;52(1–2):147–67.
Terribilini M, Lee J-H, Yan C, Jernigan RL, Honavar V, Dobbs D. Predicting RNA-binding sites from amino acid sequence. RNA. 2006;12(8):1450–62.
Yan C, Terribilini M, Wu F, Jernigan RL, Dobbs D, Honavar V. Predicting DNA-binding sites of proteins from amino acid sequence. BMC Bioinform. 2006;7:262.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Caragea, C., Honavar, V. (2018). Machine Learning in Computational Biology. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_636
DOI: https://doi.org/10.1007/978-1-4614-8265-9_636
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering