GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation

Peterson, Michael R.; Doom, Travis E.; Raymer, Michael L.

doi:10.1007/978-3-540-24854-5_43

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3102)

Included in the following conference series:

Genetic and Evolutionary Computation Conference

Abstract

The authors present a genetic algorithm (GA) optimization technique for cosine-based k-nearest-neighbors classification that improves predictive accuracy in a class-balanced manner while simultaneously enabling knowledge discovery. The GA performs feature selection and extraction by searching for the feature weights and offsets that maximize cosine classifier performance. The GA-selected feature weights indicate the relevance of each feature to the classification task. This hybrid GA/classifier provides insight into a notoriously difficult problem in molecular biology: the correct treatment of water molecules that mediate ligand binding to proteins. In distinguishing patterns of water conservation and displacement, this method achieves higher accuracy than previous techniques. The data mining capabilities of the hybrid system improve the understanding of the physical and chemical determinants governing favored protein-water binding.
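
The abstract gives no implementation details, so the following Python sketch is only one plausible reading of the idea, not the authors' method. It assumes that each feature is shifted by its offset and then scaled by its weight before cosine similarity is computed, that "class-balanced" accuracy means the mean of per-class recall, and that the GA uses truncation selection, uniform crossover, and Gaussian mutation. All function names and parameter defaults are hypothetical.

```python
import math
import random

def transform(x, weights, offsets):
    # Assumed form: shift each feature by its offset, then scale by its weight.
    # A weight of 0 removes the feature (selection); other values rescale it.
    return [w * (v + o) for v, w, o in zip(x, weights, offsets)]

def cosine(a, b):
    # Cosine similarity; defined as 0.0 when either vector has zero length.
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def knn_predict(query, train, k, weights, offsets):
    # Majority vote among the k training samples most similar to the query.
    q = transform(query, weights, offsets)
    ranked = sorted(train,
                    key=lambda s: cosine(q, transform(s[0], weights, offsets)),
                    reverse=True)
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    # Ties are broken deterministically in favor of the smallest label.
    return max(sorted(votes), key=lambda c: votes[c])

def balanced_accuracy(train, valid, k, weights, offsets):
    # Mean per-class recall, so a majority class cannot dominate the score.
    correct, total = {}, {}
    for x, y in valid:
        total[y] = total.get(y, 0) + 1
        if knn_predict(x, train, k, weights, offsets) == y:
            correct[y] = correct.get(y, 0) + 1
    return sum(correct.get(c, 0) / n for c, n in total.items()) / len(total)

def evolve(train, valid, n_features, k=3, pop_size=20, generations=30, seed=0):
    # Individuals are (weights, offsets) pairs; fitness is balanced accuracy.
    rng = random.Random(seed)

    def fitness(ind):
        return balanced_accuracy(train, valid, k, ind[0], ind[1])

    def random_individual():
        return ([rng.uniform(0.0, 1.0) for _ in range(n_features)],
                [rng.uniform(-1.0, 1.0) for _ in range(n_features)])

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one parent at random.
        return ([rng.choice(pair) for pair in zip(a[0], b[0])],
                [rng.choice(pair) for pair in zip(a[1], b[1])])

    def mutate(ind):
        # Gaussian perturbation; weights are clamped to stay non-negative.
        return ([max(0.0, g + rng.gauss(0.0, 0.1)) for g in ind[0]],
                [g + rng.gauss(0.0, 0.1) for g in ind[1]])

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]  # truncation selection keeps the best half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    best = max(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    # Toy data (invented for illustration): (feature vector, class label).
    train = [([1.0, 0.1], 0), ([0.9, 0.2], 0), ([0.1, 1.0], 1), ([0.2, 0.9], 1)]
    valid = [([0.8, 0.1], 0), ([0.1, 0.7], 1)]
    (weights, offsets), score = evolve(train, valid, n_features=2, k=1)
    print("weights:", weights, "offsets:", offsets, "balanced accuracy:", score)
```

In a sketch like this, the evolved weights are what would support knowledge discovery: features whose weights are driven toward zero contribute little to the similarity measure, while large weights flag features the classifier relies on. A real study would score fitness on held-out data or with resampling rather than on a single small validation set.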

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Wright State University, Dayton, OH, 45345, USA
Michael R. Peterson, Travis E. Doom & Michael L. Raymer

Editor information

Editors and Affiliations

Indian Institute of Technology Kanpur, Department of Mechanical Engineering, Kanpur Genetic Algorithms Laboratory (KanGAL), 208016, Kanpur, Uttar Pradesh, India
Kalyanmoy Deb

About this paper

Cite this paper

Peterson, M.R., Doom, T.E., Raymer, M.L. (2004). GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_43

DOI: https://doi.org/10.1007/978-3-540-24854-5_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22344-3
Online ISBN: 978-3-540-24854-5
eBook Packages: Springer Book Archive
