GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation

  • Conference paper
Genetic and Evolutionary Computation – GECCO 2004 (GECCO 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 3102))

Abstract

The authors present a genetic algorithm (GA) optimization technique for cosine-based k-nearest-neighbors classification that improves predictive accuracy in a class-balanced manner while simultaneously enabling knowledge discovery. The GA performs feature selection and extraction by searching for the feature weights and offsets that maximize cosine classifier performance. The GA-selected feature weights indicate the relevance of each feature to the classification task. This hybrid GA/classifier provides insight into a notoriously difficult problem in molecular biology: the correct treatment of water molecules mediating ligand binding to proteins. In distinguishing patterns of water conservation and displacement, this method achieves higher accuracy than previous techniques. The data mining capabilities of the hybrid system improve the understanding of the physical and chemical determinants governing favored protein-water binding.
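The general idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the transform `w * (x + o)`, the leave-one-out balanced-accuracy fitness, the truncation selection, and every function name and parameter below are assumptions chosen for illustration. The sketch shows why offsets matter for a cosine classifier (shifting the origin changes the angles between samples) and how a weight near zero effectively removes a feature.

```python
import math
import random

def transform(x, weights, offsets):
    # Assumed form: shift each feature by its offset, then scale by its weight.
    return [w * (v + o) for v, w, o in zip(x, weights, offsets)]

def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def knn_predict(query, train_X, train_y, k):
    # Rank training samples by cosine similarity and take a majority vote.
    ranked = sorted(range(len(train_X)),
                    key=lambda i: cosine(query, train_X[i]), reverse=True)
    votes = {}
    for i in ranked[:k]:
        votes[train_y[i]] = votes.get(train_y[i], 0) + 1
    return max(sorted(votes), key=lambda c: votes[c])

def balanced_accuracy(chrom, X, y, k=3):
    # Fitness (assumed): mean per-class accuracy under leave-one-out evaluation.
    n = len(X[0])
    weights, offsets = chrom[:n], chrom[n:]
    T = [transform(x, weights, offsets) for x in X]
    hits, totals = {}, {}
    for i in range(len(T)):
        pred = knn_predict(T[i], T[:i] + T[i + 1:], y[:i] + y[i + 1:], k)
        totals[y[i]] = totals.get(y[i], 0) + 1
        hits[y[i]] = hits.get(y[i], 0) + (pred == y[i])
    return sum(hits[c] / totals[c] for c in totals) / len(totals)

def evolve(X, y, generations=20, pop_size=20, seed=0):
    # Chromosome layout (assumed): n weights followed by n offsets.
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.uniform(0, 1) for _ in range(n)] +
           [rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda c: balanced_accuracy(c, X, y),
                        reverse=True)
        elite = scored[:pop_size // 2]          # truncation selection
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, 2 * n)       # one-point crossover
            child = a[:cut] + b[cut:]
            j = rng.randrange(2 * n)            # Gaussian mutation of one gene
            child[j] += rng.gauss(0, 0.1)
            if j < n:
                child[j] = max(0.0, child[j])   # keep weights non-negative
            children.append(child)
        pop = elite + children
    best = max(pop, key=lambda c: balanced_accuracy(c, X, y))
    return best[:n], best[n:], balanced_accuracy(best, X, y)
```

Calling `evolve(X, y)` on a list of feature vectors `X` and labels `y` returns the best weights, offsets, and fitness found. Under these assumptions, the relative magnitudes of the returned weights give a rough indication of feature relevance, which is the knowledge-discovery aspect the abstract refers to.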

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Peterson, M.R., Doom, T.E., Raymer, M.L. (2004). GA-Facilitated Knowledge Discovery and Pattern Recognition Optimization Applied to the Biochemistry of Protein Solvation. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_43

  • DOI: https://doi.org/10.1007/978-3-540-24854-5_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22344-3

  • Online ISBN: 978-3-540-24854-5

  • eBook Packages: Springer Book Archive
