kNNsim: k-Nearest neighbors similarity with genetic algorithm features optimization enhances the prediction of activity classes for small molecules

  • Original Paper
Journal of Molecular Modeling

Abstract

Classifying the protein target specificity of small molecules is an important step in computational drug design and development. Improved classification models for small chemical molecules enable rapid screening of large compound databases. Here, we present a k-nearest neighbors approach with genetic algorithm feature optimization for the selection of small-molecule protein inhibitors. The method is trained on selected, diverse activity classes of the MDL Drug Data Report (MDDR), with ligands described by simple two-dimensional atom-pair chemical descriptors. The accuracy of inhibitor identification is presented in confusion tables with calculated recall and precision values. For selected types of targets, precision exceeds 70% and recall reaches 40%. As a consequence, the method can be easily applied to large commercial compound collections in a drug development campaign to significantly reduce the number of ligands requiring further costly experimental validation.
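The general idea described in the abstract can be illustrated with a minimal sketch. It is not the published kNNsim implementation: the abstract does not state the similarity measure, the value of k, or the genetic algorithm settings, so the Tanimoto coefficient, k = 3, the F1-based fitness, the GA parameters, and the synthetic binary fingerprints below are all assumptions chosen for illustration. The function names (`tanimoto`, `knn_predict`, `evolve_mask`, `make_dataset`) are hypothetical.

```python
import random

def tanimoto(a, b, mask):
    """Tanimoto similarity restricted to features switched on in `mask` (assumed measure)."""
    both = sum(1 for x, y, m in zip(a, b, mask) if m and x and y)
    either = sum(1 for x, y, m in zip(a, b, mask) if m and (x or y))
    return both / either if either else 0.0

def knn_predict(query, train_X, train_y, mask, k=3):
    """Label `query` active (1) if most of its k most similar training molecules are active."""
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: tanimoto(query, pair[0], mask),
                    reverse=True)
    votes = sum(label for _, label in ranked[:k])
    return 1 if 2 * votes > k else 0

def precision_recall(y_true, y_pred):
    """Precision and recall for the active class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def fitness(mask, train, valid, k=3):
    """F1 score on the validation split using only the masked features (assumed fitness)."""
    preds = [knn_predict(x, train[0], train[1], mask, k) for x in valid[0]]
    p, r = precision_recall(valid[1], preds)
    return 2 * p * r / (p + r) if p + r else 0.0

def evolve_mask(train, valid, n_features, pop_size=12, generations=8,
                mutation_rate=0.05, seed=0):
    """Generational GA: elitism, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda m: fitness(m, train, valid),
                        reverse=True)
        parents = scored[:pop_size // 2]
        children = [scored[0][:]]  # keep the best mask unchanged
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            child = mum[:cut] + dad[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = children
    return max(population, key=lambda m: fitness(m, train, valid))

def make_dataset(n, n_features, rng):
    """Synthetic fingerprints: actives tend to set the first 5 bits, the rest is noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        bits = [1 if rng.random() < 0.5 else 0 for _ in range(n_features)]
        for j in range(5):
            bits[j] = 1 if rng.random() < (0.9 if label else 0.1) else 0
        X.append(bits)
        y.append(label)
    return X, y

data_rng = random.Random(42)
train = make_dataset(40, 20, data_rng)
valid = make_dataset(20, 20, data_rng)
best_mask = evolve_mask(train, valid, n_features=20)
preds = [knn_predict(x, train[0], train[1], best_mask) for x in valid[0]]
precision, recall = precision_recall(valid[1], preds)
print("selected features:", sum(best_mask), "of", len(best_mask))
print("precision:", round(precision, 2), "recall:", round(recall, 2))
```

In this sketch the feature mask is scored on the same validation split that is used for the final report, so the printed precision and recall are optimistic; an unbiased estimate would need a separate held-out set.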

Acknowledgements

This work was supported by the EC within the BioSapiens (LHSG-CT-2003–503265) and SEPSDA (SP22-CT-2004–003831) 6FP projects, and by the Polish Ministry of Education and Science (PBZ-MNiI-2/1/2005 and an MNII ordinary research grant to DP).

Author information

Corresponding author

Correspondence to Dariusz Plewczynski.

About this article

Cite this article

Plewczynski, D. kNNsim: k-Nearest neighbors similarity with genetic algorithm features optimization enhances the prediction of activity classes for small molecules. J Mol Model 15, 591–596 (2009). https://doi.org/10.1007/s00894-008-0349-1

  • DOI: https://doi.org/10.1007/s00894-008-0349-1
