Using Recursive Partitioning Analysis to Evaluate Compound Selection Methods

  • Protocol
Chemoinformatics

Part of the book series: Methods in Molecular Biology™ (MIMB, volume 275)

Abstract

The design and analysis of a screening set for high-throughput screening is complex. We examine three statistical strategies for compound selection: random, clustering, and space-filling. We also examine two types of chemical descriptors: BCUTs and principal components of Dragon Constitutional descriptors. Based on the predictive power of multiple-tree recursive partitioning, we reach the following tentative conclusions. Random designs appear to be as good as clustering and space-filling designs. For analysis, BCUTs appear to be better than principal component scores based on Constitutional descriptors. We also confirm previous results that model-based selection of compounds can lead to improved screening hit rates.
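Two of the selection strategies named above, random and space-filling, can be illustrated with a small sketch. This is not the authors' procedure: the library is a set of synthetic two-dimensional points standing in for descriptor values, and greedy maximin (farthest-point) selection is only one common space-filling heuristic, assumed here for illustration. All function and variable names are hypothetical.

```python
import math
import random


def random_design(points, k, rng):
    """Pick k distinct compound indices uniformly at random."""
    return rng.sample(range(len(points)), k)


def maximin_design(points, k, rng):
    """Greedy farthest-point selection: a simple space-filling heuristic."""
    chosen = [rng.randrange(len(points))]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[chosen[0]]) for p in points]
    while len(chosen) < k:
        # Take the compound that is farthest from everything chosen so far.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


def min_pairwise_distance(points, idx):
    """Smallest distance between any two selected compounds."""
    return min(math.dist(points[a], points[b])
               for n, a in enumerate(idx) for b in idx[n + 1:])


rng = random.Random(0)
# Hypothetical library: 500 compounds, two synthetic descriptors each.
library = [(rng.random(), rng.random()) for _ in range(500)]

rand_idx = random_design(library, 20, rng)
fill_idx = maximin_design(library, 20, rng)
print("random  min distance:", round(min_pairwise_distance(library, rand_idx), 3))
print("maximin min distance:", round(min_pairwise_distance(library, fill_idx), 3))
```

The minimum pairwise distance only describes how evenly a design covers descriptor space. In the abstract's framing, designs are instead judged by the predictive power of recursive partitioning models, a step this sketch omits.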

Copyright information

© 2004 Humana Press Inc.

About this protocol

Cite this protocol

Young, S.S., Hawkins, D.M. (2004). Using Recursive Partitioning Analysis to Evaluate Compound Selection Methods. In: Bajorath, J. (ed.) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:317

  • DOI: https://doi.org/10.1385/1-59259-802-1:317

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-261-2

  • Online ISBN: 978-1-59259-802-1

  • eBook Packages: Springer Protocols
