Abstract
The design and analysis of a screening set for high-throughput screening are complex. We examine three statistical strategies for compound selection: random, clustering, and space-filling. We also examine two types of chemical descriptors: BCUTs and principal components of Dragon Constitutional descriptors. Based on the predictive power of multiple-tree recursive partitioning, we reach the following tentative conclusions. Random designs appear to be as good as clustering and space-filling designs. For analysis, BCUTs appear to be better than principal component scores based on Constitutional descriptors. We also confirm previous results that model-based selection of compounds can lead to improved screening hit rates.
Copyright information
© 2004 Humana Press Inc.
About this protocol
Cite this protocol
Young, S.S., Hawkins, D.M. (2004). Using Recursive Partitioning Analysis to Evaluate Compound Selection Methods. In: Bajorath, J. (ed.) Chemoinformatics. Methods in Molecular Biology™, vol 275. Humana Press. https://doi.org/10.1385/1-59259-802-1:317
DOI: https://doi.org/10.1385/1-59259-802-1:317
Publisher Name: Humana Press
Print ISBN: 978-1-58829-261-2
Online ISBN: 978-1-59259-802-1
eBook Packages: Springer Protocols