Integrating Binding Site Predictions Using Non-linear Classification Methods

  • Conference paper
Deterministic and Statistical Methods in Machine Learning (DSMML 2004)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3635)

Abstract

Currently, even the best algorithms for transcription factor binding site prediction are severely limited in accuracy. There is good reason to believe that predictions from these different classes of algorithms could be used in conjunction to improve the quality of predictions. In this paper, we apply single layer networks, rule sets and support vector machines to predictions from 12 key algorithms. Furthermore, we use a ‘window’ of consecutive results in the input vector in order to contextualise the neighbouring results. Moreover, we improve the classification result with the aid of under- and over-sampling techniques. We find that support vector machines outperform each of the original individual algorithms and the other classifiers employed in this work with both types of input, in that they maintain a better tradeoff between recall and precision.
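
The windowing, resampling and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the zero padding at sequence edges, the odd window width, the 0/1 label encoding and the random under-sampling rule are all assumptions made for illustration, and over-sampling is not shown.

```python
import random
from typing import List, Sequence, Tuple


def window_features(preds: Sequence[Sequence[float]], width: int) -> List[List[float]]:
    """Concatenate each position's predictions with those of its neighbours.

    preds[i] holds the outputs of the base algorithms at sequence position i.
    Positions outside the sequence are zero-padded (an assumption).
    """
    if width < 1 or width % 2 == 0:
        raise ValueError("width must be a positive odd number")
    n_algos = len(preds[0]) if preds else 0
    half = width // 2
    pad = [0.0] * n_algos
    rows = []
    for i in range(len(preds)):
        row: List[float] = []
        for j in range(i - half, i + half + 1):
            row.extend(preds[j] if 0 <= j < len(preds) else pad)
        rows.append(row)
    return rows


def undersample(X: Sequence, y: Sequence[int], ratio: float = 1.0, seed: int = 0):
    """Randomly keep at most ratio * (#positives) negative examples.

    Label 1 is assumed to mark the minority class (binding site).
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    kept_neg = rng.sample(neg, min(len(neg), int(ratio * len(pos))))
    keep = sorted(pos + kept_neg)
    return [X[i] for i in keep], [y[i] for i in keep]


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

With 12 base algorithms and a window of width 3, each input vector would have 36 entries. The windowed vectors and their labels could then be passed to any classifier, and the predictions scored with `precision_recall`.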

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sun, Y., Robinson, M., Adams, R., Kaye, P., Rust, A., Davey, N. (2005). Integrating Binding Site Predictions Using Non-linear Classification Methods. In: Winkler, J., Niranjan, M., Lawrence, N. (eds) Deterministic and Statistical Methods in Machine Learning. DSMML 2004. Lecture Notes in Computer Science, vol 3635. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11559887_14

  • DOI: https://doi.org/10.1007/11559887_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29073-5

  • Online ISBN: 978-3-540-31728-9

  • eBook Packages: Computer Science (R0)
