Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions

  • Conference paper
Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2018)

Abstract

The Automated Functional Prediction (AFP) of proteins has become a challenging problem in bioinformatics and biomedicine, aiming at handling and interpreting the extremely large proteomes of several eukaryotic organisms. A central issue in AFP is the absence, in public repositories of protein functions such as the Gene Ontology (GO), of well-defined sets of negative examples from which to learn accurate classifiers. In this paper we investigate the Query by Committee paradigm of active learning to select the negatives that are most informative for the classifier and for the protein function to be inferred. We validated our approach by predicting the Gene Ontology functions of S. cerevisiae proteins.
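To make the Query by Committee idea concrete, the following is a minimal sketch of committee-based selection using vote entropy as the disagreement measure. It is a generic illustration, not the authors' procedure: the function names `vote_entropy` and `select_informative`, the toy threshold classifiers, and the choice of vote entropy itself are all assumptions introduced here, since the abstract does not state which committee or disagreement measure the paper uses.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Entropy (in bits) of the labels voted by the committee members.

    Zero means full agreement; higher values mean more disagreement.
    """
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def select_informative(candidates, committee, k):
    """Return the k candidates on which the committee disagrees most.

    `committee` is a list of callables mapping a candidate to a label.
    Ties are broken by the original order of the candidates.
    """
    scored = [
        (vote_entropy([member(x) for member in committee]), i)
        for i, x in enumerate(candidates)
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [candidates[i] for _, i in scored[:k]]


# Hypothetical toy committee: three threshold classifiers on a single score.
committee = [lambda x, t=t: int(x >= t) for t in (0.3, 0.5, 0.7)]

# Hypothetical candidate negatives, each described by one score.
candidates = [0.1, 0.4, 0.6, 0.9]

picked = select_informative(candidates, committee, k=2)
print(picked)
```

In this toy setting the committee agrees unanimously on the scores 0.1 and 0.9, so the two middle candidates, where the members split their votes, are the ones selected for labeling.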

Notes

  1. A balanced seed training set counterbalances the predominance of 0 labels.

Acknowledgments

This work was supported by the grant titled "Machine learning algorithms to handle label imbalance in biomedical taxonomies", code PSR2017_DIP_010_MFRAS, Università degli Studi di Milano.

Author information

Corresponding author

Correspondence to Giorgio Valentini.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Frasca, M., Sepehri, M., Petrini, A., Grossi, G., Valentini, G. (2020). Committee-Based Active Learning to Select Negative Examples for Predicting Protein Functions. In: Raposo, M., Ribeiro, P., Sério, S., Staiano, A., Ciaramella, A. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2018. Lecture Notes in Computer Science, vol 11925. Springer, Cham. https://doi.org/10.1007/978-3-030-34585-3_7

  • DOI: https://doi.org/10.1007/978-3-030-34585-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34584-6

  • Online ISBN: 978-3-030-34585-3

  • eBook Packages: Computer Science, Computer Science (R0)
