Skip to main content

Protein Remote Homology Detection Using Dissimilarity-Based Multiple Instance Learning

  • Conference paper
  • First Online:
Structural, Syntactic, and Statistical Pattern Recognition (S+SSPR 2018)

Abstract

A challenging Pattern Recognition problem in Bioinformatics concerns the detection of a functional relation between two proteins even when they show very low sequence similarity – this is the so-called Protein Remote Homology Detection (PRHD) problem. In this paper we propose a novel approach to PRHD, which casts the problem into a Multiple Instance Learning (MIL) framework, which seems very suitable for this context. Experiments on a standard benchmark show very competitive performances, also in comparison with alternative discriminative methods.

M. Bicego and P. Lovato were partially supported by the University of Verona through the program “Bando di Ateneo per la Ricerca di Base 2015”.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    Along the text we will refer equivalently to K-mers or N-grams.

  2. 2.

    Available at http://noble.gs.washington.edu/proj/svm-pairwise/.

  3. 3.

    Available at http://bioinformatics.hitsz.edu.cn/main/~binliu/remote.

  4. 4.

    Downloadable from http://www.chibi.ubc.ca/gist/ [14].

References

  1. Chen, J., Guo, M., Wang, X., Liu, B.: A comprehensive review and comparison of different computational methods for protein remote homology detection. Brief. Bioinf. 19, 1–14 (2016)

    Google Scholar 

  2. Chen, Y., Bi, J., Wang, J.Z.: MILES: multiple-instance learning via embedded instance selection. IEEE Trans. Pattern Anal. Mach. Intell. 28(12), 1931–1947 (2006)

    Article  Google Scholar 

  3. Cheplygina, V., Tax, D., Loog, M.: Dissimilarity-based ensembles for multiple instance learning. IEEE Trans. Neural Netw. Learn. Syst. 27(6), 1379–1391 (2016)

    Article  Google Scholar 

  4. Cucci, A., Lovato, P., Bicego, M.: Enriched bag of words for protein remote homology detection. In: Robles-Kelly, A., Loog, M., Biggio, B., Escolano, F., Wilson, R. (eds.) S+SSPR 2016. LNCS, vol. 10029, pp. 463–473. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49055-7_41

    Chapter  Google Scholar 

  5. Dietterich, T., Lathrop, R., Lozano-Pérez, T.: Solving the multiple instance problem with axis-parallel rectangles. Artif. Intell. 89(1–2), 31–71 (1997)

    Article  Google Scholar 

  6. Dong, Q., Lin, L., Wang, X.: Protein remote homology detection based on binary profiles. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS, vol. 4414, pp. 212–223. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-71233-6_17

    Chapter  Google Scholar 

  7. Dong, Q., Wang, X., Lin, L.: Application of latent semantic analysis to protein remote homology detection. Bioinformatics 22(3), 285–290 (2006)

    Article  Google Scholar 

  8. Fung, G., Dundar, M., Krishnapuram, B., Rao, R.: Multiple instance learning for computer aided diagnosis. Proc. Adv. Neural Inf. Process. Syst. 19, 425–432 (2007)

    Google Scholar 

  9. Gribskov, M., Robinson, N.: Use of receiver operating characteristic (ROC) analysis to evaluate sequence matching. Comput. Chem. 20(1), 25–33 (1996)

    Article  Google Scholar 

  10. Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)

    Article  Google Scholar 

  11. Kuang, R., Wang, K., Wang, K., Siddiqi, M., Freund, Y., Leslie, C.: Profile-based string kernels for remote homology detection and motif extraction. J. Bioinf. Comput. Biol. 3(03), 527–550 (2005)

    Article  Google Scholar 

  12. Kuksa, P.P., Pavlovic, V.: Efficient evaluation of large sequence kernels. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 759–767. ACM (2012)

    Google Scholar 

  13. Leslie, C., Eskin, E., Noble, W.: The spectrum kernel: a string kernel for SVM protein classification. In: PSB, pp. 566–575 (2002)

    Google Scholar 

  14. Liao, L., Noble, W.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J. Comput. Biol. 10(6), 857–868 (2003)

    Article  Google Scholar 

  15. Liu, B., Wang, X., Lin, L., Dong, Q., Wang, X.: A discriminative method for protein remote homology detection and fold recognition combining top-n-grams and latent semantic analysis. BMC Bioinf. 9(1), 510 (2008). https://doi.org/10.1186/1471-2105-9-510

    Article  Google Scholar 

  16. Liu, B., et al.: Combining evolutionary information extracted from frequency profiles with sequence-based kernels for protein remote homology detection. Bioinformatics 30(4), 472–479 (2014)

    Article  Google Scholar 

  17. Lovato, P., Cristani, M., Bicego, M.: Soft Ngram representation and modeling for protein remote homology detection. IEEE/ACM Trans. Comput. Biol. Bioinf. 14(6), 1482–1488 (2017)

    Article  Google Scholar 

  18. Lovato, P., Giorgetti, A., Bicego, M.: A multimodal approach for protein remote homology detection. IEEE/ACM Trans. Comput. Biol. Bioinf. (TCBB) 12(5), 1193–1198 (2015)

    Article  Google Scholar 

  19. Pekalska, E., Duin, R.P.W.: The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, Machine Perception and Artificial Intelligence, vol. 64. World Scientific, Singapore (2005)

    MATH  Google Scholar 

  20. Rangwala, H., Karypis, G.: Profile-based direct kernels for remote homology detection and fold recognition. Bioinformatics 21(23), 4239–4247 (2005)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Manuele Bicego .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Mensi, A., Bicego, M., Lovato, P., Loog, M., Tax, D.M.J. (2018). Protein Remote Homology Detection Using Dissimilarity-Based Multiple Instance Learning. In: Bai, X., Hancock, E., Ho, T., Wilson, R., Biggio, B., Robles-Kelly, A. (eds) Structural, Syntactic, and Statistical Pattern Recognition. S+SSPR 2018. Lecture Notes in Computer Science(), vol 11004. Springer, Cham. https://doi.org/10.1007/978-3-319-97785-0_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-97785-0_12

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97784-3

  • Online ISBN: 978-3-319-97785-0

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics