Predicting Sub-cellular Location of Proteins Based on Hierarchical Clustering and Hidden Markov Models

  • Conference paper
Bioinformatics and Biomedical Engineering (IWBBIO 2015)

Abstract

Sub-cellular localization prediction is an important step for inferring protein functions. Several strategies have been developed in recent years to solve this problem, from alignment-based solutions to feature-based solutions. However, below certain identity thresholds, these kinds of approaches fail to detect homologous sequences, yielding predictions with low specificity and sensitivity. Here, a novel methodology is proposed for classifying proteins with low identity levels. This approach implements a simple yet powerful assumption that employs hierarchical clustering and hidden Markov models, obtaining high performance on the prediction of four different sub-cellular localizations.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Jaramillo-Garzón, J.A., Castro-Ceballos, J., Castellanos-Dominguez, G. (2015). Predicting Sub-cellular Location of Proteins Based on Hierarchical Clustering and Hidden Markov Models. In: Ortuño, F., Rojas, I. (eds) Bioinformatics and Biomedical Engineering. IWBBIO 2015. Lecture Notes in Computer Science, vol 9044. Springer, Cham. https://doi.org/10.1007/978-3-319-16480-9_26

  • DOI: https://doi.org/10.1007/978-3-319-16480-9_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-16479-3

  • Online ISBN: 978-3-319-16480-9

  • eBook Packages: Computer Science, Computer Science (R0)
