An Automated Combination of Kernels for Predicting Protein Subcellular Localization

  • Conference paper
Algorithms in Bioinformatics (WABI 2008)

Part of the book series: Lecture Notes in Computer Science ((LNBI,volume 5251))

Abstract

Protein subcellular localization is a crucial ingredient to many important inferences about cellular processes, including prediction of protein function and protein interactions. While many predictive computational tools have been proposed, they tend to have complicated architectures and require many design decisions from the developer.

Here we use the multiclass support vector machine (m-SVM) method to solve the protein subcellular localization problem directly, without resorting to the common approach of splitting it into several binary classification problems. We further propose a general class of protein sequence kernels that considers all motifs, including motifs with gaps. Instead of heuristically selecting one or a few kernels from this family, we use a recent extension of SVMs that optimizes over multiple kernels simultaneously. In this way, we automatically search over families of possible amino acid motifs.
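To make the idea of a family of motif kernels concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simple gapped-motif definition (k residues inside a window of at most `max_span` positions, with gaps written as `.`), and it fixes the kernel weights by hand, whereas the approach described above learns them jointly with the classifier. The function names and the parameters `k`, `max_span`, and `weights` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations
import math


def gapped_motif_counts(seq, k, max_span):
    """Count motifs of k residues spread over at most max_span positions.

    Assumed motif definition (illustrative only): the first residue of the
    window is always part of the motif, and skipped positions are written
    as '.', so 'A.C' means A, any residue, then C.
    """
    counts = Counter()
    for start in range(len(seq)):
        window = seq[start:start + max_span]
        if len(window) < k:
            continue
        # Pick the remaining k-1 motif positions after the anchored first one.
        for rest in combinations(range(1, len(window)), k - 1):
            positions = (0,) + rest
            span = positions[-1] + 1
            motif = "".join(
                window[i] if i in positions else "." for i in range(span)
            )
            counts[motif] += 1
    return counts


def motif_kernel(x, y, k, max_span):
    """Inner product of the gapped-motif count vectors of x and y."""
    cx = gapped_motif_counts(x, k, max_span)
    cy = gapped_motif_counts(y, k, max_span)
    return float(sum(cx[m] * cy[m] for m in cx.keys() & cy.keys()))


def normalized_kernel(x, y, k, max_span):
    """Cosine-normalized motif kernel; 0.0 if either sequence has no motifs."""
    denom = math.sqrt(
        motif_kernel(x, x, k, max_span) * motif_kernel(y, y, k, max_span)
    )
    return motif_kernel(x, y, k, max_span) / denom if denom > 0 else 0.0


def combined_kernel(x, y, params, weights):
    """Weighted sum of normalized motif kernels.

    Assumption: the weights are given here. In multiple kernel learning
    they would be optimized together with the SVM rather than set by hand.
    """
    return sum(
        w * normalized_kernel(x, y, k, s) for w, (k, s) in zip(weights, params)
    )


if __name__ == "__main__":
    params = [(1, 1), (2, 3)]  # (k, max_span) pairs, chosen arbitrarily
    weights = [0.5, 0.5]       # hand-set, non-negative, summing to one
    print(combined_kernel("MKTAYIAK", "MKTAWIAK", params, weights))
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why the combination can be passed to an SVM like any single kernel. A weight near zero would suggest that the corresponding motif family contributes little to the prediction.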

We compare our automated approach to three other predictors on four different datasets, and show that we perform better than the current state of the art. Further, our method provides some insights as to which sequence motifs are most useful for determining subcellular localization, which are in agreement with biological reasoning. Data files, kernel matrices and open source software are available at http://www.fml.mpg.de/raetsch/projects/protsubloc.

Editor information

Keith A. Crandall, Jens Lagergren

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ong, C.S., Zien, A. (2008). An Automated Combination of Kernels for Predicting Protein Subcellular Localization. In: Crandall, K.A., Lagergren, J. (eds) Algorithms in Bioinformatics. WABI 2008. Lecture Notes in Computer Science, vol 5251. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-87361-7_16

  • DOI: https://doi.org/10.1007/978-3-540-87361-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-87360-0

  • Online ISBN: 978-3-540-87361-7

  • eBook Packages: Computer Science (R0)
