Enhanced Prediction of Conformational Flexibility and Phosphorylation in Proteins

  • Conference paper
Advances in Computational Biology

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 680)

Abstract

Many sequence-based predictors of structural and functional properties of proteins have been developed. In this study, we developed new methods for predicting measures of conformational flexibility in proteins, including X-ray structure-derived temperature (B-) factors and the variance within NMR structural ensembles, as measured by solvent accessibility standard deviations (SASDs). We further tested whether these predicted measures of conformational flexibility, in crystal lattices and in solution, respectively, can be used to improve the prediction of phosphorylation in proteins. Phosphorylation is a common post-translational modification that modulates protein function, e.g., by affecting the interactions and conformational flexibility of phosphorylated sites. Using robust epsilon-insensitive support vector regression (ε-SVR) models, we assessed two representations of protein sequences: one based on position-specific scoring matrices (PSSMs) derived from multiple sequence alignments, and an augmented representation that adds real-valued solvent accessibility and secondary structure predictions (RSA/SS) as measures of local structural propensities. We showed that combining PSSMs with real-valued RSA/SS predictions provides systematic improvements in the accuracy of both B-factor and SASD prediction. These intermediate predictions were subsequently combined into an enhanced predictor of phosphorylation that was shown to significantly outperform methods based on PSSMs alone. To the best of our knowledge, this is the first example of using NMR structure-based measures of conformational flexibility in solution, predicted from sequence, to predict other properties of proteins.
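The quantities named in this paragraph can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the zero-padding at chain termini, the window half-width, and the use of the population standard deviation for SASD are all assumptions made for illustration. Only the ε-insensitive loss, max(0, |y − f(x)| − ε), is the standard definition used by ε-SVR.

```python
from statistics import pstdev


def sasd_per_residue(rsa_by_model):
    """Per-residue standard deviation of solvent accessibility across an NMR ensemble.

    rsa_by_model[m][i] is the accessibility of residue i in model m.
    Assumption: population standard deviation; the abstract does not specify.
    """
    n_residues = len(rsa_by_model[0])
    return [pstdev([model[i] for model in rsa_by_model]) for i in range(n_residues)]


def window_features(pssm, extra, center, half_width):
    """Feature vector for one residue: PSSM rows plus extra per-residue values
    (e.g. predicted RSA/SS) concatenated over a sliding window.

    Assumption: positions beyond the chain termini are zero-padded.
    """
    row_len = len(pssm[0]) + len(extra[0])
    features = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(pssm):
            features.extend(pssm[i])
            features.extend(extra[i])
        else:
            features.extend([0.0] * row_len)
    return features


def eps_insensitive_loss(y_true, y_pred, epsilon):
    """Loss minimized by epsilon-SVR: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

In this sketch, the PSSM-only representation corresponds to passing an `extra` matrix with zero columns per residue, while the augmented representation passes the predicted RSA/SS values.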
Phosphorylation prediction methods typically employ a two-class classification approach, with the limitation that the set of negative examples used for training may include sites that are phosphorylated but not yet known to be. While one-class classification techniques have been considered as a solution to this problem, their performance has not been systematically compared with that of two-class techniques. In this study, we developed and compared one- and two-class support vector machine (SVM)-based predictors for several commonly used sets of attributes. These predictors are being made available at http://sable.cchmc.org/.
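The difference between the two training setups can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the helper names, the restriction to Ser/Thr/Tyr candidates, the window half-width, the `-` padding character, and the 0-based site positions are all hypothetical choices. The point is only that the two-class set labels every unannotated candidate as negative, whereas the one-class set keeps the annotated positives alone.

```python
def candidate_windows(sequence, half_width=4, residues="STY"):
    """Yield (position, window) for every candidate residue in the sequence.

    Assumptions: candidates are Ser/Thr/Tyr, positions are 0-based,
    and windows are padded with '-' at the chain termini.
    """
    padded = "-" * half_width + sequence + "-" * half_width
    for i, aa in enumerate(sequence):
        if aa in residues:
            yield i, padded[i:i + 2 * half_width + 1]


def build_training_sets(sequence, known_sites, half_width=4):
    """Return (two_class, one_class) training examples for one protein.

    two_class: (window, label) pairs; unannotated candidates get label -1,
               even though some may be undiscovered phosphorylation sites.
    one_class: windows around annotated sites only, with no negative examples.
    """
    two_class, one_class = [], []
    for pos, window in candidate_windows(sequence, half_width):
        if pos in known_sites:
            two_class.append((window, +1))
            one_class.append(window)
        else:
            two_class.append((window, -1))
    return two_class, one_class


# Toy example with a made-up sequence and a single annotated site.
two_class, one_class = build_training_sets("MASTK", known_sites={2}, half_width=1)
```

Either set would then be encoded numerically and passed to the corresponding SVM variant; that step is omitted here.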

Acknowledgments

This work was supported in part by NIH grants GM067823 and P30-ES006096. Computational resources were made available by Cincinnati Children’s Hospital Research Foundation and University of Cincinnati College of Medicine.

Copyright information

© 2010 Springer Science+Business Media, LLC

Cite this paper

Swaminathan, K., Adamczak, R., Porollo, A., Meller, J. (2010). Enhanced Prediction of Conformational Flexibility and Phosphorylation in Proteins. In: Arabnia, H. (eds) Advances in Computational Biology. Advances in Experimental Medicine and Biology, vol 680. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5913-3_35
