Abstract
Many sequence-based predictors of structural and functional properties of proteins have been developed. In this study, we developed new methods for predicting measures of conformational flexibility in proteins, including X-ray structure-derived temperature (B-) factors and the variance within NMR structural ensembles, as measured by solvent accessibility standard deviations (SASDs). We further tested whether these predicted measures of conformational flexibility, in crystal lattices and in solution, respectively, can be used to improve the prediction of phosphorylation in proteins. Phosphorylation is a common post-translational modification that modulates protein function, e.g., by affecting the interactions and conformational flexibility of phosphorylated sites. Using robust epsilon-insensitive support vector regression (ε-SVR) models, we assessed two representations of protein sequences: one based on position-specific scoring matrices (PSSMs) derived from multiple sequence alignments, and an augmented representation that incorporates real-valued solvent accessibility and secondary structure predictions (RSA/SS) as additional measures of local structural propensities. We showed that a combination of PSSMs and real-valued RSA/SS predictions provides systematic improvements in the accuracy of both B-factor and SASD prediction. These intermediate predictions were subsequently combined into an enhanced predictor of phosphorylation that was shown to significantly outperform methods based on PSSMs alone. To the best of our knowledge, this is the first example of using NMR structure-based measures of conformational flexibility in solution, predicted from sequence, for the prediction of other properties of proteins.
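As a minimal illustration of the two sequence representations described above, the following Python sketch builds per-residue feature vectors from a sliding window over PSSM rows, optionally augmented with extra per-residue columns such as predicted RSA/SS values, and evaluates the ε-insensitive loss that underlies ε-SVR. The function names, the window half-width, the zero padding at the termini, and the value of ε are illustrative assumptions, not details taken from the chapter.

```python
from typing import List, Optional, Sequence


def window_features(
    pssm: Sequence[Sequence[float]],
    extra: Optional[Sequence[Sequence[float]]] = None,
    half_window: int = 5,  # assumed window half-width, not from the chapter
) -> List[List[float]]:
    """Return one feature vector per residue.

    Each vector concatenates the per-residue rows (PSSM columns plus any
    extra columns, e.g. predicted RSA/SS) for all positions in the window
    [i - half_window, i + half_window]. Positions beyond the sequence
    termini are zero-padded (an assumption of this sketch).
    """
    n = len(pssm)
    if extra is not None and len(extra) != n:
        raise ValueError("pssm and extra must describe the same residues")

    # Per-residue row: PSSM columns, followed by the augmenting columns.
    rows = [
        list(pssm[i]) + (list(extra[i]) if extra is not None else [])
        for i in range(n)
    ]
    width = len(rows[0]) if rows else 0
    pad = [0.0] * width

    features = []
    for i in range(n):
        vec: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            vec.extend(rows[j] if 0 <= j < n else pad)
        features.append(vec)
    return features


def epsilon_insensitive_loss(y_true: float, y_pred: float, epsilon: float = 0.1) -> float:
    """Loss used by epsilon-SVR: errors smaller than epsilon cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

In this sketch, calling `window_features(pssm)` gives the PSSM-only representation, while `window_features(pssm, extra=rsa_ss)` gives the augmented one; the resulting vectors could then be passed to any ε-SVR implementation together with B-factor or SASD targets.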
Phosphorylation prediction methods typically employ a two-class classification approach, with the limitation that the set of negative examples used for training may include sites that are simply not yet known to be phosphorylated. While one-class classification techniques have been considered in the past as a solution to this problem, their performance has not been systematically compared to that of two-class techniques. In this study, we developed and compared one- and two-class support vector machine (SVM)-based predictors for several commonly used sets of attributes. These predictors are being made available at http://sable.cchmc.org/.
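The difference between the two training setups can be sketched in a few lines of Python. The example below collects windows centred on candidate S/T/Y residues and assembles a two-class training set, in which unannotated sites are treated as negatives, and a one-class training set, which uses known phosphosites only. The function names, the window half-width, the `-` padding character, and the 0-based site indexing are assumptions of this sketch, not details from the chapter.

```python
from typing import List, Set, Tuple


def candidate_windows(
    sequence: str,
    known_sites: Set[int],      # 0-based positions of annotated phosphosites (assumed convention)
    half_window: int = 4,       # assumed window half-width
    residues: str = "STY",
) -> Tuple[List[str], List[str]]:
    """Split S/T/Y-centred windows into known positives and unlabeled sites."""
    padded = "-" * half_window + sequence + "-" * half_window
    positives: List[str] = []
    unlabeled: List[str] = []
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        # Residue i of the sequence sits at index i + half_window in `padded`,
        # so this slice is centred on it.
        window = padded[i : i + 2 * half_window + 1]
        if i in known_sites:
            positives.append(window)
        else:
            unlabeled.append(window)
    return positives, unlabeled


def training_sets(
    positives: List[str], unlabeled: List[str]
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Build the two training sets being contrasted.

    Two-class: unlabeled sites are labelled 0, although some of them may be
    phosphorylated sites that have not been observed yet.
    One-class: only known positives are used, so no such label noise enters.
    """
    two_class = [(w, 1) for w in positives] + [(w, 0) for w in unlabeled]
    one_class = list(positives)
    return two_class, one_class
```

The windows would still need to be encoded numerically (for example with PSSM rows) before being passed to an SVM; the sketch only shows where the potentially mislabeled negatives enter the two-class setup.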
Acknowledgments
This work was supported in part by NIH grants GM067823 and P30-ES006096. Computational resources were made available by Cincinnati Children’s Hospital Research Foundation and University of Cincinnati College of Medicine.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this paper
Cite this paper
Swaminathan, K., Adamczak, R., Porollo, A., Meller, J. (2010). Enhanced Prediction of Conformational Flexibility and Phosphorylation in Proteins. In: Arabnia, H. (eds) Advances in Computational Biology. Advances in Experimental Medicine and Biology, vol 680. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-5913-3_35
DOI: https://doi.org/10.1007/978-1-4419-5913-3_35
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-5912-6
Online ISBN: 978-1-4419-5913-3
eBook Packages: Biomedical and Life Sciences (R0)