
Prediction of phosphorylation sites based on granular support vector machine

Granular Computing

Abstract

Protein phosphorylation is the most extensive and important post-translational modification in eukaryotes, regulating the activity of almost all cells. Experimental methods for identifying phosphorylation sites, such as mass spectrometry, are costly and time-consuming, and a number of algorithms have therefore been developed to predict phosphorylation sites computationally. However, these algorithms often train on a small subset of the data obtained by random sampling, which cannot make full use of the characteristics of the entire data set when building a prediction model. Combining granular computing with kernel fuzzy C-means clustering, this paper maps the massive raw data into a high-dimensional kernel space and then divides it into granules by clustering, obtaining balanced high-dimensional granules. A granular support vector machine prediction model (KFCC–GSVM) is then built on these balanced granules. The model makes the compression of phosphorylation site data more principled and reliable: when the traditional SVM algorithm is applied for classification, the compressed data have the same distribution in the kernel space as the data before compression. Experimental results show that the method outperforms both Musite, an SVM-based non-kinase-specific phosphorylation site prediction method, and the traditional GSVM method.
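The pipeline summarized in the abstract (kernel fuzzy C-means clustering in kernel space, then compressing the data into granules before SVM training) can be illustrated with a minimal pure-Python sketch. It assumes a Gaussian (RBF) kernel and the feature-space formulation of kernel fuzzy C-means, in which each cluster centre is kept implicitly as a membership-weighted combination of mapped samples; the abstract does not state the paper's exact variant, kernel parameters, or granule-selection rule. The helpers `kernel_fcm` and `granulate`, and the rule of keeping the member nearest each granule centre as its representative, are hypothetical choices made for illustration only, not the authors' KFCC–GSVM implementation.

```python
import math
import random


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_fcm(points, c, gamma=1.0, m=2.0, iters=50, seed=0):
    """Kernel fuzzy C-means with centres kept implicitly in feature space.

    Assumed formulation: centre v_i = sum_j u_ij^m phi(x_j) / sum_j u_ij^m.
    Returns (U, D2): U[i][k] is the membership of point k in cluster i and
    D2[i][k] is the squared kernel-space distance ||phi(x_k) - v_i||^2.
    """
    n = len(points)
    K = [[rbf_kernel(p, q, gamma) for q in points] for p in points]
    rng = random.Random(seed)
    # Random initial memberships, normalised so each point's memberships sum to 1.
    U = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(U[i][k] for i in range(c))
        for i in range(c):
            U[i][k] /= total
    D2 = [[0.0] * n for _ in range(c)]
    for _ in range(iters):
        # Distances to the implicit centres via the kernel trick:
        # K(x_k, x_k) - 2 sum_j w_j K(x_k, x_j) / s + sum_jl w_j w_l K(x_j, x_l) / s^2
        for i in range(c):
            w = [U[i][j] ** m for j in range(n)]
            s = sum(w)
            centre_sq = sum(w[j] * w[l] * K[j][l] for j in range(n) for l in range(n)) / s ** 2
            for k in range(n):
                cross = sum(w[j] * K[k][j] for j in range(n)) / s
                # Clamp to avoid division by zero when a point sits on a centre.
                D2[i][k] = max(K[k][k] - 2.0 * cross + centre_sq, 1e-12)
        # Standard fuzzy C-means membership update, using kernel-space distances.
        for k in range(n):
            for i in range(c):
                U[i][k] = 1.0 / sum(
                    (D2[i][k] / D2[j][k]) ** (1.0 / (m - 1.0)) for j in range(c)
                )
    return U, D2


def granulate(points, n_granules, gamma=1.0, m=2.0, iters=50, seed=0):
    """Hypothetical granulation step: compress `points` to one sample per granule.

    Each point joins its highest-membership granule; the granule's representative
    is the member nearest to the granule centre in kernel space. Returns the
    granule label of every point and the indices of the representatives.
    """
    U, D2 = kernel_fcm(points, n_granules, gamma, m, iters, seed)
    n = len(points)
    labels = [max(range(n_granules), key=lambda i: U[i][k]) for k in range(n)]
    reps = []
    for i in range(n_granules):
        members = [k for k in range(n) if labels[k] == i]
        if members:  # a granule can end up empty; skip it
            reps.append(min(members, key=lambda k: D2[i][k]))
    return labels, reps
```

As a hypothetical usage, given feature vectors `neg` for non-phosphorylated sites and `pos` for phosphorylated sites, `granulate(neg, len(pos))` would reduce the negatives to at most as many representatives as there are positives, and those representatives together with `pos` could then be passed to any standard SVM trainer (that step is omitted here). The sketch builds the full kernel matrix and costs on the order of c·n² operations per iteration, so it only suits small examples.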


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5


References

  • Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucl Acids Res 25:3389–3402
  • Biswas AK, Noman N, Sikder AR (2010) Machine learning approach to predict protein phosphorylation sites by incorporating evolutionary information. BMC Bioinform 11(1):273
  • Blom N, Gammeltoft S, Brunak S (1999a) Sequence and structure-based prediction of eukaryotic protein phosphorylation sites. J Mol Biol 294(5):1351–1362
  • Blom N, Kreegipuu A, Brunak S (1999b) PhosphoBase: a database of phosphorylation sites. Nucl Acids Res 26(1):237–239
  • Chapelle O, Vapnik V, Bousquet O, Mukherjee S (2002) Choosing multiple parameters for support vector machines. Mach Learn 46(1–3):131–159
  • Chen Q, Wang Y, Chen B, Zhang C, Wang L, Li J (2017) Using propensity scores to predict the kinases of unannotated phosphopeptides. Knowl Based Syst 135:60–76
  • Chen Q, Deng C, Lan W, Liu Z, Zheng R, Liu J, Wang J (2019) Identifying interactions between kinases and substrates based on protein-protein interaction network. J Comput Biol
  • Diella F, Gould CM, Chica C, Via A, Gibson TJ (2008) Phospho.ELM: a database of phosphorylation sites–update 2008. Nucl Acids Res 36(Database issue):240–244
  • Diella F, Gould CM, Chica C, Via A, Gibson TJ (2011) Phospho.ELM: a database of phosphorylation sites–update 2011. Nucl Acids Res 39(Database issue):D261–D267
  • Ding S, Zhang X, An Y, Xue Y (2017) Weighted linear loss multiple birth support vector machine based on information granulation for multi-class classification. Pattern Recognit 67:32–46
  • Dou Y, Yao B, Zhang C (2014) PhosphoSVM: prediction of phosphorylation sites by integrating various protein sequence attributes with a support vector machine. Amino Acids 46(6):1459–1469
  • Dunker AK, Oldfield CJ, Meng J, Romero P, Yang JY, Chen JW, Vacic V, Obradovic Z, Uversky VN (2008) The unfoldomics decade: an update on intrinsically disordered proteins. BMC Genom 9(Suppl 2):S1
  • Gao J, Agrawal GK, Thelen JJ, Obradovic Z, Dunker AK, Dong X (2009) A new machine learning approach for protein phosphorylation site prediction in plants. Lect Notes Comput Sci 5462/2009:18–29
  • Gao J, Thelen JJ, Dunker AK, Xu D (2010) Musite, a tool for global prediction of general and kinase-specific phosphorylation sites. Mol Cell Proteom 9(12):2586–2600
  • Girolami M (2002) Mercer kernel-based clustering in feature space. IEEE Trans Neural Netw 13(3):780–784
  • Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M (2007) PHOSIDA (phosphorylation site database): management, structural and evolutionary investigation, and prediction of phosphosites. Genome Biol 8(11):R250
  • Grabiec AM, Korchynskyi O, Tak PP, Reedquist KA (2012) Histone deacetylase inhibitors suppress rheumatoid arthritis fibroblast-like synoviocyte and macrophage IL-6 production by accelerating mRNA decay. Ann Rheum Dis 71(3):424
  • Hasan MM, Khatun MS (2018) Prediction of protein post-translational modification sites: an overview. Ann Proteom Bioinform 2:049–057
  • Henikoff S, Henikoff JG (1992) Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci USA 89(22):10915–10919
  • Hjerrild M, Stensballe A, Rasmussen TE, Kofoed CB, Blom N, Sicheritz-Ponten T, Larsen MR, Brunak S, Jensen ON, Gammeltoft S (2004) Identification of phosphorylation sites in protein kinase A substrates using artificial neural networks and mass spectrometry. J Proteome Res 3(3):426
  • Hsieh CJ, Si S, Dhillon I (2014) A divide-and-conquer solver for kernel support vector machines. In: International conference on machine learning, pp 566–574
  • Huang HD, Lee TY, Tzeng SW, Horng JT (2005) KinasePhos: a web tool for identifying protein kinase-specific phosphorylation sites. Nucl Acids Res 33(Web Server issue):226–229
  • Iakoucheva LM, Radivojac P, Brown CJ, O’Connor TR, Sikes JG, Obradovic Z, Dunker AK (2004) The importance of intrinsic disorder for protein phosphorylation. Nucl Acids Res 32(3):1037–1049
  • Kennelly PJ, Krebs EG (1991) Consensus sequences as substrate specificity determinants for protein kinases and protein phosphatases. J Biol Chem 266(24):15555–15558
  • Lasko TA, Bhagwat JG, Zou KH, Ohno-Machado L (2005) The use of receiver operating characteristic curves in biomedical informatics. J Biomed Inform 38(5):404–415
  • Li Y, Cai YZ, Li YG, Xu XM (2004) Rough sets method for SVM data preprocessing. In: IEEE conference on cybernetics & intelligent systems
  • Liu H, Cocea M (2017) Granular computing-based approach for classification towards reduction of bias in ensemble learning. Granul Comput 2(3):1–9
  • Liu P, You X (2017) Probabilistic linguistic TODIM approach for multiple attribute decision-making. Granul Comput 12:1–10
  • Livi L, Sadeghian A (2016) Granular computing, computational intelligence, and the analysis of non-geometric input spaces. Granul Comput 1(1):13–20
  • Obradovic Z, Peng K, Vucetic S, Radivojac P, Dunker AK (2008) Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 61(Suppl 7):176–182
  • Shim J, Sohn I, Kim S, Lee JW, Green PE, Hwang C (2009) Selecting marker genes for cancer classification using supervised weighted kernel clustering and the support vector machine. Comput Stat Data Anal 53(5):1736–1742
  • Sweet RM, Eisenberg D (1983) Correlation of sequence hydrophobicities measures similarity in three-dimensional protein structure. J Mol Biol 171(4):479–488
  • Tang Y (2006) Granular support vector machines based on granular computing, soft computing and statistical learning
  • Tang Y, Jin B, Zhang YQ (2005) Granular support vector machines with association rules mining for protein homology prediction. Artif Intell Med 35(1):121–134
  • Trost B, Kusalik A (2011) Computational prediction of eukaryotic phosphorylation sites. Bioinformatics 27(21):2927–2935
  • Wang G, Yang J, Xu J (2017) Granular computing: from granularity optimization to multi-granularity joint problem solving. Granul Comput 2(3):1–16
  • Wang W, Guo H (2009) Granular support vector machine learning model. J Shanxi Univ (Natural Science Edition) 4:11
  • Wilke G, Portmann E (2016) Granular computing as a basis of human-data interaction: a cognitive cities use case. Granul Comput 1(3):181–197
  • Wu KP, Wang SD (2006) Choosing the kernel parameters of support vector machines according to the inter-cluster distance. In: The 2006 IEEE international joint conference on neural network proceedings. IEEE, pp 1205–1211
  • Wu KP, Wang SD (2009) Choosing the kernel parameters for support vector machines by the inter-cluster distance in the feature space. Pattern Recognit 42(5):710–717
  • Wu ZD, Xie WX, Yu JP (2003) Fuzzy c-means clustering algorithm based on kernel method. In: International conference on computational intelligence & multimedia applications
  • Xue Y, Li A, Wang L, Feng H, Yao X (2006) PPSP: prediction of PK-specific phosphorylation site with Bayesian decision theory. BMC Bioinform 7(1):163
  • Yu H, Yang J, Han J, Li X (2005) Making SVMs scalable to large data sets using hierarchical cluster indexing. Data Min Knowl Discov 11(3):295–321
  • Zavialova MG, Zgoda VG, Nikolaev EN (2017) Analysis of the role of protein phosphorylation in the development of diseases. Biochem Suppl 11(3):203–218
  • Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L (2008) Accurate sequence-based prediction of catalytic residues. Bioinformatics 24(20):2329–2338
  • Zhang X (1999) Using class-center vectors to build support vector machines. In: Neural networks for signal processing IX, IEEE signal processing society workshop
  • Zhao H, Wang Z, Men J (2007) Facial complex expression recognition based on fuzzy kernel clustering and support vector machines. In: Third international conference on natural computation (ICNC 2007), vol 1. IEEE, pp 562–566
  • Zhong C, Pedrycz W, Wang D, Li L, Li Z (2016) Granular data imputation: a framework of granular computing. Appl Soft Comput 46:307–316
  • Zulawski M, Braginets R, Schulze WX (2013) PhosPhAt goes kinases: searchable protein kinase target information in the plant phosphorylation site database PhosPhAt. Nucl Acids Res 41(D1):D1176–D1184

Acknowledgements

The work reported in this paper was partially supported by National Natural Science Foundation of China projects 61751314 and 61963004, a key project of the Natural Science Foundation of Guangxi (2017GXNSFDA198033), and a key research and development plan of Guangxi (AB17195055).

Author information

Correspondence to Qingfeng Chen.


About this article


Cite this article

Cheng, G., Chen, Q. & Zhang, R. Prediction of phosphorylation sites based on granular support vector machine. Granul. Comput. 6, 107–117 (2021). https://doi.org/10.1007/s41066-019-00202-5


  • DOI: https://doi.org/10.1007/s41066-019-00202-5
