Abstract
Protein phosphorylation is the most extensive and important post-translational modification in eukaryotes, regulating activity in almost all cells. Experimental methods for identifying phosphorylation sites, such as mass spectrometry, are costly and time-consuming. A number of algorithms have been developed to predict phosphorylation sites, but they often train on a small subset of the data selected by random sampling, which cannot make full use of the characteristics of the entire data set when building a prediction model. Combining granular computing with kernel fuzzy C-means clustering, this paper maps the massive raw data to a high-dimensional kernel space and then divides it into grains by clustering to obtain high-dimensional equilibrium grains. A granular support vector machine (KFCC–GSVM) prediction model is then built on the equilibrium grain data. This model improves the rationality and reliability of phosphorylation site data compression, so that, when the traditional SVM classification algorithm is applied, the compressed data has the same distribution in the kernel space as the data before compression. Experimental results demonstrate that our method outperforms both Musite, an SVM-based non-kinase-specific phosphorylation site prediction method, and the traditional GSVM method.
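The compression step described above can be illustrated with a minimal sketch. It assumes a Gaussian kernel and one common kernel fuzzy C-means variant in which cluster prototypes stay in the input space; the function names (`kernel_fcm`, `granulate`), the parameters (`sigma`, `m`, `grains_per_class`), and the choice to use cluster centers as grain representatives are all hypothetical and are not taken from the paper's KFCC–GSVM procedure. The SVM training stage is omitted; in practice it would be fitted on the compressed set returned by `granulate`.

```python
import math
import random


def rbf(x, v, sigma):
    """Gaussian kernel K(x, v) = exp(-||x - v||^2 / (2 * sigma^2))."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def memberships(points, centers, sigma, m, eps=1e-12):
    """Fuzzy memberships: u_ik is proportional to (1 - K(x_k, v_i))^(-1/(m-1))."""
    rows = []
    for x in points:
        w = [(1.0 - rbf(x, v, sigma) + eps) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(w)
        rows.append([wi / total for wi in w])  # each row sums to 1
    return rows


def kernel_fcm(points, c, sigma=1.0, m=2.0, iters=50, seed=0):
    """Kernel fuzzy C-means (assumed variant: prototypes kept in input space)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c)]
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, sigma, m)
        for i in range(c):
            # Each center becomes a kernel-weighted mean of all points.
            coef = [(u[k][i] ** m) * rbf(x, centers[i], sigma)
                    for k, x in enumerate(points)]
            denom = sum(coef) + 1e-12
            centers[i] = [sum(cf * x[d] for cf, x in zip(coef, points)) / denom
                          for d in range(dim)]
    return centers, memberships(points, centers, sigma, m)


def granulate(points, labels, grains_per_class, sigma=1.0):
    """Compress each class separately into a few labelled grain prototypes."""
    compressed = []
    for label in sorted(set(labels)):
        members = [p for p, l in zip(points, labels) if l == label]
        centers, _ = kernel_fcm(members, grains_per_class, sigma=sigma)
        compressed.extend((center, label) for center in centers)
    return compressed
```

Clustering each class separately is a design choice made here so that every prototype carries an unambiguous label; a classifier trained on the returned `(center, label)` pairs sees far fewer samples than the raw data set while each class remains represented.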
Acknowledgements
The work reported in this paper was partially supported by National Natural Science Foundation of China projects 61751314 and 61963004, a key project of the Natural Science Foundation of Guangxi (2017GXNSFDA198033), and a key research and development plan of Guangxi (AB17195055).
About this article
Cite this article
Cheng, G., Chen, Q. & Zhang, R. Prediction of phosphorylation sites based on granular support vector machine. Granul. Comput. 6, 107–117 (2021). https://doi.org/10.1007/s41066-019-00202-5
DOI: https://doi.org/10.1007/s41066-019-00202-5