An Approach to Find Proper Execution Parameters of n-Gram Encoding Method Based on Protein Sequence Classification

Saha, Suprativ; Bhattacharya, Tanmay

doi:10.1007/978-981-13-9942-8_28

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1046)

Included in the following conference series:

International Conference on Advances in Computing and Data Sciences

Abstract

Various protein sequence classification approaches have been developed to assign unknown sequences to their classes or families with a certain accuracy. Feature extraction from the protein sequence is a key step in all of these approaches, and the N-gram encoding method is a popular feature extraction procedure. To keep computational time low and classification accuracy high, however, an upper limit must be fixed for 'N' in the N-gram encoding method. In addition, the standard deviation of a protein sequence is one of the important feature values extracted by the N-gram encoding method. This feature can be computed in two different ways: using the standard mean value or using the floating mean value. It is therefore also important to identify the proper method for calculating the standard deviation. In this paper, an experimental investigation is carried out to find the upper limit of N for the N-gram encoding method and to identify the proper technique for calculating the standard deviation as a feature extracted from unknown protein sequences.
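
As a rough illustration of the kind of feature described above, the following Python sketch counts overlapping N-grams in a sequence and derives the mean and standard deviation of those counts for each N up to a chosen upper limit. The function names, the use of overlapping windows, and the population form of the standard deviation are assumptions made for illustration, not the authors' published procedure. The floating-mean variant is not defined in this abstract, so only the calculation based on the ordinary arithmetic mean is shown.

```python
from collections import Counter
from math import sqrt


def ngram_counts(sequence, n):
    """Count overlapping n-grams (assumed windowing) in a sequence string."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def count_mean_and_std(sequence, n):
    """Return (mean, population standard deviation) of the n-gram counts.

    Assumption: the standard deviation is taken over the occurrence counts
    of the distinct n-grams, using their ordinary arithmetic mean.
    """
    counts = list(ngram_counts(sequence, n).values())
    if not counts:  # sequence shorter than n
        return 0.0, 0.0
    mean = sum(counts) / len(counts)
    variance = sum((c - mean) ** 2 for c in counts) / len(counts)
    return mean, sqrt(variance)


def features_up_to(sequence, max_n):
    """Compute the (mean, std) feature pair for every N from 1 to max_n."""
    return {n: count_mean_and_std(sequence, n) for n in range(1, max_n + 1)}
```

Calling `features_up_to(sequence, max_n)` with increasing `max_n` makes the cost of a larger N visible: the number of possible N-grams over the 20 amino-acid letters grows as 20^N, which is why a fixed upper limit matters for computational time.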

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Brainware University, Barasat, Kolkata, 700125, India
Suprativ Saha
Department of Information Technology, Techno India, Saltlake, Kolkata, 700091, India
Tanmay Bhattacharya

Corresponding author

Correspondence to Suprativ Saha.

Editor information

Editors and Affiliations

University of KwaZulu-Natal, Durban, South Africa
Mayank Singh
Computer Science and Engineering, Jaypee Institute of Information Technology, Waknaghat, Himachal Pradesh, India
P.K. Gupta
Department of Computer Science and Engineering, Jaypee University of Engineering and Technology, Guna, Madhya Pradesh, India
Vipin Tyagi
ÚTIA AV ČR, Institute of Information Theory and Automation, Prague 8, Praha, Czech Republic
Jan Flusser
School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, ON, Canada
Tuncer Ören
CSE Department, Inderprastha Engineering College, Ghaziabad, Uttar Pradesh, India
Rekha Kashyap


About this paper

Cite this paper

Saha, S., Bhattacharya, T. (2019). An Approach to Find Proper Execution Parameters of n-Gram Encoding Method Based on Protein Sequence Classification. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1046. Springer, Singapore. https://doi.org/10.1007/978-981-13-9942-8_28

DOI: https://doi.org/10.1007/978-981-13-9942-8_28
Published: 19 July 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-9941-1
Online ISBN: 978-981-13-9942-8
eBook Packages: Computer Science, Computer Science (R0)
