An Approach to Find Proper Execution Parameters of n-Gram Encoding Method Based on Protein Sequence Classification

  • Conference paper
Advances in Computing and Data Sciences (ICACDS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1046)

Abstract

Various protein sequence classification approaches have been developed to assign unknown sequences to their classes or families with a certain accuracy. Feature extraction from the protein sequence is a key step in all of these approaches, and the N-gram encoding method is a popular feature extraction procedure. To keep computational time low and classification accuracy high, however, an upper limit must be fixed for ‘N’ in the N-gram encoding method. In addition, the standard deviation of a protein sequence is one of the important feature values extracted by the N-gram encoding method. This feature can be computed in two different ways: using the standard mean value or using the floating mean value. It is therefore also important to determine which method of calculating the standard deviation is appropriate. This paper presents an experimental investigation to find the upper limit of N for the N-gram encoding method and to identify the proper technique for calculating the standard deviation as a feature extracted from an unknown protein sequence.
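
As a rough illustration of the N-gram encoding idea mentioned in the abstract, the following Python sketch counts overlapping N-grams in a sequence and takes a standard deviation around the ordinary arithmetic mean. The function names `ngram_counts` and `count_std` are hypothetical, and the choice to compute the standard deviation over the N-gram occurrence counts is an assumption, since the abstract does not say which quantity the deviation is taken over. The floating mean variant is not defined in the abstract and is therefore not shown.

```python
from collections import Counter
from statistics import pstdev


def ngram_counts(sequence: str, n: int) -> Counter:
    """Count overlapping n-grams (substrings of length n) in a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n yields an empty range and an empty Counter.
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def count_std(sequence: str, n: int) -> float:
    """Population standard deviation of the n-gram counts.

    Assumption: the deviation is taken over occurrence counts, around the
    ordinary arithmetic mean. The paper may define the feature differently.
    """
    counts = list(ngram_counts(sequence, n).values())
    if not counts:
        return 0.0
    return pstdev(counts)


if __name__ == "__main__":
    seq = "MKVLMKV"  # toy sequence, not real data
    for n in (1, 2, 3):
        print(n, dict(ngram_counts(seq, n)), count_std(seq, n))
```

The number of possible N-grams over a 20-letter amino acid alphabet grows as 20^N, which is one reason an upper limit on N matters for computational time.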

Author information

Corresponding author

Correspondence to Suprativ Saha.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Saha, S., Bhattacharya, T. (2019). An Approach to Find Proper Execution Parameters of n-Gram Encoding Method Based on Protein Sequence Classification. In: Singh, M., Gupta, P., Tyagi, V., Flusser, J., Ören, T., Kashyap, R. (eds) Advances in Computing and Data Sciences. ICACDS 2019. Communications in Computer and Information Science, vol 1046. Springer, Singapore. https://doi.org/10.1007/978-981-13-9942-8_28

  • DOI: https://doi.org/10.1007/978-981-13-9942-8_28

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9941-1

  • Online ISBN: 978-981-13-9942-8

  • eBook Packages: Computer Science, Computer Science (R0)
