Classification of Textual Sentiment Using Ensemble Technique

  • Original Research
  • SN Computer Science

Abstract

In recent years, the widespread use of the Internet has revolutionized the way people share their feelings or sentiments on blogs, social media, e-commerce sites, and other online platforms. Most of the feelings expressed on these platforms are in textual form (such as statuses, tweets, comments, and reviews). These textual expressions are unstructured, and their messy form makes them laborious and time-consuming to organize, manipulate, or store efficiently. Textual sentiment analysis refers to the automatic process of assigning an expression or text to an appropriate polarity (positive, negative, or neutral). Although Bengali is ranked the seventh most popular language globally and the second most popular Indic language, few language processing tools have been developed for it to date. This paper proposes an ensemble-based technique to classify Bengali textual sentiment into two categories: positive and negative. Due to the unavailability of a Bengali sentiment corpus, this work also developed a dataset (called the ‘Bengali Sentiment Analysis Dataset’ or ‘BSaD’) containing 8122 text expressions. This work investigates eight popular baseline classifiers [Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), K-nearest Neighbor (KNN), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), Stochastic Gradient Descent (SGD), and AdaBoost] with Term frequency-Inverse document frequency (TF-IDF) and Bag-of-words (BoW) features for textual sentiment analysis on three datasets. It also investigates four ensemble methods (LR + RF, RF + SVM, LR + SVM, and LR + RF + SVM) built by combining the three best-performing base classifiers (LR, RF, and SVM). Experimental results show that the ensemble approach (i.e., LR + RF + SVM) with TF-IDF (uni-gram + bi-gram + tri-gram) features outperformed the other classifier models, achieving the highest accuracy of 82% on the developed dataset.
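The two ingredients named in the abstract, combined uni-gram + bi-gram + tri-gram TF-IDF features and an ensemble of three base classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not state how the base classifiers are combined, so hard (majority) voting is assumed here, and the TF-IDF variant (relative term frequency times ln(N / df)) is likewise an assumption. The function names `ngrams`, `tfidf`, and `majority_vote` are hypothetical, and English tokens stand in for Bengali ones.

```python
import math
from collections import Counter

def ngrams(tokens, n_min=1, n_max=3):
    """Return all word n-grams for n in [n_min, n_max] as space-joined strings."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def tfidf(docs):
    """Map each tokenized document to {ngram: weight}.

    Assumed weighting: (count / n-grams in document) * ln(N / df).
    """
    counts = [Counter(ngrams(doc)) for doc in docs]
    # Document frequency: iterating a Counter yields each distinct n-gram once.
    df = Counter(gram for c in counts for gram in c)
    n_docs = len(docs)
    return [{gram: (tf / sum(c.values())) * math.log(n_docs / df[gram])
             for gram, tf in c.items()}
            for c in counts]

def majority_vote(*predictions):
    """Combine per-classifier label lists by hard voting (assumed scheme)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

# Toy usage: two tokenized "reviews" and made-up predictions from three models.
weights = tfidf([["good", "food"], ["bad", "food"]])
combined = majority_vote(["pos", "neg", "pos"],   # e.g. LR
                         ["pos", "pos", "neg"],   # e.g. RF
                         ["neg", "neg", "neg"])   # e.g. SVM
```

With three voters and two classes, a majority always exists, so no tie-breaking rule is needed in this sketch. An n-gram that occurs in every document (here "food") receives weight zero under the assumed formula.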


Author information

Corresponding author

Correspondence to Mohammed Moshiul Hoque.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Enabling Innovative Computational Intelligence Technologies for IOT” guest edited by Omer Rana, Rajiv Misra, Alexander Pfeiffer, Luigi Troiano, and Nishtha Kesswani.

About this article

Cite this article

Mamun, M.M.R., Sharif, O. & Hoque, M.M. Classification of Textual Sentiment Using Ensemble Technique. SN COMPUT. SCI. 3, 49 (2022). https://doi.org/10.1007/s42979-021-00922-z

  • DOI: https://doi.org/10.1007/s42979-021-00922-z
