
A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters

Information Systems Frontiers

Abstract

The success of sentiment analysis lies in identifying the most frequent and relevant opinions among users on a particular topic. In this paper, we develop a framework to analyze users’ sentiments on Twitter about natural disasters using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. For sentiment classification, we choose TF-IDF with K-means over affinity propagation and hierarchical clustering. We use Latent Dirichlet Allocation and a pipeline of Doc2Vec and K-means to capture themes, and then perform multi-level polarity index classification and a time series analysis of those indices. In our study, we draw insights from 243,746 tweets about the 2018 natural disasters in Kerala, India. The key findings of the study are the classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We also observe different sets of emotions and influencers, among others. The Kerala floods case shows how the government and other organizations could track positive and negative sentiments across time and location, gain a better understanding of the topics trending among the public, and collaborate with key Twitter users and influencers to spread information and identify gaps in the design and execution of schemes. The novelty of this research is the streamlined and efficient combination of algorithms and techniques embedded in the framework, which can be integrated into a GUI-based platform for further automation.
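The TF-IDF and K-means step named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption: the four sample tweets are invented, tokenisation is plain whitespace splitting, the IDF uses one common smoothed variant, and the initial centroids are fixed by hand (`seeds`) to keep the run deterministic. It is not the authors' pipeline, only a small standard-library instance of the general technique.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised sparse TF-IDF vector (a dict) per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Term frequency times a smoothed IDF (an assumed variant).
        vec = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, seeds, n_iter=10):
    """Lloyd's K-means; `seeds` are indices of the initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(n_iter):
        # Assignment step: nearest centroid for every vector.
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(vec, centroids[k]))
            for vec in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [vec for vec, label in zip(vectors, labels) if label == k]
            if not members:
                continue  # keep the old centroid if a cluster goes empty
            totals = Counter()
            for vec in members:
                for term, weight in vec.items():
                    totals[term] += weight
            centroids[k] = {term: w / len(members) for term, w in totals.items()}
    return labels


# Hypothetical tweets, invented for illustration only.
tweets = [
    "rescue teams saved families from flood water",
    "rescue boats saved stranded families",
    "donate relief funds for flood victims",
    "please donate funds to relief camps",
]
labels = kmeans(tfidf_vectors(tweets), seeds=[0, 2])
print(labels)  # [0, 0, 1, 1]
```

On this toy input the rescue-related tweets and the donation-related tweets fall into separate clusters. At the scale reported in the abstract, a library implementation with proper tokenisation, stop-word removal, and randomised initialisation would be the more realistic choice.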





Author information

Corresponding author

Correspondence to Pankaj Dutta.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mendon, S., Dutta, P., Behl, A. et al. A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters. Inf Syst Front 23, 1145–1168 (2021). https://doi.org/10.1007/s10796-021-10107-x


  • DOI: https://doi.org/10.1007/s10796-021-10107-x
