Abstract
The success of sentiment analysis lies in identifying the most frequent and relevant opinions among users on a particular topic. In this paper, we develop a framework to analyze Twitter users' sentiments about natural disasters, using data pre-processing techniques and a hybrid of machine learning, statistical modeling, and lexicon-based approaches. For sentiment classification, we choose TF-IDF with K-means after comparing it against affinity propagation and hierarchical clustering. We use Latent Dirichlet Allocation (LDA) and a pipeline of Doc2Vec and K-means to capture themes, and then perform a multi-level classification by polarity index together with its time-series analysis. In our study, we draw insights from 243,746 tweets about the 2018 natural disasters in Kerala, India. The key findings of the study are a classification of sentiments based on similarity and polarity indices and the identification of themes among the topics discussed on Twitter. We also observe distinct sets of emotions and influencers, among other patterns. The Kerala floods case shows how the government and other organizations could track positive and negative sentiment over time and by location; gain a better understanding of the topics trending among the public; and collaborate with key Twitter users and influencers to spread information and to identify gaps in the implementation of schemes, in both design and execution. The novelty of this research lies in the streamlined and efficient combination of algorithms and techniques embedded in the framework used to achieve these outputs, which can be integrated into a platform with a GUI for further automation.
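The abstract reports choosing TF-IDF with K-means for sentiment classification. The following is a minimal, dependency-free sketch of that combination. It is not the authors' implementation: whitespace tokenization, the smoothed IDF formula, deterministic farthest-first seeding (a non-random cousin of k-means++), the helper names `tfidf_vectors` and `kmeans`, and the sample tweets are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase, split on whitespace, keep only purely alphabetic tokens.
    return [t for t in text.lower().split() if t.isalpha()]


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vector per document)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(vectors, k):
    # Deterministic seeding: start at the first vector, then repeatedly add
    # the vector whose distance to its nearest chosen centroid is largest.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        idx = max(range(len(vectors)),
                  key=lambda i: min(sq_dist(vectors[i], c) for c in centroids))
        centroids.append(list(vectors[idx]))
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label per input vector."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every vector.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


tweets = [
    "rescue boats reached flooded villages",
    "boats rescue families from flooded homes",
    "donate food and water to relief camps",
    "relief camps need food water donations",
]
vocab, vectors = tfidf_vectors(tweets)
print(kmeans(vectors, k=2))  # prints [0, 0, 1, 1]
```

The two rescue tweets and the two relief-camp tweets share vocabulary, so their TF-IDF vectors lie close together and K-means separates them. On a real corpus, library implementations such as scikit-learn's `TfidfVectorizer` and `KMeans` would normally replace these hand-written helpers.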
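The abstract also mentions a lexicon-based, multi-level classification by polarity index, followed by time-series analysis. The section does not specify the lexicon or how the indices are defined, so everything below is a hypothetical sketch: the toy `LEXICON`, the mean-score definition of `polarity_index`, the five thresholds in `polarity_level`, and daily averaging in `daily_polarity` are assumptions chosen only to show the shape of such a step.

```python
from collections import defaultdict
from datetime import date

# Hypothetical toy lexicon mapping words to polarity scores in [-1, 1].
LEXICON = {
    "safe": 0.8, "rescued": 0.7, "thanks": 0.6, "help": 0.4,
    "stranded": -0.5, "missing": -0.6, "worst": -0.8, "dead": -0.9,
}


def polarity_index(text):
    """Mean lexicon score of the matched words; 0.0 if none match."""
    words = (w.strip(".,!?;:") for w in text.lower().split())
    scores = [LEXICON[w] for w in words if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0


def polarity_level(score):
    # Five illustrative levels; the thresholds are assumptions.
    if score <= -0.5:
        return "strongly negative"
    if score < 0:
        return "negative"
    if score == 0:
        return "neutral"
    if score < 0.5:
        return "positive"
    return "strongly positive"


def daily_polarity(tweets):
    """tweets: iterable of (date, text). Returns {date: mean polarity index}."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(polarity_index(text))
    # Sorting by date yields a chronologically ordered series.
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}


sample = [
    (date(2018, 8, 16), "Families rescued and safe"),
    (date(2018, 8, 16), "Two dead and many missing"),
    (date(2018, 8, 17), "Still stranded, please help!"),
]
for day, text in sample:
    print(day, polarity_level(polarity_index(text)), text)
print(daily_polarity(sample))
```

Grouping by location instead of (or in addition to) date would follow the same pattern, with the grouping key changed from the day to a place name.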
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Mendon, S., Dutta, P., Behl, A. et al. A Hybrid Approach of Machine Learning and Lexicons to Sentiment Analysis: Enhanced Insights from Twitter Data of Natural Disasters. Inf Syst Front 23, 1145–1168 (2021). https://doi.org/10.1007/s10796-021-10107-x
DOI: https://doi.org/10.1007/s10796-021-10107-x