Abstract
Automatic topic discovery from natural language text is a challenging and widely studied problem. The ability to discover the topics present in a collection of text documents is essential for information systems, where it has been used to obtain compact document representations for grouping, classification, and retrieval. Tasks that can benefit from topic discovery include recommendation systems, misinformation tracking, summarization, and text clustering. However, topic discovery from Spanish texts has been somewhat neglected. For this reason, this work analyzes the behavior of topic discovery in Spanish texts, specifically tweets about the Mexican economy during the COVID-19 pandemic, under three different approaches. A comparison of the approaches yielded promising results: the topic coherence metric indicates coherent topics. The highest score, 1.22, was obtained using PLSA with 50 topics, and the discovered topics encompassed the study domain.
Acknowledgment
The authors would like to thank Universidad Autónoma Metropolitana, Azcapotzalco. The present work has been funded by the research project SI001-18 at UAM Azcapotzalco, and by the Consejo Nacional de Ciencia y Tecnología (CONACYT) with the scholarship number 788155. The authors gratefully acknowledge the computer resources, technical advice, and support provided by Laboratorio Nacional de Supercómputo del Sureste de México (LNS), a member of the CONACYT national laboratories, with project No. 202103090C, and partly by project VIEP 2021 at BUAP.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Lezama Sánchez, A.L., Tovar Vidal, M., Reyes-Ortiz, J.A. (2023). Topic Discovery About Economy During COVID-19 Pandemic from Spanish Tweets. In: Arai, K. (eds) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. FTC 2022. Lecture Notes in Networks and Systems, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_37
DOI: https://doi.org/10.1007/978-3-031-18344-7_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18343-0
Online ISBN: 978-3-031-18344-7
eBook Packages: Intelligent Technologies and Robotics (R0)