Abstract
Social networks such as Twitter, Facebook, and Sina microblogs have become major sources for discovering and sharing the latest topics. Because topics on social networks change quickly, an effective method for modeling them is urgently needed. However, topic modeling of microblog streams is challenging because short texts are sparse and topics change dynamically. In this study, we propose dynamic topic modeling via a self-aggregation method (SADTM) that captures the time-varying aspect of topic distributions and resolves the sparsity problem. The SADTM aggregates observable, unordered short texts into aggregated documents without relying on external context, thereby overcoming the sparsity of short text. Furthermore, we represent each microblog by word pairs instead of single words to generate more word co-occurrence patterns. The SADTM models temporal dynamics by using the topic distributions at previous time steps in the microblog stream to infer the current topics from sequential data. Extensive experiments on a real-world Sina microblog dataset demonstrate that the SADTM outperforms several state-of-the-art methods in topic coherence and cluster quality. Additionally, when applied in a search scenario, the SADTM significantly outperforms all baseline methods in the quality of the search results.
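As a rough illustration of the word-pair idea mentioned in the abstract, the sketch below enumerates unordered word pairs within each short text and counts them across a tiny corpus. It is not the SADTM implementation: the function names `word_pairs` and `pair_counts`, the choice to pair only distinct words, and the toy posts are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations


def word_pairs(tokens):
    """Return every unordered pair of distinct words in one short text.

    Assumption: duplicates are collapsed and pairs are stored in sorted
    order, so ("rescue", "team") and ("team", "rescue") are the same pair.
    """
    unique = sorted(set(tokens))
    return list(combinations(unique, 2))


def pair_counts(corpus):
    """Count how often each word pair occurs across all short texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(word_pairs(tokens))
    return counts


# Hypothetical, already tokenized microblog posts.
posts = [
    ["earthquake", "rescue", "team"],
    ["rescue", "team", "arrives"],
]
counts = pair_counts(posts)
```

A text with n distinct words yields n(n-1)/2 pairs, so even a three-word post contributes three co-occurrence observations, which is how word pairs can provide more co-occurrence evidence than the individual words alone.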
Acknowledgements
Supported by the National Natural Science Foundation of China under Grants No. 61320106006, No. 61532006, and No. 61772083.
This article is part of the Topical Collection: Special Issue on Big Data and Smart Computing in Network Systems
Guest Editors: Jiming Chen, Kaoru Ota, Lu Wang, and Jianping He
Cite this article
Shi, L., Du, J., Liang, M. et al. Dynamic topic modeling via self-aggregation for short text streams. Peer-to-Peer Netw. Appl. 12, 1403–1417 (2019). https://doi.org/10.1007/s12083-018-0692-7
DOI: https://doi.org/10.1007/s12083-018-0692-7