Topic Modeling over Text Streams from Social Media

  • Conference paper
Text, Speech, and Dialogue (TSD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9924)

Abstract

Topic modeling has become a popular research area that offers new ways to search, browse, and summarize large amounts of text. Topic modeling methods try to uncover the hidden thematic structure in document collections. Social networks are among the strongest communication tools and produce large numbers of opinions and attitudes on world events, so topic modeling applied to them can be useful for analysis in crisis situations, elections, the launch of a new product on the market, etc. For that reason, in this paper we propose a tool for topic modeling over text streams from social networks. The description of the proposed tool is complemented with practical experiments. The experiments showed promising results when using our tool on real data in comparison with state-of-the-art methods.

Acknowledgments

The work presented in this paper was supported by the Slovak VEGA grant 1/0493/16 and Slovak KEGA grant 025TUKE-4/2015.

Author information

Corresponding author

Correspondence to Miroslav Smatana.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Smatana, M., Paralič, J., Butka, P. (2016). Topic Modeling over Text Streams from Social Media. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech, and Dialogue. TSD 2016. Lecture Notes in Computer Science, vol 9924. Springer, Cham. https://doi.org/10.1007/978-3-319-45510-5_19

  • DOI: https://doi.org/10.1007/978-3-319-45510-5_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-45509-9

  • Online ISBN: 978-3-319-45510-5

  • eBook Packages: Computer Science, Computer Science (R0)
