Skip to main content

Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling

  • Conference paper
  • First Online:
Analysis of Images, Social Networks and Texts (AIST 2016)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 661))

Abstract

In this paper we present a new asynchronous algorithm for learning additively regularized topic models and discuss the main architectural details of our implementation. The key property of the new algorithm is that it behaves in a fully deterministic fashion, which is typically hard to achieve in a non-blocking parallel implementation. The algorithm had been recently implemented in the BigARTM library (http://bigartm.org). Our new algorithm is compatible with all features previously introduced in BigARTM library, including multimodality, regularizers and scores calculation. While the existing BigARTM implementation compares favorably with alternative packages such as Vowpal Wabbit or Gensim, the new algorithm brings further improvements in CPU utilization, memory usage, and spends even less time to achieve the same perplexity.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://archive.ics.uci.edu/ml/datasets/Bag+of+Words.

References

  1. Blei, D.M.: Probabilistic topic models. Commun. ACM 55(4), 77–84 (2012)

    Article  Google Scholar 

  2. Daud, A., Li, J., Zhou, L., Muhammad, F.: Knowledge discovery through directed probabilistic topic models: a survey. Front. Comput. Sci. Chin 4(2), 280–301 (2010)

    Article  Google Scholar 

  3. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (2003)

    MATH  Google Scholar 

  4. Yuan, J., Gao, F., Ho, Q., Dai, W., Wei, J., Zheng, X., Xing, E.P., Liu, T.Y., Ma, W.Y.: LightLDA: big topic models on modest computer clusters. In: Proceedings of the 24th International Conference on World Wide Web, pp. 1351–1361 (2015)

    Google Scholar 

  5. Hofmann, T.: Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 50–57 (1999)

    Google Scholar 

  6. Hoffman, M.D., Blei, D.M., Bach, F.R.: Online learning for latent Dirichlet allocation. In: NIPS, pp. 856–864. Curran Associates Inc. (2010)

    Google Scholar 

  7. Newman, D., Asuncion, A., Smyth, P., Welling, M.: Distributed algorithms for topic models. J. Mach. Learn. Res. 10, 1801–1828 (2009)

    MathSciNet  MATH  Google Scholar 

  8. Wang, Y., Bai, H., Stanton, M., Chen, W.-Y., Chang, E.Y.: PLDA: parallel latent Dirichlet allocation for large-scale applications. In: Goldberg, A.V., Zhou, Y. (eds.) AAIM 2009. LNCS, vol. 5564, pp. 301–314. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02158-9_26

    Chapter  Google Scholar 

  9. Liu, Z., Zhang, Y., Chang, E.Y., Sun, M.: PLDA+: parallel latent Dirichlet allocation with data placement and pipeline processing. ACM Trans. Intell. Syst. Technol. 2(3), 26:1–26:18 (2011)

    Article  Google Scholar 

  10. Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech DNNs. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 235–239. IEEE (2014)

    Google Scholar 

  11. Dean, J., Corrado, G.S., Monga, R., Chen, K., Devin, M., Le, Q.V., Mao, M.Z., Ranzato, M.-A., Senior, A., Tucker, P., Yang, K., Ng, A.Y.: Large scale distributed deep networks. In: NIPS, pp. 1223–1231 (2012)

    Google Scholar 

  12. Řehůřek, R., Sojka, P.: Software framework for topic modelling with large corpora. In: Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, Valletta, Malta, May 2010

    Google Scholar 

  13. Langford, J., Li, L., Strehl, A.: Vowpal wabbit open source project. Technical report. Yahoo! (2007)

    Google Scholar 

  14. McCallum, A.K.: A Machine Learning for Language Toolkit (2002). http://mallet.cs.umass.edu

  15. Smola, A., Narayanamurthy, S.: An architecture for parallel topic models. Proc. VLDB Endow. 3(1–2), 703–710 (2010)

    Article  Google Scholar 

  16. Vorontsov, K.V.: Additive regularization for topic models of text collections. Dokl. Math. 89(3), 301–304 (2014)

    Article  MathSciNet  MATH  Google Scholar 

  17. Vorontsov, K.V., Potapenko, A.A.: Additive regularization of topic models. Mach. Learn. Spec. Issue Data Anal. Intell. Optim. 101(1), 303–323 (2015)

    MathSciNet  MATH  Google Scholar 

  18. Vorontsov, K., Frei, O., Apishev, M., Romov, P., Dudarenko, M.: BigARTM: open source library for regularized multimodal topic modeling of large collections. In: Khachay, M.Y., Konstantinova, N., Panchenko, A., Ignatov, D.I., Labunets, V.G. (eds.) AIST 2015. CCIS, vol. 542, pp. 370–381. Springer, Heidelberg (2015). doi:10.1007/978-3-319-26123-2_36

    Chapter  Google Scholar 

  19. Vorontsov, K., Frei, O., Apishev, M., Romov, P., Suvorova, M., Yanina, A.: Non-Bayesian additive regularization for multimodal topic modeling of large collections. In: Proceedings of the 2015 Workshop on Topic Models: Post-Processing and Applications, pp. 29–37. ACM, New York (2015)

    Google Scholar 

Download references

Acknowledgements

The work was supported by Russian Science Foundation (grant 15-18-00091). Also we would like to thank Prof. K. V. Vorontsov for constant attention to our research and detailed feedback to this paper.

Author information

Authors and Affiliations

Authors

Corresponding authors

Correspondence to Oleksandr Frei or Murat Apishev .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Frei, O., Apishev, M. (2017). Parallel Non-blocking Deterministic Algorithm for Online Topic Modeling. In: Ignatov, D., et al. Analysis of Images, Social Networks and Texts. AIST 2016. Communications in Computer and Information Science, vol 661. Springer, Cham. https://doi.org/10.1007/978-3-319-52920-2_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-52920-2_13

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-52919-6

  • Online ISBN: 978-3-319-52920-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics