A Novel Updating Scheme for Probabilistic Latent Semantic Indexing

  • Conference paper
Advances in Artificial Intelligence (SETN 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3955)

Abstract

Probabilistic Latent Semantic Indexing (PLSI) is a statistical technique for automatic document indexing. A novel method is proposed for updating PLSI when new documents arrive. The proposed method incrementally adds the words of each new document to the term-document matrix and derives updating equations for the probability of terms given the latent (class) variables and the probability of documents given the latent variables. Its performance is compared to that of the folding-in algorithm, an inexpensive but potentially inaccurate updating method. The proposed updating algorithm is shown to outperform folding-in with respect to the mean squared error between these probabilities as estimated by each updating method and as estimated by the original non-adaptive PLSI algorithm.
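
For orientation, the baseline ingredients named in the abstract can be sketched in Python with NumPy. This is a minimal illustration of standard PLSI fitting by EM and of the folding-in baseline, not the updating scheme proposed in the paper, whose equations are not given here. The function names (plsi_em, fold_in, mse), the symmetric parameterization P(z), P(w|z), P(d|z), the random initialization, and the fixed iteration count are all assumptions made for the example.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero (an assumption of this sketch)

def plsi_em(N, K, n_iter=50, seed=0):
    """Fit PLSI to a term-document count matrix N (terms x docs) by EM.

    Returns P(z) of shape (K,), P(w|z) of shape (terms, K),
    and P(d|z) of shape (docs, K).
    """
    rng = np.random.default_rng(seed)
    W, D = N.shape
    p_z = np.full(K, 1.0 / K)
    p_w_z = rng.random((W, K))
    p_w_z /= p_w_z.sum(axis=0)
    p_d_z = rng.random((D, K))
    p_d_z /= p_d_z.sum(axis=0)
    for _ in range(n_iter):
        # E-step: P(z|w,d) is proportional to P(z) P(w|z) P(d|z)
        joint = p_z[None, None, :] * p_w_z[:, None, :] * p_d_z[None, :, :]
        post = joint / (joint.sum(axis=2, keepdims=True) + EPS)
        # M-step: re-estimate from expected counts n(w,d) P(z|w,d)
        weighted = N[:, :, None] * post          # terms x docs x K
        term_counts = weighted.sum(axis=1)       # terms x K
        doc_counts = weighted.sum(axis=0)        # docs x K
        totals = term_counts.sum(axis=0)         # K
        p_w_z = term_counts / (totals + EPS)
        p_d_z = doc_counts / (totals + EPS)
        p_z = totals / N.sum()
    return p_z, p_w_z, p_d_z

def fold_in(n_new, p_w_z, n_iter=50):
    """Folding-in: estimate P(z|d_new) for a new document's term counts
    while keeping P(w|z) fixed. Assumes n_new has at least one nonzero count."""
    K = p_w_z.shape[1]
    p_z_d = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        joint = p_w_z * p_z_d[None, :]           # terms x K
        post = joint / (joint.sum(axis=1, keepdims=True) + EPS)
        p_z_d = n_new @ post                     # expected counts per class
        p_z_d = p_z_d / p_z_d.sum()
    return p_z_d

def mse(a, b):
    """Mean squared error between two arrays of probabilities."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))
```

Because folding-in never revises P(w|z), terms that matter only in the new documents cannot reshape the latent classes, which is one way to read the abstract's remark that the method is inexpensive but potentially inaccurate. When mse is used to compare an updated model against a model refitted from scratch, the latent classes of the two fits would first have to be put in the same order.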

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kotropoulos, C., Papaioannou, A. (2006). A Novel Updating Scheme for Probabilistic Latent Semantic Indexing. In: Antoniou, G., Potamias, G., Spyropoulos, C., Plexousakis, D. (eds) Advances in Artificial Intelligence. SETN 2006. Lecture Notes in Computer Science, vol 3955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752912_16

  • DOI: https://doi.org/10.1007/11752912_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34117-8

  • Online ISBN: 978-3-540-34118-5

  • eBook Packages: Computer Science, Computer Science (R0)
