Inferring Probability of Relevance Using the Method of Logistic Regression

  • Conference paper
SIGIR ’94

Abstract

This research evaluates a model for probabilistic text and document retrieval. The model uses logistic regression to obtain equations that rank documents by probability of relevance as a function of document and query properties. Since the model infers probability of relevance from statistical clues present in the texts of documents and queries, we call it logistic inference. By transforming the distribution of each statistical clue into its standardized distribution (one with mean μ = 0 and standard deviation σ = 1), the method allows one to apply logistic coefficients derived from a training collection to other document collections with little loss of predictive power. The model is applied to three well-known information retrieval test collections, and the results are compared directly to the particular vector space model of retrieval that uses term-frequency/inverse-document-frequency (tfidf) weighting and the cosine similarity measure. In the comparison, the logistic inference method performs significantly better than the tfidf/cosine vector space model in two collections and equally well in the third. The differences in performance between the two models were subjected to statistical tests to determine whether they are statistically significant or could have occurred by chance.
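To make the idea of standardized clues concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes each document-query pair is described by a short list of numeric clues, converts each clue to a z-score within the collection, and applies a logistic function to a linear combination of the standardized clues. The function names, the two clues, the intercept, and the coefficient values are all hypothetical and chosen only for illustration; the abstract does not specify the actual clues or fitted coefficients.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Map raw clue values to z-scores (mean 0, standard deviation 1)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant clue carries no ranking information.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def relevance_probability(z_clues, intercept, coefficients):
    """Logistic function applied to a linear combination of standardized clues."""
    log_odds = intercept + sum(c * z for c, z in zip(coefficients, z_clues))
    return 1.0 / (1.0 + math.exp(-log_odds))


def rank_documents(clue_table, intercept, coefficients):
    """Rank documents by estimated probability of relevance, highest first.

    clue_table maps a document id to a list of raw clue values for one query.
    Each clue is standardized across the documents in the table before the
    (hypothetical) trained coefficients are applied.
    """
    doc_ids = list(clue_table)
    columns = list(zip(*(clue_table[d] for d in doc_ids)))
    z_columns = [standardize(list(col)) for col in columns]
    scored = []
    for i, doc_id in enumerate(doc_ids):
        z_clues = [col[i] for col in z_columns]
        scored.append((doc_id, relevance_probability(z_clues, intercept, coefficients)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical raw clue values for three documents and one query.
clues = {"d1": [3.0, 0.2], "d2": [1.0, 0.9], "d3": [2.0, 0.5]}

# Hypothetical coefficients, standing in for values fitted on a training collection.
ranking = rank_documents(clues, intercept=-1.0, coefficients=[1.5, 0.5])

for doc_id, probability in ranking:
    print(f"{doc_id}: {probability:.3f}")
```

Because the coefficients act on z-scores rather than raw values, the same coefficient vector can be reused on a collection whose raw clue values have a different scale, which is the transfer property the abstract describes.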

Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Gey, F.C. (1994). Inferring Probability of Relevance Using the Method of Logistic Regression. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_23

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_23

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
