Natural Language Processing and Information Retrieval

  • Conference paper
Information Extraction (SCIE 1999)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1714)

Abstract

Information retrieval addresses the problem of finding those documents whose content matches a user’s request from among a large collection of documents. Currently, the most successful general purpose retrieval methods are statistical methods that treat text as little more than a bag of words. However, attempts to improve retrieval performance through more sophisticated linguistic processing have been largely unsuccessful. Indeed, unless done carefully, such processing can degrade retrieval effectiveness.

Several factors contribute to the difficulty of improving on a good statistical baseline, including the forgiving nature but broad coverage of the typical retrieval task; the lack of good weighting schemes for compound index terms; and the implicit linguistic processing inherent in the statistical methods. Natural language processing techniques may be more important for related tasks such as question answering or document summarization.
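
As a rough illustration of what "treating text as little more than a bag of words" can mean in practice, the following minimal sketch ranks documents against a query using term counts only. It is not the method evaluated in the paper: the TF-IDF weighting, the cosine measure, the whitespace tokenizer, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace. No stemming,
    # no phrase detection, no other linguistic processing.
    return text.lower().split()


def tfidf_vectors(docs):
    # Build one sparse term-weight vector per document. Word order is
    # discarded, which is what makes this a bag-of-words representation.
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in doc_freq}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    # Return (score, document index) pairs, best match first.
    vectors, idf = tfidf_vectors(docs)
    q_counts = Counter(tokenize(query))
    # Query terms that never occur in the collection are ignored.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "dogs chase cats",
        "stock markets fell sharply",
    ]
    for score, idx in rank("cat mat", docs):
        print(f"{score:.3f}  {docs[idx]}")
```

In this sketch the query term "cat" does not match "cats" in the second document, because the assumed tokenizer does no normalization of word forms.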

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Voorhees, E.M. (1999). Natural Language Processing and Information Retrieval. In: Pazienza, M.T. (ed.) Information Extraction. SCIE 1999. Lecture Notes in Computer Science, vol 1714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48089-7_3

  • DOI: https://doi.org/10.1007/3-540-48089-7_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-66625-7

  • Online ISBN: 978-3-540-48089-1

  • eBook Packages: Springer Book Archive
