Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text

  • Conference paper
Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology (SCIE 1997)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1299)

Abstract

The usual approach to finding information on the WWW with existing Web browsers is to submit a one- or two-word query. Browsers return a number of documents containing these words; the user examines those documents, or their abstracts, sees how the query words are being used, and alters the initial query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over the past thirty-five years. These models were designed for longer queries and do not respond adequately to user needs. On the other hand, recent advances in natural language processing permit the extraction of typed information centered on one or two words. We review a selection of this typed information and describe how it could be used to present the user with an intermediate structure that fits between their short queries and the documents found in a heterogeneous text collection such as the WWW.

Editor information

Maria Teresa Pazienza

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grefenstette, G. (1997). Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text. In: Pazienza, M.T. (eds) Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. SCIE 1997. Lecture Notes in Computer Science, vol 1299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63438-X_6

  • DOI: https://doi.org/10.1007/3-540-63438-X_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63438-6

  • Online ISBN: 978-3-540-69548-6

  • eBook Packages: Springer Book Archive
