Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text

Grefenstette, Gregory

doi:10.1007/3-540-63438-X_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1299)

Included in the following conference series:

International Summer School on Information Extraction

Abstract

The usual approach to finding information on the WWW via existing Web browsers is to use a one- or two-word query. Browsers return a number of documents containing these words; the user examines those documents, or their abstracts, sees how the query words are being used, and alters the initial query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over the past thirty-five years. These models were designed for longer queries and do not respond adequately to these users' needs. On the other hand, recent advances in natural language processing permit the extraction of typed information that is centered on one or two words. We review a selection of this typed information and describe how it could be used to present an intermediate structure for the user, fitting between their short queries and the documents found in a heterogeneous text collection such as the WWW.

Author information

Authors and Affiliations

Rank Xerox Research Centre, 38240, Meylan, France
Gregory Grefenstette

Editor information

Maria Teresa Pazienza

About this paper

Cite this paper

Grefenstette, G. (1997). Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text. In: Pazienza, M.T. (eds) Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. SCIE 1997. Lecture Notes in Computer Science, vol 1299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63438-X_6

DOI: https://doi.org/10.1007/3-540-63438-X_6
Published: 30 July 2005
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63438-6
Online ISBN: 978-3-540-69548-6
eBook Packages: Springer Book Archive
