Abstract
The usual approach to finding information on the WWW via existing Web browsers is to use a one- or two-word query. Browsers return a number of documents containing these words; the user examines those documents, or their abstracts, sees how the words in the query are being used, and alters the initial query accordingly. This contrasts markedly with the Information Retrieval models explored by researchers over the past thirty-five years. These models were designed for longer queries and do not respond adequately to users' needs. On the other hand, recent advances in natural language processing permit the extraction of typed information centered on one or two words. We review a selection of this typed information and describe how it could be used to present the user with an intermediate structure that sits between their short queries and the documents found in a heterogeneous text collection such as the WWW.
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grefenstette, G. (1997). Short query linguistic expansion techniques: Palliating one-word queries by providing intermediate structure to text. In: Pazienza, M.T. (ed.) Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology. SCIE 1997. Lecture Notes in Computer Science, vol 1299. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63438-X_6
DOI: https://doi.org/10.1007/3-540-63438-X_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63438-6
Online ISBN: 978-3-540-69548-6
eBook Packages: Springer Book Archive