The Use of Categories and Clusters for Organizing Retrieval Results

  • Chapter
Natural Language Information Retrieval

Part of the book series: Text, Speech and Language Technology ((TLTB,volume 7))

Abstract

An important problem for information access systems is that of organizing large sets of documents that have been retrieved in response to a query. Text categorization and text clustering are two natural language processing tasks whose results can be applied to document organization. This chapter describes user interfaces that use categories and clusters to organize retrieval results, and examines the relationship between the two.

Copyright information

© 1999 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Hearst, M.A. (1999). The Use of Categories and Clusters for Organizing Retrieval Results. In: Strzalkowski, T. (eds) Natural Language Information Retrieval. Text, Speech and Language Technology, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2388-6_14

  • DOI: https://doi.org/10.1007/978-94-017-2388-6_14

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-5209-4

  • Online ISBN: 978-94-017-2388-6

  • eBook Packages: Springer Book Archive