The Use of Categories and Clusters for Organizing Retrieval Results

Hearst, Marti A.

doi:10.1007/978-94-017-2388-6_14

Part of the book series: Text, Speech and Language Technology (TLTB, volume 7)

Abstract

An important problem for information access systems is that of organizing large sets of documents that have been retrieved in response to a query. Text categorization and text clustering are two natural language processing tasks whose results can be applied to document organization. This chapter describes user interfaces that use categories and clusters to organize retrieval results, and examines the relationship between the two.

Author information

Authors and Affiliations

School of Information Management & Systems, University of California, 102 South Hall, Berkeley, CA, 94720-4600, USA
Marti A. Hearst

Editor information

Editors and Affiliations

General Electric, Research & Development, 12301, Schenectady, NY, USA
Tomek Strzalkowski

About this chapter

Cite this chapter

Hearst, M.A. (1999). The Use of Categories and Clusters for Organizing Retrieval Results. In: Strzalkowski, T. (ed.) Natural Language Information Retrieval. Text, Speech and Language Technology, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2388-6_14

DOI: https://doi.org/10.1007/978-94-017-2388-6_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5209-4
Online ISBN: 978-94-017-2388-6
eBook Packages: Springer Book Archive
