Abstract
An important problem for information access systems is that of organizing large sets of documents that have been retrieved in response to a query. Text categorization and text clustering are two natural language processing tasks whose results can be applied to document organization. This chapter describes user interfaces that use categories and clusters to organize retrieval results, and examines the relationship between the two.
Copyright information
© 1999 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Hearst, M.A. (1999). The Use of Categories and Clusters for Organizing Retrieval Results. In: Strzalkowski, T. (ed.) Natural Language Information Retrieval. Text, Speech and Language Technology, vol 7. Springer, Dordrecht. https://doi.org/10.1007/978-94-017-2388-6_14
DOI: https://doi.org/10.1007/978-94-017-2388-6_14
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-5209-4
Online ISBN: 978-94-017-2388-6
eBook Packages: Springer Book Archive