Clustering Web Search Results with Maximum Spanning Trees

Di Marco, Antonio; Navigli, Roberto

doi:10.1007/978-3-642-23954-0_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6934)

Included in the following conference series:

Congress of the Italian Association for Artificial Intelligence

Abstract

We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves on classical search result clustering methods in terms of both clustering quality and degree of diversification.
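
The two steps described in the abstract can be illustrated with a small sketch. The abstract does not say how senses are read off the spanning tree or how similarity is measured, so everything below is an assumption made for illustration: the toy co-occurrence graph for the query "jaguar", the rule of dropping the lightest tree edges to split the tree into senses, and the word-overlap similarity are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm with edges visited in descending weight order."""
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # edge joins two components, so keep it
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


def induce_senses(nodes, tree, num_senses):
    """ASSUMPTION: drop the (num_senses - 1) lightest tree edges and treat
    each remaining connected component as one induced sense."""
    kept = sorted(tree, key=lambda e: e[2], reverse=True)
    kept = kept[: max(len(kept) - (num_senses - 1), 0)]
    adjacency = defaultdict(set)
    for u, v, _ in kept:
        adjacency[u].add(v)
        adjacency[v].add(u)
    senses, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        senses.append(component)
    return senses


def assign_snippet(snippet, senses):
    """ASSUMPTION: similarity is plain word overlap between snippet and sense."""
    words = set(snippet.lower().split())
    overlaps = [len(words & sense) for sense in senses]
    return max(range(len(senses)), key=overlaps.__getitem__)


# Hypothetical co-occurrence graph for the ambiguous query "jaguar".
nodes = ["car", "engine", "speed", "cat", "jungle", "prey"]
edges = [
    ("car", "engine", 0.9), ("car", "speed", 0.7), ("engine", "speed", 0.6),
    ("cat", "jungle", 0.8), ("cat", "prey", 0.75), ("jungle", "prey", 0.5),
    ("speed", "cat", 0.1),
]
tree = maximum_spanning_tree(nodes, edges)
senses = induce_senses(nodes, tree, num_senses=2)
car_cluster = assign_snippet("Jaguar unveils a new engine for its fastest car", senses)
cat_cluster = assign_snippet("The jaguar is a big cat that stalks prey in the jungle", senses)
```

In this toy graph the only tree edge linking the automotive words to the animal words is the weak `speed`/`cat` edge, so removing it leaves two components, and each snippet is assigned to the component it shares the most words with. A real system would need a principled way to choose the number of senses and a stronger similarity measure than raw overlap.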

Author information

Authors and Affiliations

Dipartimento di Informatica, Sapienza Università di Roma, Via Salaria, 113, 00198, Roma, Italy
Antonio Di Marco & Roberto Navigli

Editor information

Editors and Affiliations

Department of Chemical, Management, Computer, and Mechanical Engineering (DICGIM), University of Palermo, Viale delle Scienze, Edificio 6, 90128, Palermo, Italy
Roberto Pirrone & Filippo Sorbello

About this paper

Cite this paper

Di Marco, A., Navigli, R. (2011). Clustering Web Search Results with Maximum Spanning Trees. In: Pirrone, R., Sorbello, F. (eds) AI*IA 2011: Artificial Intelligence Around Man and Beyond. AI*IA 2011. Lecture Notes in Computer Science, vol 6934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23954-0_20

DOI: https://doi.org/10.1007/978-3-642-23954-0_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23953-3
Online ISBN: 978-3-642-23954-0
eBook Packages: Computer Science, Computer Science (R0)