Two-Stage Learning to Rank for Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (volume 7814)

Abstract

Current learning to rank approaches commonly focus on learning the best possible ranking function given a small fixed set of documents. This document set is often retrieved from the collection using a simple unsupervised bag-of-words method, e.g. BM25. This can potentially lead to learning a sub-optimal ranking, since many relevant documents may be excluded from the initially retrieved set. In this paper we propose a novel two-stage learning framework to address this problem. We first learn a ranking function over the entire retrieval collection using a limited set of textual features including weighted phrases, proximities and expansion terms. This function is then used to retrieve the best possible subset of documents over which the final model is trained using a larger set of query- and document-dependent features. Empirical evaluation using two web collections unequivocally demonstrates that our proposed two-stage framework, being able to learn its model from more relevant documents, outperforms current learning to rank approaches.
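The pipeline described in the abstract — a cheap first-stage ranker over the whole collection followed by a richer model trained only on the retrieved subset — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy data, the linear stage-one scorer, and the least-squares stage-two fit (standing in for a full learning-to-rank method such as LambdaMART) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a cheap linear scorer over a small set of textual features
# (e.g. BM25, phrase, proximity, and expansion-term scores), applied
# to every document in the collection.
def first_stage_scores(X_cheap, w):
    return X_cheap @ w

# Stage 2: fit a richer model only on the top-k documents returned by
# stage 1. A least-squares fit is used here purely as a placeholder
# for a real learning-to-rank algorithm.
def train_second_stage(X_rich, y):
    w, *_ = np.linalg.lstsq(X_rich, y, rcond=None)
    return w

# Toy collection: 1000 documents, 3 cheap features, 10 rich features,
# with synthetic graded relevance labels.
n_docs, k = 1000, 50
X_cheap = rng.random((n_docs, 3))
X_rich = rng.random((n_docs, 10))
relevance = rng.integers(0, 3, size=n_docs).astype(float)

# Score the whole collection cheaply and keep the top-k subset.
w1 = np.array([1.0, 0.5, 0.25])  # hypothetical stage-one weights
top_k = np.argsort(-first_stage_scores(X_cheap, w1))[:k]

# Train the final model on the subset and re-rank it.
w2 = train_second_stage(X_rich[top_k], relevance[top_k])
final_ranking = top_k[np.argsort(-(X_rich[top_k] @ w2))]
```

The point of the design is that the expensive, feature-rich second stage only ever sees k documents per query, while the first stage, being cheap enough to run collection-wide, controls which documents the final model can learn from and rank.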


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, V., Bendersky, M., Croft, W.B. (2013). Two-Stage Learning to Rank for Information Retrieval. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_36

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science (R0)
