Two-Stage Learning to Rank for Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2013)

Part of the book series: Lecture Notes in Computer Science (volume 7814)

Abstract

Current learning to rank approaches commonly focus on learning the best possible ranking function given a small fixed set of documents. This document set is often retrieved from the collection using a simple unsupervised bag-of-words method, e.g. BM25. This can potentially lead to learning a sub-optimal ranking, since many relevant documents may be excluded from the initially retrieved set. In this paper we propose a novel two-stage learning framework to address this problem. We first learn a ranking function over the entire retrieval collection using a limited set of textual features including weighted phrases, proximities and expansion terms. This function is then used to retrieve the best possible subset of documents over which the final model is trained using a larger set of query- and document-dependent features. Empirical evaluation using two web collections unequivocally demonstrates that our proposed two-stage framework, being able to learn its model from more relevant documents, outperforms current learning to rank approaches.
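The pipeline described in the abstract — a cheap first-stage ranker over the whole collection followed by a richer model trained only on the retrieved subset — can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the toy data, the linear stage-one scorer, and the least-squares stage-two fit (standing in for a full learning-to-rank method such as LambdaMART) are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stage 1: a cheap linear scorer over a small set of textual features
# (e.g. BM25, phrase, proximity, and expansion-term scores), applied
# to every document in the collection.
def first_stage_scores(X_cheap, w):
    return X_cheap @ w

# Stage 2: fit a richer model only on the top-k documents returned by
# stage 1. A least-squares fit is used here purely as a placeholder
# for a real learning-to-rank algorithm.
def train_second_stage(X_rich, y):
    w, *_ = np.linalg.lstsq(X_rich, y, rcond=None)
    return w

# Toy collection: 1000 documents, 3 cheap features, 10 rich features,
# with synthetic graded relevance labels.
n_docs, k = 1000, 50
X_cheap = rng.random((n_docs, 3))
X_rich = rng.random((n_docs, 10))
relevance = rng.integers(0, 3, size=n_docs).astype(float)

# Score the whole collection cheaply and keep the top-k subset.
w1 = np.array([1.0, 0.5, 0.25])  # hypothetical stage-one weights
top_k = np.argsort(-first_stage_scores(X_cheap, w1))[:k]

# Train the final model on the subset and re-rank it.
w2 = train_second_stage(X_rich[top_k], relevance[top_k])
final_ranking = top_k[np.argsort(-(X_rich[top_k] @ w2))]
```

The point of the design is that the expensive, feature-rich second stage only ever sees k documents per query, while the first stage, being cheap enough to run collection-wide, controls which documents the final model can learn from and rank.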


Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dang, V., Bendersky, M., Croft, W.B. (2013). Two-Stage Learning to Rank for Information Retrieval. In: Serdyukov, P., et al. Advances in Information Retrieval. ECIR 2013. Lecture Notes in Computer Science, vol 7814. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36973-5_36

  • DOI: https://doi.org/10.1007/978-3-642-36973-5_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36972-8

  • Online ISBN: 978-3-642-36973-5

  • eBook Packages: Computer Science (R0)
