Deeply supervised model for click-through rate prediction in sponsored search

Data Mining and Knowledge Discovery

Abstract

In sponsored search, it is critical to match ads that are relevant to a query and to accurately predict their likelihood of being clicked. Commercial search engines typically use machine learning models for both query-ad relevance matching and click-through rate (CTR) prediction. However, matching models are based on the similarity between a query and an ad and ignore the fact that a retrieved ad may not attract clicks, while click models rely on click history, which limits their use for new queries and ads. We propose a deeply supervised architecture that jointly learns the semantic embeddings of a query and an ad as well as their corresponding CTR. We also propose a novel cohort negative sampling technique for learning implicit negative signals. We trained the proposed architecture on one billion query-ad pairs from a major commercial web search engine. The architecture improves on the best-performing baseline deep neural architectures by 2% in AUC for CTR prediction and by a statistically significant 0.5% in NDCG for query-ad matching.
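The abstract reports results in AUC (for CTR prediction) and NDCG (for query-ad matching). As a rough illustration of what these two metrics measure, the following minimal sketch computes both on toy data. It is not the authors' evaluation code: the function names, the toy labels and scores, and the exponential gain used in the DCG are all assumptions made for this example.

```python
import math


def auc(labels, scores):
    """ROC AUC as the fraction of (clicked, non-clicked) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def dcg(gains):
    """Discounted cumulative gain with exponential gain (an assumed variant)."""
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg(relevance, scores):
    """NDCG for one query: ads are ranked by model score, best first."""
    ranked = [r for _, r in sorted(zip(scores, relevance), key=lambda t: -t[0])]
    ideal_dcg = dcg(sorted(relevance, reverse=True))
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: click labels and predicted CTRs for four ads.
    clicks = [1, 0, 1, 0]
    predicted_ctr = [0.9, 0.8, 0.4, 0.1]
    print("AUC:", auc(clicks, predicted_ctr))

    # Hypothetical graded relevance labels for three ads shown for one query.
    relevance = [2, 0, 1]
    match_scores = [0.9, 0.8, 0.4]
    print("NDCG:", ndcg(relevance, match_scores))
```

The pairwise AUC above is quadratic in the number of examples and is only meant to make the definition concrete; at the scale described in the abstract, a rank-based computation would be the practical choice.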

Notes

  1. We use the word cohort to distinguish our sampling strategy from traditional mini-batch i.i.d. sampling.

  2. https://github.com/yahoo/TensorFlowOnSpark.

Acknowledgements

The authors gratefully thank Lee Yang for his invaluable help in deploying our models on distributed GPU clusters, as well as Aleksandar Obradovic and Stefan Obradovic for proofreading and editing the language of the manuscript. The authors also thank the anonymous referees for their valuable comments and suggestions.

Author information

Corresponding author

Correspondence to Zoran Obradovic.

Additional information

Responsible editor: Po-ling Loh, Evimaria Terzi, Antti Ukkonen, Karsten Borgwardt.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Amit Goyal: The work was done when the author was with Yahoo Research.

About this article

Cite this article

Gligorijevic, J., Gligorijevic, D., Stojkovic, I. et al. Deeply supervised model for click-through rate prediction in sponsored search. Data Min Knowl Disc 33, 1446–1467 (2019). https://doi.org/10.1007/s10618-019-00625-3

  • DOI: https://doi.org/10.1007/s10618-019-00625-3
