Marrying Relevance and Genre Rankings: An Exploratory Study

Genres on the Web

Part of the book series: Text, Speech and Language Technology (TLTB, volume 42)

Abstract

In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, by applying canonical discriminant analysis to a sample of genres of decreasing formality. We then consider the effects of aggregating the genre-related and text-relevance rankings. Evaluation of the results shows moderate positive effects. The findings suggest that further research is needed on the implicit use of genre-related information in Web search.

This paper expands the short paper presented at the workshop “Towards Genre-Enabled Search Engines: The Impact of NLP” [9].
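
The abstract does not say how the two rankings are merged, so the following is only a minimal sketch of one common way to aggregate a text-relevance ranking with a genre-related (for example, formality-based) ranking. It assumes a weighted Borda-style combination; the function names `rank_scores` and `merge_rankings` and the weight `alpha` are hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Sequence


def rank_scores(ranking: Sequence[str]) -> Dict[str, float]:
    """Map each document to a Borda-style score in (0, 1]; the top document gets 1.0."""
    n = len(ranking)
    return {doc: (n - pos) / n for pos, doc in enumerate(ranking)}


def merge_rankings(
    relevance: Sequence[str],
    genre: Sequence[str],
    alpha: float = 0.7,
) -> List[str]:
    """Merge two rankings (best document first) into one.

    alpha is an assumed weight for the text-relevance ranking;
    (1 - alpha) goes to the genre-related ranking. A document missing
    from one ranking contributes a score of 0.0 for that ranking.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    rel = rank_scores(relevance)
    gen = rank_scores(genre)
    # Preserve first-seen order so that ties fall back to the relevance order.
    docs = list(dict.fromkeys(list(relevance) + list(genre)))
    combined = {
        d: alpha * rel.get(d, 0.0) + (1.0 - alpha) * gen.get(d, 0.0) for d in docs
    }
    # sorted() is stable, so equal scores keep their first-seen order.
    return sorted(docs, key=lambda d: -combined[d])


if __name__ == "__main__":
    by_relevance = ["d1", "d2", "d3"]
    by_formality = ["d3", "d2", "d1"]
    print(merge_rankings(by_relevance, by_formality, alpha=0.7))
```

Setting `alpha` to 1.0 reproduces the relevance ranking and setting it to 0.0 reproduces the genre-related ranking; intermediate values trade one off against the other.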

Notes

  1. WEGA, a Firefox plug-in (see [31], Chapter 8 by Stein et al., this volume), exemplifies this approach.

  2. http://citeseer.ist.psu.edu, http://scholar.google.com

  3. http://technorati.com, http://blog.yandex.ru

  4. http://news.google.com, http://news.yandex.ru

  5. http://shopping.yahoo.com, http://www.pricegrabber.com

References

  1. Abdul-Jaleel, N., J. Allan, W.B. Croft, F. Diaz, L. Larkey, X. Li, M.D. Smucker, and C. Wade. 2005. UMass at TREC 2004: Novelty and HARD. In Proceedings of TREC 2004.

  2. Ageev, M., I. Vershinnikov, and B. Dobrov. 2005. Extraction of the significant part of web pages for information retrieval (in Russian) [Izvlečenie značimoi informacii iz web-stranic dlja zada informacionnogo poiska]. In Internet-Matematika, 283–301. Available online: http://company.yandex.ru/grant/2005/07_Ageev_102942.pdf

  3. Allan, J. 2004. HARD track overview in TREC 2003: High accuracy retrieval from documents. In Proceedings of TREC-2003, 24–37.

  4. Allan, J. 2005. HARD track overview in TREC 2004: High accuracy retrieval from documents. In Proceedings of TREC-2004, 25–35.

  5. Beitzel, S.M., E.C. Jensen, A. Chowdhury, D. Grossman, O. Frieder, and N. Goharian. 2004. Fusion of effective retrieval strategies in the same information retrieval system. Journal of the American Society for Information Science and Technology (JASIST) 55(10):859–868.

  6. Belkin, N., I. Chaleva, M. Cole, Y.-L. Li, L. Liu, Y.-H. Liu, G. Muresan, C. Smith, Y. Sun, X.-J. Yuan, and X.-M. Zhang. 2005. Rutgers’ HARD track experiences at TREC 2004. In Proceedings of TREC-2004.

  7. Braslavski, P. 2004. Document style recognition using shallow statistical analysis. In Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP, 1–9. Nancy. Available online: http://esslli2004.loria.fr/content/readers/36.pdf

  8. Braslavski, P., and A. Tselishchev. 2005. Style-dependent document ranking. In Proceedings of the 7th Russian Conference on Digital Libraries (RCDL’2005), 159–164. Available online: http://www.rcdl2005.uniyar.ac.ru/ru/RCDL2005/papers/sek7_1_paper.pdf

  9. Braslavski, P. 2007. Combining relevance and genre-related rankings: An exploratory study. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 1–4. Borovets, Bulgaria. Available online: http://kansas.ru/pb/paper/ranlp2007.pdf

  10. Collins-Thompson, K., and J.P. Callan. 2004. A language modeling approach to predicting reading difficulty. In Proceedings of HLT/NAACL, 193–200.

  11. DuBay, W.H. 2004. The principles of readability. Available online: http://www.nald.ca/fulltext/readab/readab.pdf

  12. Gulin, A., M. Maslov, and I. Segalovich. 2006. Yandex’ algorithm for text relevance ranking at ROMIP’2006 (in Russian) [Algoritm tekstovogo ranžirovanija Jandeksa na ROMIP’2006]. In Proceedings of ROMIP’2006, 40–51. Suzdal. Available online: http://www.romip.ru/romip2006/03_yandex.pdf

  13. Gupta, S., G. Kaiser, S. Stolfo, and H. Becker. 2005. Genre classification of websites using search engine snippets. In Proceedings of SIGIR’2005 Workshop “Stylistic Analysis of Text For Information Access”. Salvador, Bahia.

  14. Karlgren, J., and D. Cutting. 1994. Recognizing text genres with simple metrics using discriminant analysis. In Proceedings of the 15th Conference on Computational Linguistics, 1071–1075.

  15. Kožina, M.N. 1968. Foundations of the functional stylistics (in Russian) [K osnovaniyam funkcional’noi stilistiki], Perm.

  16. Kumaran, G., R. Jones, and O. Madani. 2005. Biasing web search results for topic familiarity. In Proceedings of CIKM’05, 271–272.

  17. Lim, C.S., K.J. Lee, and G.C. Kim. 2005. Multiple sets of features for automatic genre classification of web documents. Information Processing and Management 41:1263–1276.

  18. Liu, X., W.B. Croft, P. Oh, and D. Hart. 2004. Automatic recognition of reading levels from user queries. In Proceedings of SIGIR’2004, 548–549.

  19. Meyer zu Eissen, S., and B. Stein. 2004. Genre classification of web pages. In Proceedings of the 27th German Conference on Artificial Intelligence (KI-2004), 256–269. Ulm.

  20. Michos, S., E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 1996. Categorizing texts by using a three level functional style description. In Artificial intelligence: Methodology, systems, applications, Frontiers in artificial intelligence and applications, ed. A.M. Ramsay, vol. 35. Available online: http://slt.wcl.ee.upatras.gr/papers/michos2.pdf

  21. Mystem Tool. http://company.yandex.ru/technology/mystem/

  22. Rauber, A., and A. Müller-Kögler. 2001. Integrating automatic genre analysis into digital libraries. In Proceedings of the JCDL’2001, 1–10.

  23. Richardson, M., A. Prakash, and E. Brill. 2006. Beyond PageRank: Machine learning for static ranking. In Proceedings of WWW’2006, 707–715.

  24. Rosso, M.A. 2005. Using genre to improve web search. PhD thesis, University of North Carolina, Chapel Hill, NC.

  25. Russian Information Retrieval Evaluation Seminar (ROMIP). http://romip.ru

  26. Santini, M. 2004. State-of-the-art on automatic genre identification. Technical Report ITRI-04-03, Information Technology Research Institute, University of Brighton, Brighton. Available online: ftp://ftp.itri.bton.ac.uk/reports/ITRI-04-03.pdf

  27. Santini, M. 2007. Automatic identification of genre in web pages. PhD thesis, University of Brighton, Brighton.

  28. Si, L., and J. Callan. 2001. A statistical model for scientific readability. In Proceedings of CIKM’2001, 574–576.

  29. Strzalkowski, T., L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J. Wang, and J. Wilding. 1996. Natural language information retrieval: TREC-5 Report. In Proceedings of TREC’1995.

  30. Stubbe, A., C. Ringlstetter, and R. Goebel. 2007. Elements of a learning interface for genre qualified search. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 21–28. Borovets, Bulgaria.

  31. WEGA: Web Genre Analysis Project. http://www.uni-weimar.de/cms/medien/webis/research/projects/wega.html

Acknowledgments

We would like to thank Mikhail Ageev and Andrei Tselishchev for their help with data processing. We also thank Yandex (www.yandex.ru) for providing the experimental data. Many thanks to Matthew McCool and the volume editors for their valuable comments on the draft.

Corresponding author

Correspondence to Pavel Braslavski.

Copyright information

© 2010 Springer Science+Business Media B.V.

About this chapter

Cite this chapter

Braslavski, P. (2010). Marrying Relevance and Genre Rankings: An Exploratory Study. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_9

  • DOI: https://doi.org/10.1007/978-90-481-9178-9_9

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-90-481-9177-2

  • Online ISBN: 978-90-481-9178-9

  • eBook Packages: Computer Science (R0)
