Marrying Relevance and Genre Rankings: An Exploratory Study

Braslavski, Pavel

doi:10.1007/978-90-481-9178-9_9


Part of the book series: Text, Speech and Language Technology (TLTB, volume 42)


Abstract

In this chapter, we discuss several options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings on a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, by applying canonical discriminant analysis to a sample of genres of decreasing formality. We then examine the effects of aggregating genre-related and text-relevance rankings. Evaluation shows moderate positive effects. The findings suggest that further research is needed on the implicit use of genre-related information in Web search.
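The formality score described in the abstract can be illustrated with a minimal sketch of canonical discriminant analysis. It is not the chapter's implementation. The feature set, the toy genre sample, and the names `first_canonical_direction` and `formality_score` are hypothetical. The sketch assumes that each training document is a numeric feature vector labelled with an integer genre rank, where 0 marks the most formal genre.

The code works in four steps:

1. It builds the within-class scatter matrix S_W and the between-class scatter matrix S_B.
2. It finds the leading eigenvector of S_W^-1 S_B by power iteration. This eigenvector is the first canonical discriminant function.
3. It orients that eigenvector so the most formal genre scores highest.
4. It scores a document by projecting the document onto that direction.

```python
import math
from typing import Dict, List, Sequence

Matrix = List[List[float]]


def _solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _mean(rows: Sequence[Sequence[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def first_canonical_direction(samples: Sequence[Sequence[float]],
                              labels: Sequence[int],
                              n_iter: int = 200) -> List[float]:
    """Leading eigenvector of Sw^-1 Sb (the first canonical discriminant function).

    Hypothetical convention: label 0 is the most formal genre; higher labels are less formal.
    """
    d = len(samples[0])
    classes = sorted(set(labels))
    grand = _mean(samples)
    groups: Dict[int, List[Sequence[float]]] = {c: [] for c in classes}
    for x, y in zip(samples, labels):
        groups[y].append(x)
    means = {c: _mean(groups[c]) for c in classes}

    sw = [[0.0] * d for _ in range(d)]  # within-class scatter
    for x, y in zip(samples, labels):
        diff = [x[j] - means[y][j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                sw[i][j] += diff[i] * diff[j]

    sb = [[0.0] * d for _ in range(d)]  # between-class scatter
    for c in classes:
        diff = [means[c][j] - grand[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                sb[i][j] += len(groups[c]) * diff[i] * diff[j]

    m = _solve(sw, sb)  # M = Sw^-1 Sb
    w = [1.0] * d  # start vector, assumed not orthogonal to the leading eigenvector
    for _ in range(n_iter):
        nxt = [sum(m[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            raise ValueError("no between-class separation")
        w = [v / norm for v in nxt]

    # Flip the axis if needed so the most formal genre (lowest label) scores higher.
    if formality_score(w, means[classes[0]]) < formality_score(w, means[classes[-1]]):
        w = [-v for v in w]
    return w


def formality_score(w: Sequence[float], x: Sequence[float]) -> float:
    """Project a document's feature vector onto the canonical direction."""
    return sum(a * b for a, b in zip(w, x))


# Toy data with hypothetical features, e.g. mean word length and a part-of-speech share.
X = [[6.0, 0.1], [6.2, 0.3], [5.8, 0.2],   # genre 0, most formal
     [5.0, 0.2], [5.2, 0.1], [4.8, 0.3],   # genre 1
     [4.0, 0.3], [4.2, 0.2], [3.8, 0.1]]   # genre 2, least formal
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]
w = first_canonical_direction(X, y)
```

Keeping only the first canonical function yields a single scalar per document, which is what makes the score comparable to a readability score. In the toy data the genres differ only in the first feature, so the learned direction is close to that feature axis.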

This chapter expands a short paper presented at the workshop “Towards Genre-Enabled Search Engines: The Impact of NLP” [9].
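The aggregation of genre-related and text-relevance rankings mentioned in the abstract can be sketched in a few lines. The abstract does not state the aggregation formula, so this sketch assumes one simple option: a weighted linear combination of each document's position in the relevance ranking and in a genre-related ranking, which is a weighted Borda-style rank fusion. Three choices here are hypothetical and made only for illustration:

* the function name `merge_rankings`;
* the `weight` parameter;
* the preference for higher (more formal) genre scores.

With `weight = 0` the relevance ranking is returned unchanged. With `weight = 1` documents are ordered by genre score alone.

```python
from typing import Dict, List, Sequence


def merge_rankings(relevance: Sequence[str],
                   genre_scores: Dict[str, float],
                   weight: float) -> List[str]:
    """Re-rank documents by a weighted sum of relevance rank and genre rank.

    Hypothetical scheme: rank positions are 0-based, a higher genre score
    (e.g. formality) is preferred, and ties fall back to relevance order.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    rel_rank = {doc: pos for pos, doc in enumerate(relevance)}
    # Genre-related ranking of the same documents: best genre score first.
    by_genre = sorted(relevance, key=lambda d: (-genre_scores[d], rel_rank[d]))
    genre_rank = {doc: pos for pos, doc in enumerate(by_genre)}

    def combined(doc: str) -> float:
        return (1.0 - weight) * rel_rank[doc] + weight * genre_rank[doc]

    return sorted(relevance, key=lambda d: (combined(d), rel_rank[d]))


ranking = ["a", "b", "c", "d"]  # output of a text-relevance ranker, best first
formality = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.2}
print(merge_rankings(ranking, formality, 0.5))  # ['b', 'a', 'c', 'd']
```

The sketch fuses rank positions rather than raw scores, so the relevance and formality scores do not need to be on comparable scales. The weight is a free parameter that would have to be tuned against relevance judgments.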


Notes

1. WEGA, a Firefox plug-in (see [31], Chapter 8 by Stein et al., this volume), exemplifies this approach.
2. http://citeseer.ist.psu.edu, http://scholar.google.com
3. http://technorati.com, http://blog.yandex.ru
4. http://news.google.com, http://news.yandex.ru
5. http://shopping.yahoo.com, http://www.pricegrabber.com

References

[1] Abdul-Jaleel, N., J. Allan, W.B. Croft, F. Diaz, L. Larkey, X. Li, M.D. Smucker, and C. Wade. 2005. UMass at TREC 2004: Novelty and HARD. In Proceedings of TREC-2004.
[2] Ageev, M., I. Vershinnikov, and B. Dobrov. 2005. Extraction of the significant part of web pages for information retrieval (in Russian) [Izvlečenie značimoi informacii iz web-stranic dlja zadač informacionnogo poiska]. In Internet-Matematika, 283–301. Available online: http://company.yandex.ru/grant/2005/07_Ageev_102942.pdf
[3] Allan, J. 2004. HARD track overview in TREC 2003: High accuracy retrieval from documents. In Proceedings of TREC-2003, 24–37.
[4] Allan, J. 2005. HARD track overview in TREC 2004: High accuracy retrieval from documents. In Proceedings of TREC-2004, 25–35.
[5] Beitzel, S.M., E.C. Jensen, A. Chowdhury, D. Grossman, O. Frieder, and N. Goharian. 2004. Fusion of effective retrieval strategies in the same information retrieval system. Journal of the American Society for Information Science and Technology (JASIST) 55(10):859–868.
[6] Belkin, N., I. Chaleva, M. Cole, Y.-L. Li, L. Liu, Y.-H. Liu, G. Muresan, C. Smith, Y. Sun, X.-J. Yuan, and X.-M. Zhang. 2005. Rutgers’ HARD track experiences at TREC 2004. In Proceedings of TREC-2004.
[7] Braslavski, P. 2004. Document style recognition using shallow statistical analysis. In Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP, 1–9. Nancy. Available online: http://esslli2004.loria.fr/content/readers/36.pdf
[8] Braslavski, P., and A. Tselishchev. 2005. Style-dependent document ranking. In Proceedings of the 7th Russian Conference on Digital Libraries (RCDL’2005), 159–164. Available online: http://www.rcdl2005.uniyar.ac.ru/ru/RCDL2005/papers/sek7_1_paper.pdf
[9] Braslavski, P. 2007. Combining relevance and genre-related rankings: An exploratory study. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 1–4. Borovets, Bulgaria. Available online: http://kansas.ru/pb/paper/ranlp2007.pdf
[10] Collins-Thompson, K., and J.P. Callan. 2004. A language modeling approach to predicting reading difficulty. In Proceedings of HLT/NAACL, 193–200.
[11] DuBay, W.H. 2004. The principles of readability. Available online: http://www.nald.ca/fulltext/readab/readab.pdf
[12] Gulin, A., M. Maslov, and I. Segalovich. 2006. Yandex’ algorithm for text relevance ranking at ROMIP’2006 (in Russian) [Algoritm tekstovogo ranžirovanija Jandeksa na ROMIP’2006]. In Proceedings of ROMIP’2006, 40–51. Suzdal. Available online: http://www.romip.ru/romip2006/03_yandex.pdf
[13] Gupta, S., G. Kaiser, S. Stolfo, and H. Becker. 2005. Genre classification of websites using search engine snippets. In Proceedings of the SIGIR’2005 Workshop “Stylistic Analysis of Text for Information Access”. Salvador, Bahia.
[14] Karlgren, J., and D. Cutting. 1994. Recognizing text genres with simple metrics using discriminant analysis. In Proceedings of the 15th Conference on Computational Linguistics, 1071–1075.
[15] Kožina, M.N. 1968. Foundations of the functional stylistics (in Russian) [K osnovaniyam funkcional’noi stilistiki]. Perm.
[16] Kumaran, G., R. Jones, and O. Madani. 2005. Biasing web search results for topic familiarity. In Proceedings of CIKM’05, 271–272.
[17] Lim, C.S., K.J. Lee, and G.C. Kim. 2005. Multiple sets of features for automatic genre classification of web documents. Information Processing and Management 41:1263–1276.
[18] Liu, X., W.B. Croft, P. Oh, and D. Hart. 2004. Automatic recognition of reading levels from user queries. In Proceedings of SIGIR’2004, 548–549.
[19] Meyer zu Eissen, S., and B. Stein. 2004. Genre classification of web pages. In Proceedings of the 27th German Conference on Artificial Intelligence (KI-2004), 256–269. Ulm.
[20] Michos, S., E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 1996. Categorizing texts by using a three level functional style description. In Artificial intelligence: Methodology, systems, applications, Frontiers in Artificial Intelligence and Applications, ed. A.M. Ramsay, vol. 35. Available online: http://slt.wcl.ee.upatras.gr/papers/michos2.pdf
[21] Mystem Tool. http://company.yandex.ru/technology/mystem/
[22] Rauber, A., and A. Müller-Kögler. 2001. Integrating automatic genre analysis into digital libraries. In Proceedings of JCDL’2001, 1–10.
[23] Richardson, M., A. Prakash, and E. Brill. 2006. Beyond PageRank: Machine learning for static ranking. In Proceedings of WWW’2006, 707–715.
[24] Rosso, M.A. 2005. Using genre to improve web search. PhD thesis, University of North Carolina, Chapel Hill, NC.
[25] Russian Information Retrieval Evaluation Seminar (ROMIP). http://romip.ru
[26] Santini, M. 2004. State-of-the-art on automatic genre identification. Technical Report ITRI-04-03, Information Technology Research Institute, University of Brighton, Brighton. Available online: ftp://ftp.itri.bton.ac.uk/reports/ITRI-04-03.pdf
[27] Santini, M. 2007. Automatic identification of genre in web pages. PhD thesis, University of Brighton, Brighton.
[28] Si, L., and J. Callan. 2001. A statistical model for scientific readability. In Proceedings of CIKM’2001, 574–576.
[29] Strzalkowski, T., L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J. Wang, and J. Wilding. 1996. Natural language information retrieval: TREC-5 Report. In Proceedings of TREC’1995.
[30] Stubbe, A., C. Ringlstetter, and R. Goebel. 2007. Elements of a learning interface for genre qualified search. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 21–28. Borovets, Bulgaria.
[31] WEGA: Web Genre Analysis Project. http://www.uni-weimar.de/cms/medien/webis/research/projects/wega.html


Acknowledgments

We would like to thank Mikhail Ageev and Andrei Tselishchev for their help with data processing. We also thank Yandex (www.yandex.ru) for providing us with the experimental data. Many thanks to Matthew McCool and the volume editors for their valuable comments on the draft.

Author information

Authors and Affiliations

Institute of Engineering Science RAS, 620219, Ekaterinburg, Russia
Pavel Braslavski


Corresponding author

Correspondence to Pavel Braslavski.

Editor information

Editors and Affiliations

Text Technology/Applied Comp. Ling., Bielefeld University, Universitätsstrasse 25, Bielefeld, 33615, Germany
Alexander Mehler
LS2 9JT Leeds, United Kingdom
Serge Sharoff
Varvsgatan 25, Stockholm, 117 29, Sweden
Marina Santini


About this chapter

Cite this chapter

Braslavski, P. (2010). Marrying Relevance and Genre Rankings: An Exploratory Study. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_9


DOI: https://doi.org/10.1007/978-90-481-9178-9_9
Published: 16 August 2010
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9177-2
Online ISBN: 978-90-481-9178-9
eBook Packages: Computer Science, Computer Science (R0)
