Abstract
In this chapter, we discuss different options for using genre-related information in Web search. We conduct an experiment on merging genre-related and text-relevance rankings using a reference Web collection. We propose a method for automatically extracting a formality score, akin to a readability score, by applying canonical discriminant analysis to a sample of genres of decreasing formality. We then consider the effects of aggregating genre-related and text-relevance rankings. Evaluation of the results shows moderate positive effects. The findings suggest that further research is needed on the implicit use of genre-related information in Web search.
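The abstract does not give the aggregation formula, so the following is only an illustrative sketch of how a genre-related score and a text-relevance score might be merged into one ranking. It assumes a weighted linear combination of min-max normalized scores (a weighted variant of CombSUM-style score fusion), which may differ from the chapter's actual procedure. The names `merge_rankings`, `min_max_normalize`, and the weight `alpha` are hypothetical, and the formality scores are assumed to be precomputed (for example, as projections onto a discriminant function), with higher values meaning more formal.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale raw scores to [0, 1]; if all scores are equal, map them to 0.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {doc: (s - lo) / span if span else 0.0 for doc, s in scores.items()}


def merge_rankings(relevance: Dict[str, float],
                   formality: Dict[str, float],
                   alpha: float = 0.8) -> List[str]:
    """Hypothetical fusion: alpha * relevance + (1 - alpha) * formality.

    Only documents in the relevance ranking are re-ranked; a document with
    no formality score contributes 0 on the genre side.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rel = min_max_normalize(relevance)
    form = min_max_normalize(formality)
    fused = {doc: alpha * rel[doc] + (1.0 - alpha) * form.get(doc, 0.0)
             for doc in rel}
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    # Toy data: raw relevance scores and formality scores (higher = more formal).
    relevance = {"d1": 12.0, "d2": 11.5, "d3": 9.0}
    formality = {"d1": -1.2, "d2": 0.8, "d3": 2.0}
    print(merge_rankings(relevance, formality, alpha=1.0))  # ['d1', 'd2', 'd3']
    print(merge_rankings(relevance, formality, alpha=0.5))  # ['d2', 'd1', 'd3']
```

With `alpha=1.0` the text-relevance order is returned unchanged; lowering `alpha` lets the formality score promote more formal documents. Both score lists are normalized first because relevance scores and discriminant scores live on unrelated scales; rank-based schemes such as Borda counts are an alternative that avoids score normalization altogether.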
This chapter expands the short paper presented at the workshop “Towards Genre-Enabled Search Engines: The Impact of NLP” [9].
References
Abdul-Jaleel, N., J. Allan, W.B. Croft, F. Diaz, L. Larkey, X. Li, M.D. Smucker, and C. Wade. 2005. UMass at TREC 2004: Novelty and HARD. In Proceedings of TREC 2004.
Ageev, M., I. Vershinnikov, and B. Dobrov. 2005. Extraction of the significant part of web pages for information retrieval (in Russian) [Izvlečenie značimoi informacii iz web-stranic dlja zadač informacionnogo poiska]. In Internet-Matematika, 283–301. Available online: http://company.yandex.ru/grant/2005/07_Ageev_102942.pdf
Allan, J. 2004. HARD track overview in TREC 2003: High accuracy retrieval from documents. In Proceedings of TREC-2003, 24–37.
Allan, J. 2005. HARD track overview in TREC 2004: High accuracy retrieval from documents. In Proceedings of TREC-2004, 25–35.
Beitzel, S.M., E.C. Jensen, A. Chowdhury, D. Grossman, O. Frieder, and N. Goharian. 2004. Fusion of effective retrieval strategies in the same information retrieval system. Journal of the American Society for Information Science and Technology (JASIST) 55(10):859–868.
Belkin, N., I. Chaleva, M. Cole, Y.-L. Li, L. Liu, Y.-H. Liu, G. Muresan, C. Smith, Y. Sun, X.-J. Yuan, and X.-M. Zhang. 2005. Rutgers’ HARD track experiences at TREC 2004. In Proceedings of TREC-2004.
Braslavski, P. 2004. Document style recognition using shallow statistical analysis. In Proceedings of the ESSLLI 2004 Workshop on Combining Shallow and Deep Processing for NLP, 1–9. Nancy. Available online: http://esslli2004.loria.fr/content/readers/36.pdf
Braslavski, P., and A. Tselishchev. 2005. Style-dependent document ranking. In Proceedings of the 7th Russian Conference on Digital Libraries (RCDL’2005), 159–164. Available online: http://www.rcdl2005.uniyar.ac.ru/ru/RCDL2005/papers/sek7_1_paper.pdf
Braslavski, P. 2007. Combining relevance and genre-related rankings: An exploratory study. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 1–4. Borovets, Bulgaria. Available online: http://kansas.ru/pb/paper/ranlp2007.pdf
Collins-Thompson, K., and J.P. Callan. 2004. A language modeling approach to predicting reading difficulty. In Proceedings of HLT/NAACL, 193–200.
DuBay, W.H. 2004. The principles of readability. Available online: http://www.nald.ca/fulltext/readab/readab.pdf
Gulin, A., M. Maslov, and I. Segalovich. 2006. Yandex’ algorithm for text relevance ranking at ROMIP’2006 (in Russian) [Algoritm tekstovogo ranžirovanija Jandeksa na ROMIP’2006]. In Proceedings of ROMIP’2006, 40–51. Suzdal. Available online: http://www.romip.ru/romip2006/03_yandex.pdf
Gupta, S., G. Kaiser, S. Stolfo, and H. Becker. 2005. Genre classification of websites using search engine snippets. In Proceedings of SIGIR’2005 Workshop “Stylistic Analysis of Text For Information Access”. Salvador, Bahia.
Karlgren, J., and D. Cutting. 1994. Recognizing text genres with simple metrics using discriminant analysis. In Proceedings of the 15th Conference on Computational Linguistics, 1071–1075.
Kožina, M.N. 1968. Foundations of the functional stylistics (in Russian) [K osnovanijam funkcional’noi stilistiki]. Perm.
Kumaran, G., R. Jones, and O. Madani. 2005. Biasing web search results for topic familiarity. In Proceedings of CIKM’05, 271–272.
Lim, C.S., K.J. Lee, and G.C. Kim. 2005. Multiple sets of features for automatic genre classification of web documents. Information Processing and Management 41:1263–1276.
Liu, X., W.B. Croft, P. Oh, and D. Hart. 2004. Automatic recognition of reading levels from user queries. In Proceedings of SIGIR’2004, 548–549.
Meyer zu Eissen, S., and B. Stein. 2004. Genre classification of web pages. In Proceedings of the 27th German Conference on Artificial Intelligence (KI-2004), 256–269. Ulm.
Michos, S., E. Stamatatos, N. Fakotakis, and G. Kokkinakis. 1996. Categorizing texts by using a three level functional style description. In Artificial intelligence: Methodology, systems, applications, ed. A.M. Ramsay. Frontiers in artificial intelligence and applications, vol. 35. Available online: http://slt.wcl.ee.upatras.gr/papers/michos2.pdf
Mystem Tool. http://company.yandex.ru/technology/mystem/
Rauber, A., and A. Müller-Kögler. 2001. Integrating automatic genre analysis into digital libraries. In Proceedings of the JCDL’2001, 1–10.
Richardson, M., A. Prakash, and E. Brill. 2006. Beyond PageRank: Machine learning for static ranking. In Proceedings of WWW’2006, 707–715.
Rosso, M.A. 2005. Using genre to improve web search. PhD thesis, University of North Carolina, Chapel Hill, NC.
Russian Information Retrieval Evaluation Seminar (ROMIP). http://romip.ru
Santini, M. 2004. State-of-the-art on automatic genre identification. Technical Report ITRI-04-03, Information Technology Research Institute, University of Brighton, Brighton. Available online: ftp://ftp.itri.bton.ac.uk/reports/ITRI-04-03.pdf
Santini, M. 2007. Automatic identification of genre in web pages. PhD thesis, University of Brighton, Brighton.
Si, L., and J. Callan. 2001. A statistical model for scientific readability. In Proceedings of CIKM’2001, 574–576.
Strzalkowski, T., L. Guthrie, J. Karlgren, J. Leistensnider, F. Lin, J. Perez-Carballo, T. Straszheim, J. Wang, and J. Wilding. 1996. Natural language information retrieval: TREC-5 Report. In Proceedings of TREC-5.
Stubbe, A., C. Ringlstetter, and R. Goebel. 2007. Elements of a learning interface for genre qualified search. In Proceedings of the International Workshop “Towards Genre-Enabled Search Engines: The Impact of NLP”, 21–28. Borovets, Bulgaria.
WEGA: Web Genre Analysis Project. http://www.uni-weimar.de/cms/medien/webis/research/projects/wega.html
Acknowledgments
We would like to thank Mikhail Ageev and Andrei Tselishchev for their help with data processing. We also thank Yandex (www.yandex.ru) for providing us with the experimental data. Many thanks to Matthew McCool and the volume editors for their valuable comments on the draft.
Copyright information
© 2010 Springer Science+Business Media B.V.
About this chapter
Cite this chapter
Braslavski, P. (2010). Marrying Relevance and Genre Rankings: An Exploratory Study. In: Mehler, A., Sharoff, S., Santini, M. (eds) Genres on the Web. Text, Speech and Language Technology, vol 42. Springer, Dordrecht. https://doi.org/10.1007/978-90-481-9178-9_9
DOI: https://doi.org/10.1007/978-90-481-9178-9_9
Publisher Name: Springer, Dordrecht
Print ISBN: 978-90-481-9177-2
Online ISBN: 978-90-481-9178-9
eBook Packages: Computer Science, Computer Science (R0)