Web Documents Prioritization Using Iterative Improvement

Chaudhary, Kamika; Gupta, Neena; Kumar, Santosh

doi:10.1007/978-981-10-8657-1_35

Part of the book series: Communications in Computer and Information Science (CCIS, volume 827)

Included in the following conference series:

International Conference on Next Generation Computing Technologies

Abstract

The amount of information on the World Wide Web is growing exponentially. This growth makes relevant information harder to reach, because users struggle to find what they need in a short time. A single query submitted to a search engine returns a large number of results, and digging out the most relevant web link becomes a cumbersome task that can erode the user's trust in the search engine. This paper proposes an approach to web structure and web usage mining based on an iterative improvement algorithm. Iterative improvement is a randomized algorithm used for solving combinatorial optimization problems. The technique selects the top T web pages and prioritizes them in order of relevance. Experimental evaluation shows a significant improvement in performance. The parameters used are access frequency, time duration, number of visitors, hubs, and authorities; together they cover both web structure mining and web usage mining.
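
The selection step described in the abstract can be illustrated with a minimal Python sketch of randomized iterative improvement. It is not the authors' implementation: the abstract does not give the fitness function, so the weighted sum in `score`, the equal weights, the feature values scaled to [0, 1], the swap neighbourhood, and all names (`iterative_improvement`, `pages`, `weights`) are assumptions made for illustration only.

```python
import random


def score(page, weights):
    """Assumed fitness: weighted sum of a page's normalized feature values."""
    return sum(weights[name] * page[name] for name in weights)


def iterative_improvement(pages, t, weights, restarts=20, seed=0):
    """Pick t pages by randomized iterative improvement.

    Each restart begins from a random subset of size t and keeps applying
    improving swaps (a selected page for a better unselected one) until no
    swap helps, i.e. a local optimum. The best subset over all restarts is
    returned as page indices ordered by descending score.
    """
    rng = random.Random(seed)
    indices = list(range(len(pages)))
    scores = [score(p, weights) for p in pages]
    best_set, best_total = None, float("-inf")
    for _ in range(restarts):
        selected = set(rng.sample(indices, t))
        improved = True
        while improved:
            improved = False
            # Enumerate the swap neighbourhood and visit it in random order.
            moves = [(i, j) for i in selected for j in indices if j not in selected]
            rng.shuffle(moves)
            for i, j in moves:
                if scores[j] > scores[i]:
                    selected.remove(i)
                    selected.add(j)
                    improved = True
                    break  # rebuild the neighbourhood after each accepted move
        total = sum(scores[i] for i in selected)
        if total > best_total:
            best_set, best_total = set(selected), total
    return sorted(best_set, key=lambda i: scores[i], reverse=True)


# Hypothetical pages with made-up feature values in [0, 1].
pages = [
    {"url": "a", "access_frequency": 0.9, "time_duration": 0.8, "visitors": 0.7, "hub": 0.2, "authority": 0.9},
    {"url": "b", "access_frequency": 0.1, "time_duration": 0.2, "visitors": 0.1, "hub": 0.3, "authority": 0.1},
    {"url": "c", "access_frequency": 0.5, "time_duration": 0.6, "visitors": 0.4, "hub": 0.5, "authority": 0.5},
    {"url": "d", "access_frequency": 0.8, "time_duration": 0.3, "visitors": 0.9, "hub": 0.6, "authority": 0.4},
    {"url": "e", "access_frequency": 0.2, "time_duration": 0.2, "visitors": 0.3, "hub": 0.1, "authority": 0.2},
]
weights = {name: 0.2 for name in
           ("access_frequency", "time_duration", "visitors", "hub", "authority")}

top = iterative_improvement(pages, t=3, weights=weights)
print([pages[i]["url"] for i in top])
```

With a purely additive score like this one, every local optimum of the swap neighbourhood is also the global optimum, so a plain sort would give the same answer; the randomized search only pays off when the objective couples pages together (for example, a penalty for near-duplicate results), which this sketch does not model.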

Author information

Authors and Affiliations

Department of Computer Science, Gurukula Kangri Vishwavidyalaya, Kanya Gurukula Campus, Dehradun, Uttarakhand, India
Kamika Chaudhary & Neena Gupta
Department of Computer Science and Engineering, Krishna Institute of Engineering and Technology, Ghaziabad, Uttar Pradesh, India
Santosh Kumar

Corresponding author

Correspondence to Kamika Chaudhary.

Editor information

Editors and Affiliations

Indian Institute of Technology Patna, Patna, Bihar, India
Pushpak Bhattacharyya
University of Petroleum and Energy Studies, Dehradun, India
Hanumat G. Sastry
University of Petroleum and Energy Studies, Dehradun, India
Venkatadri Marriboyina
University of Petroleum and Energy Studies, Dehradun, India
Rashmi Sharma

About this paper

Cite this paper

Chaudhary, K., Gupta, N., Kumar, S. (2018). Web Documents Prioritization Using Iterative Improvement. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_35

DOI: https://doi.org/10.1007/978-981-10-8657-1_35
Published: 09 June 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8656-4
Online ISBN: 978-981-10-8657-1
eBook Packages: Computer Science, Computer Science (R0)