Web Documents Prioritization Using Iterative Improvement

  • Conference paper
Smart and Innovative Trends in Next Generation Computing Technologies (NGCT 2017)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 827)

Abstract

The amount of information on the World Wide Web is growing exponentially. This growth makes it hard for users to reach the information they need in minimal time. A single query submitted to a search engine returns a large number of results, and digging out the most relevant link becomes cumbersome for the user, which can erode trust in the search engine. This paper proposes an approach that combines web structure mining and web usage mining using an iterative improvement algorithm. Iterative improvement is a randomized algorithm for solving combinatorial optimization problems. The technique selects the top T web pages and ranks them in order of relevance. Experimental evaluation shows a significant improvement in performance. The parameters used are access frequency, time duration, number of visitors, hubs, and authorities; together they cover both web structure mining and web usage mining.
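The abstract does not give the algorithm's details, so the following is only a minimal sketch of how iterative improvement could select and order the top T pages. The weighted-sum score, the weight values, the feature names, the single-swap neighbourhood, and the stopping rule (`max_failures`, `restarts`) are all assumptions made for illustration, not the authors' method.

```python
import random

# Hypothetical weights for the five parameters named in the abstract.
# The paper's actual scoring function is not given here; this is an assumption.
WEIGHTS = {
    "access_frequency": 0.3,
    "time_duration": 0.2,
    "visitors": 0.2,
    "hub": 0.1,
    "authority": 0.2,
}


def page_score(page):
    """Weighted sum of a page's (assumed pre-normalised) feature values."""
    return sum(weight * page[name] for name, weight in WEIGHTS.items())


def iterative_improvement(pages, t, restarts=5, max_failures=200, seed=0):
    """Pick t pages by randomised local search and return their indices,
    ordered from highest to lowest score.

    Each restart begins from a random subset of size t. A move swaps one
    selected page with one unselected page and is kept only if it raises
    the subset's total score. A restart stops after max_failures
    consecutive rejected moves, which is treated as a local optimum.
    """
    n = len(pages)
    if not 0 < t <= n:
        raise ValueError("t must be between 1 and len(pages)")
    rng = random.Random(seed)
    scores = [page_score(p) for p in pages]

    best, best_total = None, float("-inf")
    for _ in range(restarts):
        selected = set(rng.sample(range(n), t))
        unselected = set(range(n)) - selected
        failures = 0
        while unselected and failures < max_failures:
            leaving = rng.choice(sorted(selected))
            entering = rng.choice(sorted(unselected))
            if scores[entering] > scores[leaving]:
                # Improving move: accept the swap and reset the counter.
                selected.remove(leaving)
                selected.add(entering)
                unselected.remove(entering)
                unselected.add(leaving)
                failures = 0
            else:
                failures += 1
        total = sum(scores[i] for i in selected)
        if total > best_total:
            best, best_total = set(selected), total

    # Prioritise the chosen pages in descending score order.
    return sorted(best, key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Made-up feature values, for demonstration only.
    gen = random.Random(42)
    demo_pages = [{name: gen.random() for name in WEIGHTS} for _ in range(20)]
    for rank, idx in enumerate(iterative_improvement(demo_pages, t=5), start=1):
        print(rank, idx, round(page_score(demo_pages[idx]), 3))
```

With a purely additive score like this one, a plain sort would find the same top T pages directly; the randomised search becomes useful only when the objective depends on the selected set as a whole, which is the kind of combinatorial setting the abstract refers to.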

Author information

Corresponding author

Correspondence to Kamika Chaudhary.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chaudhary, K., Gupta, N., Kumar, S. (2018). Web Documents Prioritization Using Iterative Improvement. In: Bhattacharyya, P., Sastry, H., Marriboyina, V., Sharma, R. (eds) Smart and Innovative Trends in Next Generation Computing Technologies. NGCT 2017. Communications in Computer and Information Science, vol 827. Springer, Singapore. https://doi.org/10.1007/978-981-10-8657-1_35

  • DOI: https://doi.org/10.1007/978-981-10-8657-1_35

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8656-4

  • Online ISBN: 978-981-10-8657-1

  • eBook Packages: Computer Science, Computer Science (R0)
