Web Information Retrieval - an Algorithmic Perspective

  • Conference paper
Algorithms - ESA 2000 (ESA 2000)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1879)

Abstract

In this paper we survey algorithmic aspects of Web information retrieval. As an example, we discuss ranking of search engine results using connectivity analysis.

Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Henzinger, M. (2000). Web Information Retrieval - an Algorithmic Perspective. In: Paterson, M.S. (eds) Algorithms - ESA 2000. ESA 2000. Lecture Notes in Computer Science, vol 1879. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45253-2_1

  • DOI: https://doi.org/10.1007/3-540-45253-2_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41004-1

  • Online ISBN: 978-3-540-45253-9

  • eBook Packages: Springer Book Archive
