An Architecture for Hybrid P2P Free-Text Search

  • Conference paper
Cooperative Information Agents XI (CIA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4676)

Abstract

Recent advances in peer-to-peer (P2P) search algorithms have produced viable structured and unstructured approaches to full-text search. We posit that each of these approaches is best suited to different types of queries. We present PHIRST, the first system to facilitate effective full-text search within P2P networks. PHIRST works by leveraging the relative strengths of both approaches. As in structured approaches, agents first publish the terms within their stored documents. However, frequent terms are quickly identified and not exhaustively stored, resulting in a significant reduction in the system’s storage requirements. During query lookup, agents use unstructured searches to compensate for the lack of fully published terms. Additionally, they explicitly weigh the costs of structured and unstructured lookups against each other, allowing for a significant reduction in query costs. We evaluated the effectiveness of our approach using both real-world and artificial queries and found that in most situations it yields near-perfect recall. We discuss the limitations of our system, as well as possible compensatory strategies.
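
The hybrid idea described above can be illustrated with a small, single-process sketch. It is not the PHIRST implementation: the class name `ToyHybridIndex`, the per-term posting limit `cap`, and the cost model (one message per posting for a structured lookup, one message per peer for a flood) are all assumptions made for illustration, and a plain dictionary stands in for the distributed hash table.

```python
from collections import defaultdict


class ToyHybridIndex:
    """Toy stand-in for a hybrid P2P index; hypothetical, not PHIRST itself."""

    def __init__(self, cap=2):
        self.cap = cap                    # assumed per-term posting limit
        self.postings = defaultdict(set)  # term -> {(peer, doc_id)}; stands in for a DHT
        self.truncated = set()            # frequent terms that are not fully published
        self.peers = defaultdict(dict)    # peer -> {doc_id: set of terms}

    def publish(self, peer, doc_id, text):
        """Store the document locally and publish its terms up to the cap."""
        terms = set(text.lower().split())
        self.peers[peer][doc_id] = terms
        for term in terms:
            if len(self.postings[term]) < self.cap:
                self.postings[term].add((peer, doc_id))
            else:
                # Frequent term: stop storing further postings for it.
                self.truncated.add(term)

    def flood(self, terms):
        """Unstructured search: every peer scans its own documents."""
        return {
            (peer, doc_id)
            for peer, docs in self.peers.items()
            for doc_id, doc_terms in docs.items()
            if terms <= doc_terms
        }

    def query(self, text):
        """Return (strategy, hits), picking the cheaper strategy under the toy cost model."""
        terms = set(text.lower().split())
        complete = [t for t in terms if t not in self.truncated]
        flood_cost = len(self.peers)      # assumed cost: one message per peer
        if complete:
            # The rarest fully published term gives the shortest candidate list.
            rarest = min(complete, key=lambda t: len(self.postings[t]))
            if len(self.postings[rarest]) <= flood_cost:
                hits = {
                    (peer, doc_id)
                    for peer, doc_id in self.postings[rarest]
                    if terms <= self.peers[peer][doc_id]
                }
                return "structured", hits
        # No fully published term, or flooding is cheaper: fall back to flooding.
        return "unstructured", self.flood(terms)


idx = ToyHybridIndex(cap=2)
idx.publish("a", 1, "the quick fox")
idx.publish("b", 1, "the lazy dog")
idx.publish("c", 1, "the quick dog")
print(idx.query("the quick"))  # answered from the fully published term "quick"
print(idx.query("the"))        # "the" is truncated, so the query is flooded
```

In this sketch a query is answered through the structured index only when at least one of its terms is fully published and its posting list is no longer than the number of peers; otherwise the query falls back to flooding, which is how the toy model compensates for terms that were not exhaustively stored.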

Editor information

Matthias Klusch Koen V. Hindriks Mike P. Papazoglou Leon Sterling

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Rosenfeld, A., Goldman, C.V., Kaminka, G.A., Kraus, S. (2007). An Architecture for Hybrid P2P Free-Text Search. In: Klusch, M., Hindriks, K.V., Papazoglou, M.P., Sterling, L. (eds) Cooperative Information Agents XI. CIA 2007. Lecture Notes in Computer Science, vol 4676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-75119-9_5

  • DOI: https://doi.org/10.1007/978-3-540-75119-9_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-75118-2

  • Online ISBN: 978-3-540-75119-9

  • eBook Packages: Computer Science, Computer Science (R0)
