Path Sampling Based Relevance Search in Heterogeneous Networks

  • Conference paper
Big Data Computing and Communications (BigCom 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9784)

Abstract

With the growth of research on heterogeneous networks, searching for relevant objects of different types has become a research focus. For example, in a movie network one may want to find the actors who have worked most frequently with the director Steven Spielberg. Because traditional random walk models are costly in time and memory, this paper presents RSSim, a relevance measure based on random path sampling that allows a tradeoff between efficiency and estimation accuracy when discovering relevant objects in a heterogeneous network. The key idea is to use Monte Carlo simulation to obtain an \(\varepsilon\)-approximation of a relevance measure defined on a meta path, a concept that captures the semantic meaning of a search. Because Monte Carlo simulation is lightweight and fast, the algorithm is applicable to large-scale networks. We also give theoretical proofs of the error bound and confidence of the estimation. Experiments show that RSSim is 100 times faster than several alternative methods and approximates the ranking accuracy of the baseline well with a small sample size.
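The abstract does not give the definition of RSSim, so the following is only a minimal sketch of the general idea it describes: estimating a meta-path-constrained random walk relevance by Monte Carlo path sampling. The function names, the toy movie network, and the Hoeffding-style sample size are all assumptions made for illustration and are not taken from the paper, whose own error bound and confidence proof may differ.

```python
import math
import random
from collections import Counter


def sample_size(epsilon, delta):
    """Hoeffding-style sample count (an assumption, not the paper's bound):
    enough samples that one estimated probability is within epsilon of its
    true value with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_relevance(graph, node_type, source, meta_path, num_samples, rng):
    """Estimate, for each reachable end node, the probability that a random
    walk from `source` that follows the node types in `meta_path` ends there."""
    hits = Counter()
    for _ in range(num_samples):
        node = source
        for next_type in meta_path[1:]:
            # Only neighbours of the type required by the meta path are allowed.
            candidates = [n for n in graph.get(node, []) if node_type[n] == next_type]
            if not candidates:
                node = None  # the walk cannot be completed along this meta path
                break
            node = rng.choice(candidates)
        if node is not None:
            hits[node] += 1
    return {n: count / num_samples for n, count in hits.items()}


# Hypothetical toy movie network: D = director, M = movie, A = actor.
graph = {
    "Spielberg": ["m1", "m2"],
    "m1": ["Spielberg", "Hanks"],
    "m2": ["Spielberg", "Hanks", "Ford"],
    "Hanks": ["m1", "m2"],
    "Ford": ["m2"],
}
node_type = {"Spielberg": "D", "m1": "M", "m2": "M", "Hanks": "A", "Ford": "A"}

n = sample_size(epsilon=0.05, delta=0.05)
scores = estimate_relevance(graph, node_type, "Spielberg", ["D", "M", "A"], n, random.Random(0))
for actor, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(actor, round(score, 3))
```

In this toy network the exact walk probabilities along the director-movie-actor path are 0.75 for Hanks and 0.25 for Ford, so the sampled ranking can be checked by hand. Lowering `epsilon` or `delta` raises the number of sampled paths, which is the efficiency versus accuracy tradeoff the abstract refers to.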

Author information

Corresponding author

Correspondence to Qiang Gu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Gu, Q., Zhang, C., Sun, T., Ji, Y., Hu, Z., Qiu, X. (2016). Path Sampling Based Relevance Search in Heterogeneous Networks. In: Wang, Y., Yu, G., Zhang, Y., Han, Z., Wang, G. (eds) Big Data Computing and Communications. BigCom 2016. Lecture Notes in Computer Science, vol 9784. Springer, Cham. https://doi.org/10.1007/978-3-319-42553-5_39

  • DOI: https://doi.org/10.1007/978-3-319-42553-5_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-42552-8

  • Online ISBN: 978-3-319-42553-5

  • eBook Packages: Computer Science, Computer Science (R0)
