Adaptive and Parallel Data Acquisition from Online Big Graphs

Yin, Zidu; Yue, Kun; Wu, Hao; Su, Yingjie

doi:10.1007/978-3-319-91452-7_21

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10827)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

3314 Accesses
2 Citations

Abstract

Acquiring content from online big graphs (OBGs), such as linked Web pages, social networks, and knowledge graphs, is critical because it provides the data infrastructure for Web applications and massive data analysis. However, effective data acquisition is challenging because OBGs are massive, heterogeneous, and dynamically evolving, and their global topological structures are unknown. In this paper, we present an adaptive and parallel approach to effective data acquisition from OBGs. Drawing on the ideas of Quasi Monte Carlo (QMC) and branch-and-bound methods, we propose an adaptive Web-scale sampling algorithm for parallel data collection, implemented on Spark. Experimental results show the effectiveness and efficiency of our method.
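
The abstract does not spell out the sampling procedure, so the following is only a generic illustration of the Quasi Monte Carlo idea it names, not the authors' algorithm. It is a minimal Python sketch of a Halton low-discrepancy sequence, built from the van der Corput radical inverse, and of one possible way to use such values to pick candidate nodes evenly from a list. The function names and the `frontier` list of node identifiers are assumptions introduced here for illustration; the branch-and-bound step and the Spark parallelization are not shown.

```python
from typing import List, Sequence, Tuple


def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: mirror the base-`base` digits of
    `index` around the radix point, giving a value in [0, 1)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(count: int, bases: Sequence[int] = (2, 3)) -> List[Tuple[float, ...]]:
    """First `count` Halton points (indices 1..count), one coordinate per base.
    The bases should be pairwise coprime, typically the first primes."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, count + 1)]


def pick_from_frontier(frontier: Sequence[str], count: int) -> List[str]:
    """Hypothetical helper: map 1-D Halton values onto positions in a list of
    candidate nodes, so that picks spread evenly over the list.
    Repeats are possible once `count` exceeds the list length."""
    if not frontier:
        return []
    chosen = []
    for i in range(1, count + 1):
        position = int(radical_inverse(i, 2) * len(frontier))
        chosen.append(frontier[position])
    return chosen
```

Unlike pseudo-random draws, successive Halton values fill the unit interval progressively, which is why low-discrepancy sequences are often preferred when a sample has to cover a space evenly with few points.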

Acknowledgment

This paper was supported by the National Natural Science Foundation of China (Nos. 61472345, 61562090), Program for Excellent Young Talents of Yunnan University (No. WX173602), Research Foundation of Yunnan University (No. 2017YDJQ06), and Research Foundation of Educational Department of Yunnan Province (No. 2017ZZX228).

Author information

Authors and Affiliations

School of Information Science and Engineering, Yunnan University, Kunming, China
Zidu Yin, Kun Yue, Hao Wu & Yingjie Su

Corresponding author

Correspondence to Kun Yue.

Editor information

Editors and Affiliations

Simon Fraser University, Burnaby, BC, Canada
Jian Pei
Aristotle University of Thessaloniki, Thessaloniki, Greece
Yannis Manolopoulos
University of Queensland, Brisbane, QLD, Australia
Shazia Sadiq
University of Western Australia, Crawley, WA, Australia
Jianxin Li

About this paper

Cite this paper

Yin, Z., Yue, K., Wu, H., Su, Y. (2018). Adaptive and Parallel Data Acquisition from Online Big Graphs. In: Pei, J., Manolopoulos, Y., Sadiq, S., Li, J. (eds) Database Systems for Advanced Applications. DASFAA 2018. Lecture Notes in Computer Science, vol 10827. Springer, Cham. https://doi.org/10.1007/978-3-319-91452-7_21

DOI: https://doi.org/10.1007/978-3-319-91452-7_21
Published: 13 May 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91451-0
Online ISBN: 978-3-319-91452-7
eBook Packages: Computer Science, Computer Science (R0)
