Abstract
Models can provide mechanisms for improving system performance. This chapter presents applied methods and techniques for modeling and controlling a micro-blog crawler. With the rapid development of social studies and social networks, millions of people post, comment on, or share opinions on micro-blog platforms every day, and in doing so produce and spread opinions and sentiments on many different topics. The micro-blog has therefore become an effective platform for discovering and mining social opinion, and doing so requires crawling the relevant micro-blog data. A traditional web crawler, however, has difficulty crawling micro-blog data in the usual way, because the data is generated dynamically and rapidly through Web 2.0 techniques such as AJAX. As most micro-blogs’ official platforms do not offer suitable tools or RPC interfaces for collecting large volumes of data effectively and efficiently, we present an algorithm for modeling and controlling a micro-blog data crawler based on simulating browser behavior. The approach analyzes the browser’s behavior to obtain the request URLs to simulate, then sends, parses, and analyzes those URL requests in the order of the data sequence. The experimental results and analysis show the feasibility of the approach. Directions for further work are presented at the end.
Acknowledgments
Some earlier work was done at the Beijing Institute of Technology with the help of Dr. Hua-ping Zhang and Prof. Yin-ping Zhao. This work is sponsored by the National Science Foundation of Hebei Province (No. F2013208105) and the National Science Foundation of China (No. 61272362). It is also sponsored by the Hebei Province Scientific and Technical Key Task (No. 12213516D).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gao, K., Zhou, EL., Grover, S. (2014). Applied Methods and Techniques for Modeling and Control on Micro-Blog Data Crawler. In: Liu, L., Zhu, Q., Cheng, L., Wang, Y., Zhao, D. (eds) Applied Methods and Techniques for Mechatronic Systems. Lecture Notes in Control and Information Sciences, vol 452. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36385-6_9
DOI: https://doi.org/10.1007/978-3-642-36385-6_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36384-9
Online ISBN: 978-3-642-36385-6
eBook Packages: Engineering, Engineering (R0)