The Key Technologies of Real-Time Processing Large Scale Microblog Data Stream

  • Conference paper
Cloud Computing and Big Data (CloudCom-Asia 2015)

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9106)

Abstract

Monitoring microblog data in real time makes it possible to detect sensitive information promptly and supports public sentiment management and control. Doing so, however, requires processing a large-scale data stream. MapReduce is a framework for processing large-scale data in batch mode; it is designed to increase throughput, and its real-time performance is limited. To address this limitation, RT-SSP (Real-Time Staged Stream Processing), a hybrid staged stream processing scheme that supports both batch and real-time processing, is proposed. In this scheme, a large-scale, high-speed data stream is processed locally in stages, communication cost is reduced by storing intermediate results on the local node, and key technologies such as cache optimization are used to achieve highly concurrent reads and writes. Experiments show that RT-SSP improves the real-time performance of processing large-scale microblog data streams and achieves a speed-up ratio of about 2.3.
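The staged, node-local idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' RT-SSP implementation, whose details are not given on this page. It assumes a single node, fixed-size stages, and a hypothetical watch list (`SENSITIVE_TERMS`); the names `stages`, `map_stage`, `LocalNode`, and `run` are invented for illustration. Each stage is mapped locally, and its intermediate counts are merged into an in-memory cache on the same node rather than being shipped elsewhere after every stage.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List

# Hypothetical watch list; the paper's actual sensitive-term set is not given here.
SENSITIVE_TERMS = {"riot", "leak", "outage"}


def stages(stream: Iterable[str], stage_size: int) -> Iterator[List[str]]:
    """Cut a (possibly unbounded) stream of posts into small fixed-size stages."""
    batch: List[str] = []
    for post in stream:
        batch.append(post)
        if len(batch) == stage_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly shorter, stage
        yield batch


def map_stage(posts: List[str]) -> Counter:
    """Map step for one stage: count occurrences of watched terms."""
    counts: Counter = Counter()
    for post in posts:
        for word in post.lower().split():
            if word in SENSITIVE_TERMS:
                counts[word] += 1
    return counts


class LocalNode:
    """Keeps intermediate results in local memory (an assumed stand-in for the local cache)."""

    def __init__(self) -> None:
        self.cache: Counter = Counter()

    def process(self, posts: List[str]) -> None:
        # Incremental local reduce: merge this stage's counts into the cache.
        self.cache.update(map_stage(posts))

    def snapshot(self) -> Dict[str, int]:
        # Results can be read at any time, without waiting for the stream to end.
        return dict(self.cache)


def run(stream: Iterable[str], stage_size: int = 2) -> Dict[str, int]:
    """Process the whole stream stage by stage on one node and return the counts."""
    node = LocalNode()
    for batch in stages(stream, stage_size):
        node.process(batch)
    return node.snapshot()
```

For example, `run(["Riot downtown tonight", "nothing here", "data leak and riot reported"])` processes two stages and returns counts for `riot` and `leak`. Because the cache is updated after every stage, a monitor can poll `snapshot()` between stages instead of waiting for a full batch job to finish, which is the property the abstract contrasts with batch-mode MapReduce.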

Foundation items: National Natural Science Foundation of China (No. 60970012); Natural Science Foundation of Shandong Province (No. ZR2013FL005). About the authors: Yunpeng Cao (born 1967), male, holds a master's degree and is an associate professor; his main research interests include large-scale data processing and parallel computing. Haifeng Wang (born 1976), male, holds a doctorate and is an associate professor; his main research interests include network computing, large-scale data processing, and cloud computing.

Corresponding author

Correspondence to Yunpeng Cao.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Cao, Y., Wang, H. (2015). The Key Technologies of Real-Time Processing Large Scale Microblog Data Stream. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_22

  • DOI: https://doi.org/10.1007/978-3-319-28430-9_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-28429-3

  • Online ISBN: 978-3-319-28430-9

  • eBook Packages: Computer Science, Computer Science (R0)