Towards a Big Data Benchmarking and Demonstration Suite for the Online Social Network Era with Realistic Workloads and Live Data

  • Conference paper
Big Data Benchmarks, Performance Optimization, and Emerging Hardware (BPOE 2015)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9495)

Abstract

The growing popularity of online social networks has taken big data analytics into uncharted territory. Newly developed platforms and analytics in these environments are in dire need of customized frameworks for evaluation and demonstration. This paper presents the first big data benchmark centered on online social network analytics and their underlying distributed platforms. The benchmark comprises a novel data generator rooted in live online social network feeds, a uniquely comprehensive set of online social network analytics workloads, and evaluation metrics that are both system-aware and analytics-aware. The benchmark also provides application plug-ins that allow for compelling demonstrations of big data solutions. We describe the benchmark design challenges, an early prototype, and three use cases.

Notes

  1. http://hadoop.apache.org

  2. http://spark.apache.org

  3. http://storm.apache.org

  4. https://parquet.apache.org/

  5. https://wiki.openstack.org/wiki/Swift

  6. https://dev.twitter.com/rest/public/search

  7. http://hipi.cs.virginia.edu/index.html

  8. https://github.com/lintool/Mr.LDA

  9. https://github.com/mertterzihan/pymc/tree/pyspark/pymc/examples/lda

  10. https://mahout.apache.org/users/recommender/intro-als-hadoop.html

  11. https://mahout.apache.org/users/recommender/intro-cooccurrence-spark.html

  12. https://github.com/intel-cloud/cosbench

Author information

Corresponding author

Correspondence to Rui Zhang.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Zhang, R., Manotas, I., Li, M., Hildebrand, D. (2016). Towards a Big Data Benchmarking and Demonstration Suite for the Online Social Network Era with Realistic Workloads and Live Data. In: Zhan, J., Han, R., Zicari, R. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2015. Lecture Notes in Computer Science, vol 9495. Springer, Cham. https://doi.org/10.1007/978-3-319-29006-5_3

  • DOI: https://doi.org/10.1007/978-3-319-29006-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-29005-8

  • Online ISBN: 978-3-319-29006-5

  • eBook Packages: Computer Science, Computer Science (R0)
