HcBench: Methodology, Development, and Full-System Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark

Saletore, Vikram A.; Krishnan, Karthik; Viswanathan, Vish; Tolentino, Matthew E.

doi:10.1007/978-3-319-10596-3_7

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8585)

Abstract

The Hadoop platform for Map-Reduce is used extensively for Big Data batch analytics as well as interactive applications in e-commerce, telecom, media, retail, social networking, and other areas. However, to date no industry-standard benchmarks exist to evaluate the true performance of a Hadoop cluster.

Current open-source Hadoop benchmarks such as HiBench and Terasort fail to capture the real usage and performance of a Hadoop cluster in a datacenter. Given that typical Hadoop deployments process jobs under strict Service Level Agreement requirements, benchmarks are needed that evaluate the effects of running diverse analytics jobs concurrently, for both performance comparison and cluster configuration.

In this paper, we present the methodology and development of a customer-usage-representative Hadoop benchmark that includes a mix of job types, a variety of data sizes, and inter-job arrival times typical of a datacenter. We present the details of this benchmark and discuss application-level, micro-architectural, and cluster-level performance characterization on an Intel Sandy Bridge Xeon processor Hadoop cluster.
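
As a rough illustration of what a workload combining a job mix, varied data sizes, and inter-job arrival times might look like, the following minimal sketch generates a submission schedule. It is not the HcBench implementation: the job names, mix weights, data sizes, and the choice of a gamma distribution for arrival gaps are all assumptions made for this example, and `generate_schedule` is a hypothetical helper.

```python
import random
from dataclasses import dataclass

# Hypothetical job mix and input sizes: illustrative values only,
# not taken from the HcBench paper.
JOB_MIX = {"sort": 0.3, "wordcount": 0.5, "join": 0.2}
DATA_SIZES_GB = [1, 10, 100]


@dataclass
class Job:
    submit_time_s: float  # seconds since the start of the run
    job_type: str
    data_size_gb: int


def generate_schedule(num_jobs, mean_gap_s=30.0, shape=2.0, seed=0):
    """Build a job submission schedule.

    Job type is drawn from JOB_MIX, input size uniformly from DATA_SIZES_GB,
    and the gap between submissions from a gamma distribution (an assumption).
    """
    rng = random.Random(seed)
    scale = mean_gap_s / shape  # gamma mean equals shape * scale
    types = list(JOB_MIX)
    weights = list(JOB_MIX.values())
    clock = 0.0
    schedule = []
    for _ in range(num_jobs):
        clock += rng.gammavariate(shape, scale)
        job_type = rng.choices(types, weights)[0]
        schedule.append(Job(clock, job_type, rng.choice(DATA_SIZES_GB)))
    return schedule


if __name__ == "__main__":
    for job in generate_schedule(5):
        print(f"{job.submit_time_s:8.1f}s  {job.job_type:<10} {job.data_size_gb} GB")
```

A driver could walk such a schedule and submit each job at its `submit_time_s`, so that jobs of different types and sizes overlap on the cluster rather than running one at a time.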

Acknowledgements

Karthik Krishnan, now with Amazon Web Services at Amazon.com Inc., contributed significantly to the development of this benchmark while he was at Intel. We would also like to thank our manager, Dr. Faye Briggs, Intel Fellow and Chief Server Architect of the Data Center Group, for encouraging us to develop this benchmark for platform performance and architecture projections.

Author information

Authors and Affiliations

Data Center Group and Software Services Group, Intel Corporation, DuPont, USA
Vikram A. Saletore, Vish Viswanathan & Matthew E. Tolentino
Amazon Web Services, Amazon.com Inc., Seattle, USA
Karthik Krishnan

Corresponding author

Correspondence to Vikram A. Saletore.

Editor information

Editors and Affiliations

University of Toronto, Toronto, Ontario, Canada
Tilmann Rabl
Cisco Systems, Inc., San José, USA
Nambiar Raghunath
Oracle Corporation, Redwood Shores, USA
Meikel Poess
Pivotal Software, Inc., Palo Alto, USA
Milind Bhandarkar
University of Toronto, Toronto, Canada
Hans-Arno Jacobsen
University of California at San Diego, La Jolla, USA
Chaitanya Baru

About this paper

Cite this paper

Saletore, V.A., Krishnan, K., Viswanathan, V., Tolentino, M.E. (2014). HcBench: Methodology, Development, and Full-System Characterization of a Customer Usage Representative Big Data/Hadoop Benchmark. In: Rabl, T., Raghunath, N., Poess, M., Bhandarkar, M., Jacobsen, HA., Baru, C. (eds) Advancing Big Data Benchmarks. WBDB 2013. Lecture Notes in Computer Science, vol 8585. Springer, Cham. https://doi.org/10.1007/978-3-319-10596-3_7

DOI: https://doi.org/10.1007/978-3-319-10596-3_7
Published: 09 October 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10595-6
Online ISBN: 978-3-319-10596-3
eBook Packages: Computer Science, Computer Science (R0)
