Nephele streaming: stream processing under QoS constraints at scale

Lohrmann, Björn; Warneke, Daniel; Kao, Odej

doi:10.1007/s10586-013-0281-8

Published: 25 July 2013

Cluster Computing, Volume 17, pages 61–78 (2014)

Björn Lohrmann¹,
Daniel Warneke² &
Odej Kao¹

Abstract

The ability to process large numbers of continuous data streams in a near-real-time fashion has become a crucial prerequisite for many scientific and industrial use cases in recent years. While the individual data streams are usually trivial to process, their aggregated data volumes easily exceed the scalability of traditional stream processing systems.

At the same time, massively-parallel data processing systems like MapReduce or Dryad currently enjoy tremendous popularity for data-intensive applications and have proven to scale to large numbers of nodes. Many of these systems also provide streaming capabilities. However, unlike traditional stream processors, these systems have so far disregarded the QoS requirements of prospective stream processing applications.

In this paper we address this gap. First, we analyze common design principles of today’s parallel data processing frameworks and identify those principles that provide degrees of freedom in trading off the QoS goals latency and throughput. Second, we propose a highly distributed scheme which allows these frameworks to detect violations of user-defined QoS constraints and optimize the job execution without manual interaction. As a proof of concept, we implemented our approach for our massively-parallel data processing framework Nephele and evaluated its effectiveness through a comparison with Hadoop Online.

For an example streaming application from the multimedia domain running on a cluster of 200 nodes, our approach improves the processing latency by a factor of at least 13 while preserving high data throughput when needed.

Author information

Authors and Affiliations

Technische Universität Berlin, Einsteinufer 17, 10587, Berlin, Germany
Björn Lohrmann & Odej Kao
International Computer Science Institute (ICSI), 1947 Center Street, Suite 600, Berkeley, CA, 94704, USA
Daniel Warneke

Corresponding author

Correspondence to Björn Lohrmann.

About this article

Cite this article

Lohrmann, B., Warneke, D. & Kao, O. Nephele streaming: stream processing under QoS constraints at scale. Cluster Comput 17, 61–78 (2014). https://doi.org/10.1007/s10586-013-0281-8

Received: 08 October 2012
Accepted: 15 May 2013
Published: 25 July 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10586-013-0281-8
