Incremental Stream Processing of Nested-Relational Queries

Fegaras, Leonidas

doi:10.1007/978-3-319-44403-1_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Current work on stream processing concentrates on approximation techniques that compute approximate answers to simple queries over a fixed or sliding window of the most recent tuples from an input stream, using condensed synopses to summarize the state. It is widely believed that, without approximation techniques, most interesting queries would be blocking (i.e., they would have to wait for the end of the stream to release their results) or unbounded (i.e., their memory requirements would grow proportionally to the stream size, which may be infinite). The goal of this paper is to convert nested-relational queries to incremental stream processing programs automatically. In contrast to most current stream processing systems, which calculate approximate answers, our system derives incremental programs that return accurate results. This is accomplished by retaining a state during the query evaluation lifetime and by using incremental evaluation techniques to return, at each time interval, an accurate snapshot answer that depends on the current state and the data in the current fixed window. Our methods can handle most forms of declarative queries on nested data collections, including arbitrarily nested queries, group-by with aggregation, and equi-joins. We report on a prototype system implementation and show some preliminary results on evaluating queries on a small computer cluster running Spark.
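
The idea of retaining a state and combining it with the current window can be illustrated with a minimal sketch for a single group-by query with an average aggregate. This is not the paper's translation scheme or its Spark implementation; the class `IncrementalGroupByAvg`, the helper `batch_avg`, and the sample data are hypothetical names chosen for illustration. The sketch assumes the state is a (sum, count) pair per group key, so that each snapshot is exact rather than approximate.

```python
from collections import defaultdict


class IncrementalGroupByAvg:
    """Hypothetical sketch: keeps (sum, count) per key so snapshots are exact."""

    def __init__(self):
        # State retained for the lifetime of the query: key -> (sum, count).
        self.state = defaultdict(lambda: (0.0, 0))

    def process_window(self, window):
        # Step 1: aggregate only the tuples in the current window.
        delta = defaultdict(lambda: (0.0, 0))
        for key, value in window:
            s, c = delta[key]
            delta[key] = (s + value, c + 1)
        # Step 2: merge the window's partial aggregates into the state.
        for key, (ds, dc) in delta.items():
            s, c = self.state[key]
            self.state[key] = (s + ds, c + dc)
        # Step 3: derive the snapshot answer from the state alone.
        return {key: s / c for key, (s, c) in self.state.items()}


def batch_avg(tuples):
    """Reference answer: recompute from scratch over everything seen so far."""
    groups = defaultdict(list)
    for key, value in tuples:
        groups[key].append(value)
    return {key: sum(vs) / len(vs) for key, vs in groups.items()}


# Made-up input: three consecutive windows of (key, value) tuples.
windows = [[("a", 1.0), ("b", 4.0)], [("a", 3.0)], [("b", 2.0), ("c", 5.0)]]
query = IncrementalGroupByAvg()
seen = []
snapshots = []
for w in windows:
    seen.extend(w)
    snapshots.append(query.process_window(w))
    # Each snapshot matches a full recomputation over all data seen so far.
    assert snapshots[-1] == batch_avg(seen)
```

In this sketch the state grows with the number of distinct keys rather than with the length of the stream, and each window is read only once; the from-scratch `batch_avg` is included only to check that the incremental snapshots agree with it.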

Acknowledgments

This work is supported in part by the National Science Foundation under grant CCF-1117369. Our performance evaluations were conducted on the Chameleon cloud computing infrastructure (www.chameleoncloud.org), supported by NSF.

Author information

Authors and Affiliations

University of Texas at Arlington, Arlington, USA
Leonidas Fegaras

Corresponding author

Correspondence to Leonidas Fegaras.

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Victoria University of Wellington, Wellington, New Zealand
Hui Ma

Cite this paper

Fegaras, L. (2016). Incremental Stream Processing of Nested-Relational Queries. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_19

DOI: https://doi.org/10.1007/978-3-319-44403-1_19
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44402-4
Online ISBN: 978-3-319-44403-1
eBook Packages: Computer Science, Computer Science (R0)
