Incremental Stream Processing of Nested-Relational Queries

  • Conference paper
Database and Expert Systems Applications (DEXA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9827)

Abstract

Current work on stream processing focuses on approximation techniques that compute approximate answers to simple queries by restricting attention to a fixed or sliding window containing the most recent tuples of an input stream and by using condensed synopses to summarize the state. It is widely believed that, without approximation, most interesting queries would be blocking (i.e., they would have to wait for the end of the stream to release their results) or unbounded (i.e., their memory requirements would grow in proportion to the stream size, which may be infinite). The goal of this paper is to convert nested-relational queries to incremental stream processing programs automatically. In contrast to most current stream processing systems, which compute approximate answers, our system derives incremental programs that return accurate results. It does so by retaining a state for the lifetime of the query evaluation and by using incremental evaluation techniques to return, at each time interval, an accurate snapshot answer that depends on the current state and the data in the current fixed window. Our methods can handle most forms of declarative queries on nested data collections, including arbitrarily nested queries, group-by with aggregation, and equi-joins. We report on a prototype system implementation and show some preliminary results from evaluating queries on a small computer cluster running Spark.
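
As a rough illustration of the state-plus-window idea described in the abstract, the following Python sketch maintains an exact per-key average over a stream that arrives in fixed windows. It is not the paper's translation scheme or its Spark-based prototype: the class name `IncrementalAverage`, the method `process_window`, and the choice of query (an average grouped by key) are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


class IncrementalAverage:
    """Hypothetical example: exact group-by average over fixed windows.

    The retained state is a (sum, count) pair per key. Each window is first
    reduced to a per-key delta, which is then merged into the state.
    """

    def __init__(self) -> None:
        # key -> (running sum, running count) over all windows seen so far
        self.state: Dict[Hashable, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))

    def process_window(
        self, window: Iterable[Tuple[Hashable, float]]
    ) -> Dict[Hashable, float]:
        # Step 1: aggregate only the tuples in the current window.
        delta: Dict[Hashable, Tuple[float, int]] = defaultdict(lambda: (0.0, 0))
        for key, value in window:
            total, count = delta[key]
            delta[key] = (total + value, count + 1)

        # Step 2: merge the window's delta into the retained state.
        for key, (total, count) in delta.items():
            prev_total, prev_count = self.state[key]
            self.state[key] = (prev_total + total, prev_count + count)

        # Step 3: derive the snapshot answer from the state alone.
        return {key: total / count for key, (total, count) in self.state.items()}


if __name__ == "__main__":
    query = IncrementalAverage()
    print(query.process_window([("a", 1.0), ("b", 4.0)]))
    print(query.process_window([("a", 3.0)]))
```

In this sketch the state grows with the number of distinct keys rather than with the length of the stream, and each snapshot matches, up to floating-point rounding, what a batch recomputation over all tuples seen so far would return.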

Acknowledgments

This work is supported in part by the National Science Foundation under grant CCF-1117369. Our performance evaluations were conducted on the Chameleon cloud computing infrastructure, www.chameleoncloud.org, supported by NSF.

Author information

Corresponding author

Correspondence to Leonidas Fegaras.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Fegaras, L. (2016). Incremental Stream Processing of Nested-Relational Queries. In: Hartmann, S., Ma, H. (eds) Database and Expert Systems Applications. DEXA 2016. Lecture Notes in Computer Science, vol 9827. Springer, Cham. https://doi.org/10.1007/978-3-319-44403-1_19

  • DOI: https://doi.org/10.1007/978-3-319-44403-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44402-4

  • Online ISBN: 978-3-319-44403-1

  • eBook Packages: Computer Science, Computer Science (R0)
