Abstract
Parallel processing is a flagship approach for answering analytical queries on large-scale databases. As the database scale increases, more processing nodes are likely to be incorporated to increase the degree of parallelism. However, adding nodes also increases the probability of a node failure. If such a failure happens during query processing, the processing often has to restart from scratch, and the resulting delay may be unacceptable to the user. In this paper, we propose PhoeniQ, a fault-tolerant query processing mechanism for analytical parallel database systems. PhoeniQ takes a package-level checkpoint for every operator pipeline and replicates the output of stateful operators among different processing nodes. If a single processing node fails during processing, another node can resume from the execution state of the failed node, so that the query continues to run. Extensive experiments with our prototype demonstrate that PhoeniQ can continue query processing in the face of node failures at a significantly smaller cost than the conventional approach.
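The checkpoint-and-resume idea described in the abstract can be illustrated with a minimal sketch. This is not the PhoeniQ implementation: it assumes that a "package" is a batch of rows, that the non-tail operators are stateless, and that an in-memory dictionary stands in for replication to another processing node. All names (`Checkpoint`, `run_pipeline`, `replica_store`, `fail_after`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, int]      # (group key, value); hypothetical schema
Package = List[Row]        # assumption: a package is a batch of rows


@dataclass
class Checkpoint:
    """Replicated execution state of one pipeline (hypothetical layout)."""
    next_package: int = 0                                 # first package not yet in `state`
    state: Dict[str, int] = field(default_factory=dict)   # output of the stateful tail operator


class NodeFailure(Exception):
    """Simulated crash of the processing node."""


def run_pipeline(packages: List[Package],
                 checkpoint: Checkpoint,
                 replica_store: Dict[str, Checkpoint],
                 fail_after: Optional[int] = None) -> Dict[str, int]:
    """Run filter -> map -> grouped sum, checkpointing after every package."""
    state = dict(checkpoint.state)  # resume from the replicated state
    for idx in range(checkpoint.next_package, len(packages)):
        # Stateless non-tail operators: keep positive values, then double them.
        rows = [(key, value * 2) for key, value in packages[idx] if value > 0]
        # Stateful tail operator: grouped sum.
        for key, value in rows:
            state[key] = state.get(key, 0) + value
        if fail_after is not None and idx == fail_after:
            # Crash before the checkpoint: local work on this package is lost.
            raise NodeFailure(f"node crashed while processing package {idx}")
        # Package-level checkpoint: replicate progress and tail-operator output.
        replica_store["pipeline-0"] = Checkpoint(idx + 1, dict(state))
    return state


packages = [[("a", 1), ("b", -2)], [("a", 3), ("b", 4)], [("b", 5)]]

# Failure-free run for reference.
expected = run_pipeline(packages, Checkpoint(), {})

# Run that crashes during package 1; another node resumes from the replica.
replicas: Dict[str, Checkpoint] = {}
try:
    run_pipeline(packages, Checkpoint(), replicas, fail_after=1)
except NodeFailure:
    recovered = run_pipeline(packages, replicas["pipeline-0"], replicas)
```

In this sketch the resuming node reprocesses only the package that was in flight when the failure occurred, rather than restarting the query from the first package, and `recovered` matches the failure-free result.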
Y. Bessho currently works for NTT.
Notes
1. The idea of PhoeniQ can be easily extended to a shared-nothing architecture [26]. Due to space limitations, we will present further discussion in a separate paper.
2. For simplicity and due to space limitations, this paper presumes only a single-node crash failure among the processing nodes. The same idea can be easily applied to other cases, such as a double-node failure. Further work is needed to protect against a failure of the storage node.
3. As long as all the non-tail operators are stateless, as we have assumed, the reprocessing causes only marginal overhead compared to the entire pipeline processing.
References
Oracle Berkeley DB. https://www.oracle.com/database/berkeley-db/db.html
The Internet of Things: Data from Embedded Systems Will Account for 10% of the Digital Universe by 2020. https://www.emc.com/leadership/digital-universe/2014iview/internet-of-things.htm
The TPC-H benchmark. http://www.tpc.org/tpch/
Abadi, D.J., et al.: The design of the Borealis stream processing engine. In: Proceedings CIDR, pp. 277–289 (2005)
Boral, H., et al.: Prototyping Bubba, a highly parallel database system. IEEE Trans. Knowl. Data Eng. 2(1), 4–24 (1990)
Borthakur, D.: Petabyte scale databases and storage systems at Facebook. In: Proceedings SIGMOD, pp. 1267–1268 (2013)
Carney, D., et al.: Monitoring streams - a new class of data management applications. In: Proceedings VLDB, pp. 215–226 (2002)
Chandramouli, B., Bond, C.N., Babu, S., Yang, J.: Query suspend and resume. In: Proceedings SIGMOD, pp. 557–568 (2007)
Chandrasekaran, S., et al.: TelegraphCQ: continuous dataflow processing for an uncertain world. In: Proceedings CIDR (2003)
Chaudhuri, S., Kaushik, R., Ramamurthy, R., Pol, A.: Stop-and-restart style execution for long running decision support queries. In: Proceedings VLDB, pp. 735–745 (2007)
Weeks, D.: Netflix: Integrating Spark at petabyte scale. https://conferences.oreilly.com/strata/big-data-conference-ny-2015/public/schedule/detail/43373
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
DeWitt, D.J., Gray, J.: Parallel database systems: the future of high performance database systems. Commun. ACM 35(6), 85–98 (1992)
DeWitt, D.J., Madden, S., Stonebraker, M.: How to build a high-performance data warehouse. http://db.csail.mit.edu/madden/high_perf.pdf
Ghandeharizadeh, S., DeWitt, D.J.: Hybrid-range partitioning strategy: a new declustering strategy for multiprocessor database machines. In: Proceedings VLDB, pp. 481–492 (1990)
Goda, K., Tamura, T., Oguchi, M., Kitsuregawa, M.: Run-time load balancing system on SAN-connected PC cluster for dynamic injection of CPU and disk resource - a case study of data mining application. In: Proceedings DEXA, LNCS, vol. 2453, pp. 182–192 (2002)
Han, B., Omiecinski, E., Mark, L., Liu, L.: OTPM: failure handling in data-intensive analytical processing. In: Proceedings CollaborateCom, pp. 35–44. IEEE (2011)
Hauglid, J.O., Nørvåg, K.: ProQID: partial restarts of queries in distributed databases. In: Proceedings CIKM, pp. 1251–1260. ACM (2008)
Hwang, J., Xing, Y., Çetintemel, U., Zdonik, S.B.: A cooperative, self-configuring high-availability solution for stream processing. In: Proceedings ICDE, pp. 176–185 (2007)
Barr, J.: Migration Complete - Amazon’s Consumer Business Just Turned off its Final Oracle Database. https://aws.amazon.com/blogs/aws/migration-complete-amazons-consumer-business-just-turned-off-its-final-oracle-database/
Kwon, Y., Balazinska, M., Greenberg, A.G.: Fault-tolerant stream processing using a distributed, replicated file system. Proc. VLDB 1(1), 574–585 (2008)
Pavlo, A., et al.: A comparison of approaches to large-scale data analysis. In: Proceedings SIGMOD, pp. 165–178 (2009)
Shiftehfar, R.: Uber’s Big Data Platform: 100+ Petabytes with Minute Latency. https://eng.uber.com/uber-big-data-platform/
Shah, M.A., Hellerstein, J.M., Brewer, E.: Highly available, fault-tolerant, parallel dataflows. In: Proceedings SIGMOD, pp. 827–838. ACM (2004)
Smith, J.E.T., Watson, P.: A rollback-recovery protocol for wide area pipelined data flow computations (2004)
Stonebraker, M.: The case for shared nothing. IEEE Database Eng. Bull. 9, 4–9 (1985)
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Bessho, Y., Hayamizu, Y., Goda, K., Kitsuregawa, M. (2020). PhoeniQ: Failure-Tolerant Query Processing in Multi-node Environments. In: Hartmann, S., Küng, J., Kotsis, G., Tjoa, A.M., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2020. Lecture Notes in Computer Science(), vol 12391. Springer, Cham. https://doi.org/10.1007/978-3-030-59003-1_5
DOI: https://doi.org/10.1007/978-3-030-59003-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59002-4
Online ISBN: 978-3-030-59003-1
eBook Packages: Computer Science, Computer Science (R0)