Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach

  • Conference paper
New Trends in Databases and Information Systems

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 241)

Abstract

The process of designing a parallel data warehouse has two main steps: (1) fragmentation and (2) allocation of the resulting fragments to the nodes. Usually, we split the data warehouse horizontally, allocate the fragments over the nodes, and finally balance the load over the nodes of the parallel machine. The main drawback of such a design approach is its high communication cost. Therefore, Data Replication (DR) has become a requirement, for availability on the one hand and for minimizing the communication cost on the other. In this paper, we present a redundant allocation algorithm for designing shared-nothing parallel relational data warehouses, which is based on the well-known fuzzy k-means clustering algorithm.
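
The abstract does not spell out the algorithm, so the following is only a minimal sketch of how fuzzy clustering can yield a redundant allocation, not the authors' method. It assumes each fragment is described by a numeric feature vector (for instance, how strongly each query uses it), uses standard fuzzy k-means (also known as fuzzy c-means) with one cluster per node, and replicates a fragment on every node whose membership degree reaches a cut-off. The function names, the feature vectors, the farthest-first seeding, and the `threshold` parameter are all hypothetical choices made for illustration.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at points[0], then keep
    adding the point farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def memberships(points, centers, m):
    """Fuzzy membership of every point in every cluster; each row sums to 1."""
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [x == 0.0 for x in d]
        if any(zeros):
            # The point lies exactly on one or more centers.
            u.append([1.0 / sum(zeros) if z else 0.0 for z in zeros])
        else:
            inv = [x ** (-2.0 / (m - 1.0)) for x in d]
            total = sum(inv)
            u.append([v / total for v in inv])
    return u

def fuzzy_c_means(points, k, m=2.0, iters=50):
    """Standard fuzzy c-means with fuzzifier m, run for a fixed number of
    iterations. Returns (membership matrix, cluster centers)."""
    centers = farthest_first(points, k)
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(k):
            w = [row[j] ** m for row in u]
            total = sum(w)
            if total == 0.0:
                # No point belongs to this cluster at all; keep the old center.
                new_centers.append(centers[j])
                continue
            new_centers.append(tuple(
                sum(wi * p[d] for wi, p in zip(w, points)) / total
                for d in range(dim)))
        centers = new_centers
    return memberships(points, centers, m), centers

def redundant_allocation(u, threshold=0.3):
    """Place each fragment on its best node plus every node whose membership
    is at least `threshold` (hypothetical replication rule)."""
    placement = []
    for row in u:
        best = max(range(len(row)), key=row.__getitem__)
        nodes = {j for j, v in enumerate(row) if v >= threshold}
        nodes.add(best)
        placement.append(sorted(nodes))
    return placement

# Hypothetical fragment feature vectors: two tight groups and one fragment
# lying halfway between them.
fragments = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (5.0, 5.5)]
u, centers = fuzzy_c_means(fragments, k=2)
placement = redundant_allocation(u, threshold=0.3)
print(placement)
```

In this toy input the four fragments near a cluster center are stored on a single node, while the fragment halfway between the two groups has roughly equal membership in both clusters and is therefore replicated on both nodes. A real design would also have to respect node storage capacities and load balance, which this sketch ignores.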

Author information

Corresponding author

Correspondence to Soumia Benkrid.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Benkrid, S., Bellatreche, L., Cuzzocrea, A. (2014). Designing Parallel Relational Data Warehouses: A Global, Comprehensive Approach. In: Catania, B., et al. New Trends in Databases and Information Systems. Advances in Intelligent Systems and Computing, vol 241. Springer, Cham. https://doi.org/10.1007/978-3-319-01863-8_16

  • DOI: https://doi.org/10.1007/978-3-319-01863-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-01862-1

  • Online ISBN: 978-3-319-01863-8

  • eBook Packages: Engineering, Engineering (R0)
