Efficient Distributed Parallel Top-Down Computation of ROLAP Data Cube Using MapReduce

  • Conference paper
Data Warehousing and Knowledge Discovery (DaWaK 2012)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7448)

Abstract

Computing a multidimensional OLAP (On-Line Analytical Processing) data cube is time-consuming, because a data cube with D dimensions consists of 2^D cuboids. To build ROLAP (Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, PipeSort, PipeHash, and BUC) use several strategies: sharing sort costs and input data scans, reducing the amount of computation, and exploiting parallel processing. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed and parallel manner using a large number of computers (e.g., several hundred or several thousand). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate strategies do. In this paper, we propose a distributed parallel processing algorithm, called MRPipeLevel, which takes advantage of the MapReduce framework. It is based on the existing PipeSort algorithm, which is one of the most efficient algorithms for top-down cube computation. The proposed MRPipeLevel algorithm parallelizes cube computation and, at the same time, reduces the number of data scans through pipelining. We implemented and evaluated the proposed algorithm under the MapReduce framework. Through the experiments, we also identify factors that improve MapReduce performance when processing very large data sets.
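To make the two ideas in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the MRPipeLevel algorithm and does not use MapReduce. It only illustrates (a) why a cube over D dimensions has 2^D cuboids and (b) how one scan of sorted input can feed a pipeline of prefix group-bys, which is the kind of scan sharing that PipeSort-style methods rely on. The function names `cuboids` and `pipeline_aggregate`, the SUM measure, and the sample rows are assumptions chosen for illustration.

```python
from itertools import combinations


def cuboids(dims):
    """Enumerate every group-by (cuboid) over dims; there are 2^D of them."""
    return [combo
            for k in range(len(dims), -1, -1)
            for combo in combinations(dims, k)]


def pipeline_aggregate(rows, n_dims):
    """Compute SUM for all prefix cuboids in a single scan of sorted rows.

    rows: iterable of tuples (d1, ..., dn, measure).
    Returns {prefix_length: {prefix_key: sum}}; with dimensions (A, B) the
    pipeline is AB -> A -> (all), stored under keys 2, 1 and 0.
    """
    results = {k: {} for k in range(n_dims + 1)}
    current = [None] * (n_dims + 1)  # group key currently open at each level
    totals = [0] * (n_dims + 1)      # running sum for that open group
    started = False

    # Sorting on the dimension columns makes every prefix group contiguous.
    for row in sorted(rows, key=lambda r: r[:n_dims]):
        key, measure = tuple(row[:n_dims]), row[n_dims]
        for k in range(n_dims + 1):
            prefix = key[:k]
            if started and prefix != current[k]:
                # The prefix changed: emit the finished group and reset.
                results[k][current[k]] = totals[k]
                totals[k] = 0
            current[k] = prefix
            totals[k] += measure
        started = True

    if started:
        # Emit the groups that are still open after the last row.
        for k in range(n_dims + 1):
            results[k][current[k]] = totals[k]
    return results


if __name__ == "__main__":
    print(len(cuboids(["A", "B", "C"])))
    sample = [("a", "x", 1), ("a", "y", 2), ("b", "x", 3), ("a", "x", 4)]
    print(pipeline_aggregate(sample, 2))
```

Only the cuboids that are prefixes of the sort order can share a scan in this way. Covering all 2^D cuboids therefore takes several pipelines with different sort orders, and distributing that work across many machines is the problem the paper addresses.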

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, S., Kim, J., Moon, YS., Lee, W. (2012). Efficient Distributed Parallel Top-Down Computation of ROLAP Data Cube Using MapReduce. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_14

  • DOI: https://doi.org/10.1007/978-3-642-32584-7_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32583-0

  • Online ISBN: 978-3-642-32584-7

  • eBook Packages: Computer Science, Computer Science (R0)