Efficient Distributed Parallel Top-Down Computation of ROLAP Data Cube Using MapReduce

Lee, Suan; Kim, Jinho; Moon, Yang-Sae; Lee, Wookey

doi:10.1007/978-3-642-32584-7_14

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7448)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery

Abstract

Computing a multidimensional OLAP (On-Line Analytical Processing) data cube takes a long time, because a data cube with D dimensions consists of 2^D cuboids. To build ROLAP (Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, PipeSort, PipeHash, and BUC) use several strategies: sharing sort costs and input data scans, reducing data computation, and utilizing parallel processing techniques. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed/parallel manner on a large number of computers (e.g., several hundred or thousands). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate optimization strategies do. In this paper, we propose a distributed parallel processing algorithm, called MRPipeLevel, which takes advantage of the MapReduce framework. It is based on the existing PipeSort algorithm, one of the most efficient algorithms for top-down cube computation. The proposed MRPipeLevel algorithm parallelizes cube computation and, at the same time, reduces the number of data scans through pipelining. We implemented and evaluated the proposed algorithm under the MapReduce framework. Through the experiments, we also identify factors that enhance the performance of MapReduce when processing very large data.
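To make the 2^D blow-up and the idea of pipelining concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' MRPipeLevel code and not a MapReduce job. It first enumerates all cuboids of a D-dimensional cube by level (top-down, from D grouping attributes to 0). It then computes one PipeSort-style pipeline: after a single sort on the attribute order (A, B, C), one scan produces the prefix group-bys ABC, AB, A and ALL, with each coarser cuboid fed by the sums flushed from its finer parent. The function names `cuboid_lattice` and `pipeline_prefix_cuboids`, the dict-based rows, the measure column `m`, and SUM as the aggregate are all hypothetical choices made for this example.

```python
from itertools import combinations


def cuboid_lattice(dims):
    """Hypothetical helper: every group-by attribute set of a D-dimensional cube
    (2^D in total), keyed by level = number of grouping attributes, top-down."""
    return {k: list(combinations(dims, k)) for k in range(len(dims), -1, -1)}


def pipeline_prefix_cuboids(rows, order, measure):
    """Hypothetical sketch of one PipeSort-style pipeline with a SUM measure.

    The rows are sorted once on `order`; a single scan then produces every
    prefix cuboid order[:k] for k = D, ..., 0. When the prefix of length k
    changes, the finished group is written out and its sum is handed to the
    level-(k-1) accumulator, so each coarser cuboid is computed from its parent.
    """
    order = tuple(order)
    d = len(order)

    def key(row):
        return tuple(row[a] for a in order)

    results = {order[:k]: {} for k in range(d + 1)}
    sums = [0] * (d + 1)  # running sum of the open group at each prefix length
    current = None        # sort key of the previously scanned row

    def flush(down_to):
        # Close the open groups at prefix lengths d, d-1, ..., down_to + 1.
        for k in range(d, down_to, -1):
            results[order[:k]][current[:k]] = sums[k]
            sums[k - 1] += sums[k]
            sums[k] = 0

    for row in sorted(rows, key=key):  # the single shared sort
        t = key(row)
        if current is not None and t != current:
            # First attribute position where the new key differs from the old one.
            p = next(i for i in range(d) if t[i] != current[i])
            flush(p)
        current = t
        sums[d] += row[measure]

    if current is not None:
        flush(0)
        results[()][()] = sums[0]  # the ALL cuboid (empty group-by)
    return results


rows = [
    {"A": "a2", "B": "b1", "C": "c1", "m": 4},
    {"A": "a1", "B": "b1", "C": "c2", "m": 2},
    {"A": "a1", "B": "b2", "C": "c1", "m": 3},
    {"A": "a1", "B": "b1", "C": "c1", "m": 1},
]
lattice = cuboid_lattice(("A", "B", "C"))
print(sum(len(level) for level in lattice.values()))  # 8 == 2**3
cube = pipeline_prefix_cuboids(rows, ("A", "B", "C"), "m")
print(cube[("A", "B")])  # {('a1', 'b1'): 3, ('a1', 'b2'): 3, ('a2', 'b1'): 4}
print(cube[()])          # {(): 10}
```

This single pipeline covers only 4 of the 8 cuboids (ABC, AB, A and ALL); the others (AC, BC, B and C) need further sort orders. Choosing a low-cost set of pipelines that together cover the whole lattice is the planning step of PipeSort. How MRPipeLevel distributes that work across map and reduce tasks is not modeled here.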

References

Gray, J., et al.: Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. In: Proc. Int'l Conf. on Data Engineering, New Orleans, LA, pp. 152–159 (February 1996)
Morfonios, K., Konakas, S., Ioannidis, Y., Kotsis, N.: ROLAP implementations of the data cube. ACM Computing Surveys (CSUR) 39(4), Article No. 12 (2007)
Agarwal, S., et al.: On the Computation of Multidimensional Aggregates. In: Proc. 22nd Int'l Conf. on Very Large Data Bases, Bombay, India, pp. 506–521 (September 1996)
Beyer, K., Ramakrishnan, R.: Bottom-up Computation of Sparse and Iceberg Cubes. In: Proc. ACM SIGMOD Int'l Conf. on Management of Data, Philadelphia, PA, pp. 359–370 (June 1999)
Dehne, F., Eavis, T., Rau-Chaplin, A.: The cgmCUBE Project: Optimizing Parallel Data Cube Generation for ROLAP. Distributed and Parallel Databases 19(1), 29–62 (2006)
Ng, R.T., Wagner, A., Yin, Y.: Iceberg-cube Computation with PC Clusters. In: Proc. ACM SIGMOD Int'l Conf. on Management of Data, Santa Barbara, CA, pp. 25–36 (June 2001)
Hadoop, http://hadoop.apache.org/
HDFS, http://hadoop.apache.org/hdfs/
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Communications of the ACM 51(1), 107–113 (2008)
You, J., Xi, J., Zhang, P., Chen, H.: A Parallel Algorithm for Closed Cube Computation. In: Proc. 7th Int'l Conf. on Computer and Information Science, Portland, OR, pp. 95–99 (May 2008)
Wang, Y., Song, A., Luo, J.: A MapReduceMerge-based Data Cube Construction Method. In: Proc. 9th Int'l Conf. on Grid and Cooperative Computing, Nanjing, China, pp. 1–6 (November 2010)
Lee, S., Moon, Y.-S., Kim, J.: Distributed Parallel Top-Down Computation of Data Cube using MapReduce. In: Proc. 3rd Int'l Conf. on Emerging Databases, Incheon, Korea, pp. 303–306 (August 2011)
Nandi, A., Yu, C., Bohannon, P., Ramakrishnan, R.: Distributed Cube Materialization on Holistic Measures. In: Proc. 27th Int'l Conf. on Data Engineering, Hannover, Germany, pp. 183–194 (April 2011)

Author information

Authors and Affiliations

Department of Computer Science, Kangwon National University, 192-1, Hyoja2-Dong, Chuncheon, Kangwon, Korea
Suan Lee, Jinho Kim & Yang-Sae Moon
Department of Industrial Engineering, Inha University, 100 Inha-ro, Nam-gu, Incheon, Korea
Wookey Lee

Editor information

Editors and Affiliations

ICAR-CNR and University of Calabria, via P. Bucci 41C, 87036, Rende (CS), Italy
Alfredo Cuzzocrea
Hewlett Packard Labs, 1501 Page Mill Road, MS 1142, 94304, Palo Alto, CA, USA
Umeshwar Dayal

About this paper

Cite this paper

Lee, S., Kim, J., Moon, YS., Lee, W. (2012). Efficient Distributed Parallel Top-Down Computation of ROLAP Data Cube Using MapReduce. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_14

DOI: https://doi.org/10.1007/978-3-642-32584-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32583-0
Online ISBN: 978-3-642-32584-7
eBook Packages: Computer Science, Computer Science (R0)
