Abstract
Computing a multidimensional OLAP (On-Line Analytical Processing) data cube is time-consuming because a data cube with D dimensions consists of 2^D cuboids. To build ROLAP (Relational OLAP) data cubes efficiently, existing algorithms (e.g., GBLP, PipeSort, PipeHash, and BUC) use several strategies: sharing sort costs and input data scans, reducing the amount of computation, and exploiting parallel processing. Meanwhile, MapReduce has recently emerged as a framework for processing huge volumes of data, such as web-scale data, in a distributed and parallel manner on a large number of computers (e.g., several hundred or several thousand). In the MapReduce framework, the degree of parallelism matters more for reducing total execution time than elaborate strategies do. In this paper, we propose a distributed parallel processing algorithm, called MRPipeLevel, which takes advantage of the MapReduce framework. It is based on the existing PipeSort algorithm, one of the most efficient algorithms for top-down cube computation. MRPipeLevel parallelizes cube computation and, at the same time, reduces the number of data scans through pipelining. We implemented the proposed algorithm and evaluated it under the MapReduce framework. Through the experiments, we also identify factors that improve MapReduce performance when processing very large data.
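To make the 2^D cuboid count and the idea of top-down computation concrete, the following is a minimal single-machine sketch in Python. It is not the MRPipeLevel or PipeSort algorithm: it has no sorting, pipelining, or MapReduce jobs. It only assumes a SUM measure and computes each cuboid from the smallest already-computed cuboid one level above it. The function names (`cuboids`, `aggregate`, `top_down_cube`) and the sample data are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def cuboids(dims):
    """Return every subset of dims (all 2^D group-bys), most detailed first."""
    return [c for k in range(len(dims), -1, -1) for c in combinations(dims, k)]


def aggregate(pairs, src_dims, dst_dims):
    """Roll (key, measure) pairs keyed on src_dims up to dst_dims by summing."""
    idx = [src_dims.index(d) for d in dst_dims]
    out = defaultdict(int)
    for key, measure in pairs:
        out[tuple(key[i] for i in idx)] += measure
    return dict(out)


def top_down_cube(facts, dims):
    """Compute all cuboids level by level, each from one parent cuboid."""
    dims = tuple(dims)
    cube = {dims: aggregate(facts, dims, dims)}  # base cuboid from the raw facts
    for child in cuboids(dims)[1:]:
        # Parents have exactly one more dimension and are already computed,
        # because cuboids() lists higher levels before lower ones.
        parents = [p for p in cube
                   if len(p) == len(child) + 1 and set(child) <= set(p)]
        # Assumption: picking the parent with the fewest rows is a cheap
        # stand-in for the cost-based parent selection of real algorithms.
        parent = min(parents, key=lambda p: len(cube[p]))
        cube[child] = aggregate(cube[parent].items(), parent, child)
    return cube


# Illustrative fact table: (country, year, product) -> sales
facts = [
    (("kr", "2011", "tv"), 3),
    (("kr", "2012", "tv"), 5),
    (("us", "2011", "pc"), 2),
    (("us", "2011", "tv"), 4),
]
cube = top_down_cube(facts, ("country", "year", "product"))
print(len(cube))             # number of cuboids for D = 3
print(cube[("country",)])    # sales grouped by country only
```

Because the cuboids on one level depend only on the level above, they could in principle be computed by independent tasks; how MRPipeLevel actually distributes and pipelines this work is described in the paper itself, not in this sketch.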
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, S., Kim, J., Moon, YS., Lee, W. (2012). Efficient Distributed Parallel Top-Down Computation of ROLAP Data Cube Using MapReduce. In: Cuzzocrea, A., Dayal, U. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2012. Lecture Notes in Computer Science, vol 7448. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32584-7_14
DOI: https://doi.org/10.1007/978-3-642-32584-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32583-0
Online ISBN: 978-3-642-32584-7
eBook Packages: Computer Science (R0)