List Representation Applied to Sparse Datacubes for Data Warehousing and Data Mining

Wang, Frank; Marir, Frahi; Gordon, John; Helian, Na

doi:10.1007/978-3-540-45080-1_121

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2690)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

Typically, 80% of the data in the logical OLAP datacube, the core engine of data warehouses, are zero. When the datacube is sparse, performance degrades quickly because of the heavy I/O overhead of sorting and merging intermediate results. In this work, we first introduce a list representation in main memory for storing and computing datasets. The sparse transaction dataset is compressed because the empty cells are removed. Accordingly, we propose a new algorithm for association rule mining on the platform of list representation, which needs to scan the transaction database only once to generate all the possible rules. In contrast, the well-known Apriori algorithm requires repeated scans of the database, resulting in heavy I/O, particularly when considering large candidate datasets. In our opinion, this new algorithm using list representation economizes on storage space and accesses.
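The abstract does not give the details of the list representation or of the single-scan algorithm, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes a generic vertical "transaction-id list" layout (similar in spirit to Eclat-style mining): empty cells are dropped, the transaction database is read once to build one in-memory list per item, and all further support counting works on those lists instead of rescanning the database. The function names `compress`, `build_tid_lists`, `frequent_itemsets`, and `rules`, and the sample data, are hypothetical.

```python
from itertools import combinations


def compress(dense_rows):
    """Drop zero cells: each row becomes a list of (column, value) pairs."""
    return [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense_rows]


def build_tid_lists(transactions):
    """Single scan of the database: item -> list of transaction ids."""
    tid_lists = {}
    for tid, items in enumerate(transactions):
        for item in set(items):
            tid_lists.setdefault(item, []).append(tid)
    return tid_lists


def frequent_itemsets(tid_lists, min_support):
    """Grow itemsets by intersecting in-memory tid sets (no further scans)."""
    current = {(item,): set(tids)
               for item, tids in tid_lists.items() if len(tids) >= min_support}
    supports = {}
    while current:
        supports.update({k: len(v) for k, v in current.items()})
        keys = sorted(current)
        nxt = {}
        for a, b in combinations(keys, 2):
            if a[:-1] == b[:-1]:  # join itemsets that share a prefix
                tids = current[a] & current[b]
                if len(tids) >= min_support:
                    nxt[a + (b[-1],)] = tids
        current = nxt
    return supports


def rules(supports, min_confidence):
    """Derive rules lhs -> rhs with confidence = supp(itemset) / supp(lhs)."""
    out = []
    for itemset, sup in supports.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                conf = sup / supports[lhs]
                if conf >= min_confidence:
                    rhs = tuple(i for i in itemset if i not in lhs)
                    out.append((lhs, rhs, conf))
    return out


# Hypothetical sample data, for illustration only.
transactions = [["bread", "milk"], ["bread", "butter", "milk"],
                ["butter"], ["bread", "milk"]]
supports = frequent_itemsets(build_tid_lists(transactions), min_support=2)
```

In this sketch the database is touched only inside `build_tid_lists`; Apriori, by contrast, rescans the transactions once per candidate level. The trade-off is that the lists must fit in main memory, which is more plausible when the data are sparse and the empty cells have been removed.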

Author information

Authors and Affiliations

Department of Computing, London Metropolitan University, 166-220 Holloway Road, London, N7 8DB, United Kingdom
Frank Wang, Frahi Marir & Na Helian
e-Science Centre, CCLRC Rutherford Appleton Laboratory, Didcot Oxfordshire, OX11 0QX, United Kingdom
John Gordon

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong Baptist University, Kowloon Tong, Hong Kong
Jiming Liu
Department of Computer Science, Hong Kong Baptist University, Hong Kong
Yiu-ming Cheung
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin

About this paper

Cite this paper

Wang, F., Marir, F., Gordon, J., Helian, N. (2003). List Representation Applied to Sparse Datacubes for Data Warehousing and Data Mining. In: Liu, J., Cheung, Ym., Yin, H. (eds) Intelligent Data Engineering and Automated Learning. IDEAL 2003. Lecture Notes in Computer Science, vol 2690. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45080-1_121

DOI: https://doi.org/10.1007/978-3-540-45080-1_121
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40550-4
Online ISBN: 978-3-540-45080-1
eBook Packages: Springer Book Archive