Abstract
Frequent pattern mining is an important data mining problem with wide applications. The huge number of discovered frequent patterns poses a great challenge for users who want to explore and understand them. It is therefore desirable to accurately summarize the set of frequent patterns into a small number of patterns or profiles that users can easily explore. In this paper, we employ a probability model to represent a set of frequent patterns and give two methods for estimating the support of a pattern from the model. Based on the model, we develop an approach that groups a set of frequent patterns into k profiles, such that the support of each frequent pattern can be estimated fairly accurately from a relatively small number of profiles. Empirical studies show that our method achieves compact and accurate summarization on real-life data and that the support of frequent patterns can be restored much more accurately than with the previous method.
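To make the idea of estimating support from a profile concrete, here is a minimal Python sketch. It is not the model or either of the two estimation methods from the paper, which the abstract does not specify. It assumes a hypothetical `Profile` that stores the fraction of transactions it covers and a per-item probability, and it assumes items are independent within a profile. All names (`Profile`, `build_profile`, `estimate_support`) are illustrative.

```python
from dataclasses import dataclass
from math import prod
from typing import Dict, Iterable, List, Set


@dataclass
class Profile:
    # Hypothetical structure for illustration; not taken from the paper.
    support: float               # fraction of all transactions covered by the profile
    item_prob: Dict[str, float]  # P(item present | transaction covered by the profile)


def build_profile(covered: List[Set[str]], n_total: int) -> Profile:
    """Summarize the transactions covered by one group of patterns."""
    if not covered:
        raise ValueError("a profile must cover at least one transaction")
    counts: Dict[str, int] = {}
    for transaction in covered:
        for item in transaction:
            counts[item] = counts.get(item, 0) + 1
    probs = {item: c / len(covered) for item, c in counts.items()}
    return Profile(support=len(covered) / n_total, item_prob=probs)


def estimate_support(pattern: Iterable[str], profile: Profile) -> float:
    """Estimate a pattern's support, assuming item independence (an assumption)."""
    # Items never seen in the profile get probability 0.
    return profile.support * prod(profile.item_prob.get(i, 0.0) for i in pattern)


# Toy data: four transactions, the first three covered by one profile.
transactions = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
profile = build_profile(transactions[:3], n_total=len(transactions))
estimate = estimate_support({"a", "b"}, profile)
```

In this toy case the estimate for the pattern {a, b} is about 0.5, which agrees with its true support of 2 out of 4 transactions. With real data the independence assumption generally introduces error, which is why the choice of probability model and the grouping into k profiles matter.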
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Cong, G., Cui, B., Li, Y., Zhang, Z. (2006). Summarizing Frequent Patterns Using Profiles. In: Li Lee, M., Tan, KL., Wuwongse, V. (eds) Database Systems for Advanced Applications. DASFAA 2006. Lecture Notes in Computer Science, vol 3882. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733836_14
DOI: https://doi.org/10.1007/11733836_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33337-1
Online ISBN: 978-3-540-33338-8
eBook Packages: Computer Science (R0)