
Reducing Rule Covers with Deterministic Error Bounds

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2637)

Abstract

The output of Boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical even to generate all frequent itemsets. The closed itemset approach handles this information overload by pruning "uninteresting" rules, following the observation that most rules can be derived from other rules. In this paper, we propose a new framework, namely, the generalized closed (or g-closed) itemset framework. By allowing a small tolerance in the accuracy of itemset supports, we show that the number of such redundant rules is far larger than previously estimated. Our scheme can be integrated into both levelwise algorithms (Apriori) and two-pass algorithms (ARMOR). We evaluate its performance by measuring the reduction in output size as well as in response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements on a variety of databases.
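The abstract does not state the definitions, so the Python sketch below is only an illustration under stated assumptions. It uses the standard notion that an itemset is closed when no proper superset has the same support. As an assumed stand-in for the g-closed idea, it calls an itemset g-closed with tolerance `delta` when no frequent proper superset has a support within `delta` of its own. The toy transactions and the names `frequent_itemsets`, `g_closed` and `derive_support` are hypothetical. The paper's exact definition and its deterministic error bounds may differ.

```python
from itertools import combinations

# Hypothetical toy transaction database, for illustration only.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c", "d"},
]


def support(itemset, transactions):
    """Absolute support: number of transactions containing every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_sup):
    """Brute-force enumeration of all frequent itemsets (adequate for toy data)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, k):
            s = support(frozenset(combo), transactions)
            if s >= min_sup:
                result[frozenset(combo)] = s
                found = True
        # Anti-monotonicity: no frequent k-itemsets means no frequent (k+1)-itemsets.
        if not found:
            break
    return result


def g_closed(freq, delta):
    """Keep X unless some frequent proper superset Y has supp(X) - supp(Y) <= delta.

    Assumed, simplified criterion. With delta = 0 it yields exactly the
    closed frequent itemsets.
    """
    return {
        x: sx
        for x, sx in freq.items()
        if not any(x < y and sx - sy <= delta for y, sy in freq.items())
    }


def derive_support(itemset, cover):
    """Estimate support as the maximum support over retained supersets.

    Exact when the cover is the set of closed frequent itemsets.
    """
    return max(s for y, s in cover.items() if itemset <= y)


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, min_sup=2)
    closed = g_closed(freq, delta=0)
    relaxed = g_closed(freq, delta=1)
    worst = max(s - derive_support(x, relaxed) for x, s in freq.items())
    print(len(freq), len(closed), len(relaxed), worst)  # 11 8 2 2
```

On this toy data the relaxed cover keeps only {a,b,c} and {b,c,d}, out of 11 frequent itemsets. The worst derived-support error is 2, which exceeds `delta = 1`: {a} has support 5, but its largest retained superset {a,b,c} has support 3. So this naive filter does not, by itself, bound the error by the tolerance, and it should not be read as the paper's scheme.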

A poster of this paper appeared in Proc. of IEEE Intl. Conf. on Data Engineering (ICDE), March 2003, Bangalore, India.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Pudi, V., Haritsa, J.R. (2003). Reducing Rule Covers with Deterministic Error Bounds. In: Whang, KY., Jeon, J., Shim, K., Srivastava, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2003. Lecture Notes in Computer Science, vol 2637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36175-8_31

  • DOI: https://doi.org/10.1007/3-540-36175-8_31

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-04760-5

  • Online ISBN: 978-3-540-36175-6

  • eBook Packages: Springer Book Archive