Sets of Robust Rules, and How to Find Them

Fischer, Jonas; Vreeken, Jilles

doi:10.1007/978-3-030-46150-8_3

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11906)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Association rules are among the most important concepts in data mining. Rules of the form \(X \rightarrow Y\) are simple to understand and simple to act upon, yet can model important local dependencies in data. The problem, however, is that there are so many of them. Both traditional and state-of-the-art frameworks typically yield millions of rules, rather than identifying a small set of rules that captures the most important dependencies of the data. In this paper, we define the problem of association rule mining in terms of the Minimum Description Length principle. That is, we identify the best set of rules as the one that most succinctly describes the data. We show that the resulting optimization problem does not lend itself to exact search, and hence propose Grab, a greedy heuristic to efficiently discover good sets of noise-resistant rules directly from data. Through extensive experiments we show that, unlike the state of the art, Grab reliably recovers the ground truth. On real-world data we show that it finds reasonable numbers of rules that, upon close inspection, give clear insight into the local distribution of the data.
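
To make the idea of "most succinctly describes the data" concrete, the following is a minimal sketch of how a single rule \(X \rightarrow Y\) could be scored by description length. It is not the encoding defined in the paper and not the Grab algorithm: the binomial code, the helper names (`bits_for_ones`, `baseline_cost`, `rule_cost`, `gain`), and the toy data are all assumptions chosen for illustration.

```python
from math import comb, log2

def bits_for_ones(n: int, k: int) -> float:
    """Bits to state how many of n rows hold a 1, and which ones (assumed binomial code)."""
    return log2(n + 1) + log2(comb(n, k))

def baseline_cost(rows, item) -> float:
    """Cost of encoding the column for `item` on its own, ignoring all other items."""
    k = sum(1 for r in rows if item in r)
    return bits_for_ones(len(rows), k)

def rule_cost(rows, head, tail) -> float:
    """Cost of encoding `tail` separately where `head` holds and where it does not."""
    with_head = [r for r in rows if head in r]
    without_head = [r for r in rows if head not in r]
    cost = 0.0
    for part in (with_head, without_head):
        k = sum(1 for r in part if tail in r)
        cost += bits_for_ones(len(part), k)
    return cost

def gain(rows, head, tail) -> float:
    """Bits saved by the hypothetical rule head -> tail; positive means it compresses."""
    return baseline_cost(rows, tail) - rule_cost(rows, head, tail)

# Toy data (hypothetical): in `dependent`, b occurs exactly when a occurs;
# in `independent`, b occurs in half of the rows regardless of a.
dependent = [{"a", "b"}] * 8 + [set()] * 8
independent = [{"a", "b"}] * 4 + [{"a"}] * 4 + [{"b"}] * 4 + [set()] * 4

print(gain(dependent, "a", "b"))    # positive: the rule pays for itself
print(gain(independent, "a", "b"))  # negative: the rule only adds overhead
```

Under these assumptions, a rule is kept only if splitting the data on its head saves more bits than the extra bookkeeping costs, which is how a description-length criterion can favor a small rule set over millions of rules.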

Notes

1. http://eda.mmci.uni-saarland.de/grab/.
2. http://eda.mmci.uni-saarland.de/grab/.
3. No relation to the first author.

Author information

Authors and Affiliations

Max Planck Institute for Informatics, and Saarland University, Saarbrücken, Germany
Jonas Fischer & Jilles Vreeken
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Jonas Fischer & Jilles Vreeken

Corresponding author

Correspondence to Jonas Fischer.

Editor information

Editors and Affiliations

Leuphana University, Lüneburg, Germany
Ulf Brefeld
IRISA/Inria, Rennes, France
Elisa Fromont
University of Würzburg, Würzburg, Germany
Andreas Hotho
Leiden University, Leiden, The Netherlands
Arno Knobbe
ETH Zurich, Zurich, Switzerland
Marloes Maathuis
Institut National des Sciences Appliquées, Villeurbanne, France
Céline Robardet

About this paper

Cite this paper

Fischer, J., Vreeken, J. (2020). Sets of Robust Rules, and How to Find Them. In: Brefeld, U., Fromont, E., Hotho, A., Knobbe, A., Maathuis, M., Robardet, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2019. Lecture Notes in Computer Science, vol 11906. Springer, Cham. https://doi.org/10.1007/978-3-030-46150-8_3

DOI: https://doi.org/10.1007/978-3-030-46150-8_3
Published: 30 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-46149-2
Online ISBN: 978-3-030-46150-8
eBook Packages: Computer Science (R0)