Two Decades of Pattern Mining: Principles and Methods

Soulet, Arnaud

doi:10.1007/978-3-319-61164-8_3

Arnaud Soulet⁸

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 280))

Included in the following conference series:

European Business Intelligence Summer School

2175 Accesses
2 Citations

Abstract

In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of pattern mining: “Mining Association Rules Between Sets of Items in Large Databases”. It aimed at enumerating the complete collection of regularities observed in a given dataset like for instance sets of products purchased together in a supermarket. For two decades, pattern mining has been one of the most active fields in Knowledge Discovery in Databases. This paper presents an overview of pattern mining. We first present the principles of language and interestingness that are two key dimensions for defining a pattern mining process to suit a specific task and a specific dataset. The language defines which patterns can be enumerated (itemsets, sequences, graphs). The interestingness measure defines the archetype of patterns to mine (regularities, contrasts or anomalies). Starting from a language and an interestingness measure, we depict the two main categories of pattern mining methods: enumerating all the patterns whose interestingness exceeds a user-specified threshold (satisfaction problem) or enumerating all the patterns whose interest is maximum (optimization problem). Finally, we present an overview of interactive pattern mining which aims at discovering the user’s interest while mining relevant patterns.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
www.core.edu.au, 2010.
2.
www.kdd.org.
3.
PKDD was attached in 2001 to ECML (European Conference on Machine Learning) then the two conferences merged in 2008. Since 2008, PKDD corresponds to ECML/PKDD.
4.
www.ecmlpkdd.org.
5.
www.pakdd.org.
6.
www.cs.uvm.edu/~icdm.
7.
www.siam.org/meetings/archives.php#sdm.
8.
Conversely, pattern mining is not well adapted to continuous data even if there exist proposals [15, 16].
9.
In this paper, we use the same language for mined patterns and the dataset. It is nevertheless possible to use distinct languages benefiting from a cover relation.
10.
This toy example is inspired from an article found on the Web site mic.com.

References

Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. ACM Sigmod Record 22(2), 207–216 (1993)
Article Google Scholar
Giacometti, A., Li, D.H., Marcel, P., Soulet, A.: 20 years of pattern mining: a bibliometric survey. ACM SIGKDD Explor. Newslett. 15(1), 41–50 (2014)
Article Google Scholar
Mitchell, T.M.: Generalization as search. Artif. Intell. 18(2), 203–226 (1982)
Article Google Scholar
Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)
Article Google Scholar
Nijssen, S., Zimmermann, A.: Constraint-Based Pattern Mining, pp. 147–163. Springer, Cham (2014)
Google Scholar
Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl. Eng. 60(2), 377–399 (2007)
Article Google Scholar
Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view. ACM SIGKDD Explor. Newslett. 4(1), 31–39 (2002)
Article Google Scholar
Soulet, A., Crémilleux, B.: Mining constraint-based patterns using automatic relaxation. Intell. Data Anal. 13(1), 109–133 (2009)
Google Scholar
Negrevergne, B., Dries, A., Guns, T., Nijssen, S.: Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining, pp. 557–566. IEEE (2013)
Google Scholar
Ugarte Rojas, W., Boizumault, P., Loudni, S., Crémilleux, B., Lepailleur, A.: Mining (Soft-) skypatterns using dynamic CSP. In: Simonis, H. (ed.) CPAIOR 2014. LNCS, vol. 8451, pp. 71–87. Springer, Cham (2014). doi:10.1007/978-3-319-07046-9_6
Chapter Google Scholar
Leeuwen, M.: Interactive data exploration using pattern mining. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43968-5_9
Chapter Google Scholar
Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590. ACM (2011)
Google Scholar
De Raedt, L., Zimmermann, A.: Constraint-based pattern set mining. In: SDM, SIAM, pp. 237–248 (2007)
Google Scholar
Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12), 1951–1983 (2011)
Article Google Scholar
Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Acm Sigmod Record. vol. 25, pp. 1–12. ACM (1996)
Google Scholar
Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting numerical pattern mining with formal concept analysis. In: International Conference on Artificial Intelligence (IJCAI 2011) (2011)
Google Scholar
Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995)
Google Scholar
Zhao, Q., Bhowmick, S.S.: Sequential pattern mining: a survey. ITechnical Report CAIS Nayang Technological University Singapore, pp. 1–26 (2003)
Google Scholar
Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. 28(01), 75–105 (2013)
Article Google Scholar
Arimura, H., Uno, T.: Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In: SDM (2009)
Google Scholar
Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceeding of 20th International Conference Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)
Google Scholar
Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. ACM Comput. Surv. (CSUR) 38(3), 9 (2006)
Article Google Scholar
Vreeken, J., Tatti, N.: Interesting Patterns. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 105–134. Springer, Cham (2014). doi:10.1007/978-3-319-07821-2_5
Google Scholar
Leman, D., Feelders, A., Knobbe, A.: Exceptional model mining. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS, vol. 5212, pp. 1–16. Springer, Heidelberg (2008). doi:10.1007/978-3-540-87481-2_1
Chapter Google Scholar
Webb, G.I.: Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans. Knowl. Discov. Data (TKDD) 4(1), 3 (2010)
Google Scholar
Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 32–41 (2002)
Google Scholar
Winslett, M.: Interview with Rakesh Agrawal. SIGMOD Rec. 32(3), 83–90 (2003)
Google Scholar
Fayyad, U.M., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-03 panel: data mining: the next 10 years. ACM SIGKDD Explor. Newslett. 5(2), 191–196 (2003)
Article Google Scholar
Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Disc. 15(1), 55–86 (2007)
Article Google Scholar
Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. KDD 97, 283–286 (1997)
Google Scholar
Goethals, B.: Survey on Frequent Pattern Mining. University of Helsinki (2003)
Google Scholar
Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Data peeler: contraint-based closed pattern mining in n-ary relations. In: SDM, vol. 8, pp. 37–48. SIAM (2008)
Google Scholar
Ugarte, W., Boizumault, P., Loudni, S., Crémilleux, B.: Modeling and mining optimal patterns using dynamic CSP. In: 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 33–40. IEEE (2015)
Google Scholar
Fu, A., Kwong, R., Tang, J.: Mining N-most interesting itemsets. In: Raś, Z.W., Ohsuga, S. (eds.) ISMIS 2000. LNCS, vol. 1932, pp. 59–67. Springer, Heidelberg (2000). doi:10.1007/3-540-39963-1_7
Chapter Google Scholar
Herrera, F., Carmona, C.J., González, P., Del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)
Article Google Scholar
Soulet, A., Raïssi, C., Plantevit, M., Cremilleux, B.: Mining dominant patterns in the sky. In: 2011 IEEE 11th International Conference on Data Mining, pp. 655–664. IEEE (2011)
Google Scholar
Calders, T., Rigotti, C., Boulicaut, J.-F.: A survey on condensed representations for frequent sets. In: Boulicaut, J.-F., Raedt, L., Mannila, H. (eds.) Constraint-Based Mining and Inductive Databases. LNCS, vol. 3848, pp. 64–80. Springer, Heidelberg (2006). doi:10.1007/11615576_4
Chapter Google Scholar
Hamrouni, T.: Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intell. Data Anal. 16(4), 581–631 (2012)
Google Scholar
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7_25
Chapter Google Scholar
Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Science & Business Media, Heidelberg (2012)
Google Scholar
Dzyuba, V., Leeuwen, M.V., Nijssen, S., De Raedt, L.: Interactive learning of pattern rankings. Int. J. Artif. Intell. Tools 23(06), 1460026 (2014)
Article Google Scholar
Bhuiyan, M., Mukhopadhyay, S., Hasan, M.A.: Interactive pattern mining on hidden data: a sampling-based solution. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 95–104. ACM (2012)
Google Scholar
Rueping, S.: Ranking interesting subgroups. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 913–920. ACM (2009)
Google Scholar
Hasan, M.A., Zaki, M.J.: Output space sampling for graph patterns. PVLDB 2(1), 730–741 (2009)
Google Scholar
Bendimerad, A.A., Plantevit, M., Robardet, C.: Unsupervised exceptional attributed sub-graph mining in urban data. In: IEEE 16th International Conference on Data Mining (ICDM), pp. 21–30. IEEE (2016)
Google Scholar
Moens, S., Boley, M., Goethals, B.: Providing concise database covers instantly by recursive tile sampling. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds.) DS 2014. LNCS, vol. 8777, pp. 216–227. Springer, Cham (2014). doi:10.1007/978-3-319-11812-3_19
Google Scholar
Giacometti, A., Soulet, A.: Frequent pattern outlier detection without exhaustive mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS, vol. 9652, pp. 196–207. Springer, Cham (2016). doi:10.1007/978-3-319-31750-2_16
Chapter Google Scholar

Download references

Acknowledgments

The author would like to thank Bruno Crémilleux, Arnaud Giacometti and Marc Plantevit for many fruitful discussions. The author would also like to thank the anonymous reviewers and Patrick Marcel for their helpful comments that greatly contributed to improve the final version of the paper.

Author information

Authors and Affiliations

University of Tours, Blois, France
Arnaud Soulet

Authors

Arnaud Soulet
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arnaud Soulet .

Editor information

Editors and Affiliations

University of Tours, Tours, France
Patrick Marcel
Dept. of Computer and Decision Engg. (CoDE), Universite Libre de Bruxelles, Brussels, Belgium
Esteban Zimányi

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Soulet, A. (2017). Two Decades of Pattern Mining: Principles and Methods. In: Marcel, P., Zimányi, E. (eds) Business Intelligence. eBISS 2016. Lecture Notes in Business Information Processing, vol 280. Springer, Cham. https://doi.org/10.1007/978-3-319-61164-8_3

Download citation

DOI: https://doi.org/10.1007/978-3-319-61164-8_3
Published: 04 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61163-1
Online ISBN: 978-3-319-61164-8
eBook Packages: Business and ManagementBusiness and Management (R0)

Publish with us

Policies and ethics