Skip to main content

Two Decades of Pattern Mining: Principles and Methods

  • Conference paper
  • First Online:
Business Intelligence (eBISS 2016)

Part of the book series: Lecture Notes in Business Information Processing ((LNBIP,volume 280))

Included in the following conference series:

Abstract

In 1993, Rakesh Agrawal, Tomasz Imielinski and Arun N. Swami published one of the founding papers of pattern mining: “Mining Association Rules Between Sets of Items in Large Databases”. It aimed at enumerating the complete collection of regularities observed in a given dataset like for instance sets of products purchased together in a supermarket. For two decades, pattern mining has been one of the most active fields in Knowledge Discovery in Databases. This paper presents an overview of pattern mining. We first present the principles of language and interestingness that are two key dimensions for defining a pattern mining process to suit a specific task and a specific dataset. The language defines which patterns can be enumerated (itemsets, sequences, graphs). The interestingness measure defines the archetype of patterns to mine (regularities, contrasts or anomalies). Starting from a language and an interestingness measure, we depict the two main categories of pattern mining methods: enumerating all the patterns whose interestingness exceeds a user-specified threshold (satisfaction problem) or enumerating all the patterns whose interest is maximum (optimization problem). Finally, we present an overview of interactive pattern mining which aims at discovering the user’s interest while mining relevant patterns.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    www.core.edu.au, 2010.

  2. 2.

    www.kdd.org.

  3. 3.

    PKDD was attached in 2001 to ECML (European Conference on Machine Learning) then the two conferences merged in 2008. Since 2008, PKDD corresponds to ECML/PKDD.

  4. 4.

    www.ecmlpkdd.org.

  5. 5.

    www.pakdd.org.

  6. 6.

    www.cs.uvm.edu/~icdm.

  7. 7.

    www.siam.org/meetings/archives.php#sdm.

  8. 8.

    Conversely, pattern mining is not well adapted to continuous data even if there exist proposals [15, 16].

  9. 9.

    In this paper, we use the same language for mined patterns and the dataset. It is nevertheless possible to use distinct languages benefiting from a cover relation.

  10. 10.

    This toy example is inspired from an article found on the Web site mic.com.

References

  1. Agrawal, R., Imieliński, T., Swami, A.: Mining association rules between sets of items in large databases. ACM Sigmod Record 22(2), 207–216 (1993)

    Article  Google Scholar 

  2. Giacometti, A., Li, D.H., Marcel, P., Soulet, A.: 20 years of pattern mining: a bibliometric survey. ACM SIGKDD Explor. Newslett. 15(1), 41–50 (2014)

    Article  Google Scholar 

  3. Mitchell, T.M.: Generalization as search. Artif. Intell. 18(2), 203–226 (1982)

    Article  Google Scholar 

  4. Mannila, H., Toivonen, H.: Levelwise search and borders of theories in knowledge discovery. Data Min. Knowl. Disc. 1(3), 241–258 (1997)

    Article  Google Scholar 

  5. Nijssen, S., Zimmermann, A.: Constraint-Based Pattern Mining, pp. 147–163. Springer, Cham (2014)

    Google Scholar 

  6. Bonchi, F., Lucchese, C.: Extending the state-of-the-art of constraint-based pattern discovery. Data Knowl. Eng. 60(2), 377–399 (2007)

    Article  Google Scholar 

  7. Pei, J., Han, J.: Constrained frequent pattern mining: a pattern-growth view. ACM SIGKDD Explor. Newslett. 4(1), 31–39 (2002)

    Article  Google Scholar 

  8. Soulet, A., Crémilleux, B.: Mining constraint-based patterns using automatic relaxation. Intell. Data Anal. 13(1), 109–133 (2009)

    Google Scholar 

  9. Negrevergne, B., Dries, A., Guns, T., Nijssen, S.: Dominance programming for itemset mining. In: 2013 IEEE 13th International Conference on Data Mining, pp. 557–566. IEEE (2013)

    Google Scholar 

  10. Ugarte Rojas, W., Boizumault, P., Loudni, S., Crémilleux, B., Lepailleur, A.: Mining (Soft-) skypatterns using dynamic CSP. In: Simonis, H. (ed.) CPAIOR 2014. LNCS, vol. 8451, pp. 71–87. Springer, Cham (2014). doi:10.1007/978-3-319-07046-9_6

    Chapter  Google Scholar 

  11. Leeuwen, M.: Interactive data exploration using pattern mining. In: Holzinger, A., Jurisica, I. (eds.) Interactive Knowledge Discovery and Data Mining in Biomedical Informatics. LNCS, vol. 8401, pp. 169–182. Springer, Heidelberg (2014). doi:10.1007/978-3-662-43968-5_9

    Chapter  Google Scholar 

  12. Boley, M., Lucchese, C., Paurat, D., Gärtner, T.: Direct local pattern sampling by efficient two-step random procedures. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 582–590. ACM (2011)

    Google Scholar 

  13. De Raedt, L., Zimmermann, A.: Constraint-based pattern set mining. In: SDM, SIAM, pp. 237–248 (2007)

    Google Scholar 

  14. Guns, T., Nijssen, S., De Raedt, L.: Itemset mining: a constraint programming perspective. Artif. Intell. 175(12), 1951–1983 (2011)

    Article  Google Scholar 

  15. Srikant, R., Agrawal, R.: Mining quantitative association rules in large relational tables. In: Acm Sigmod Record. vol. 25, pp. 1–12. ACM (1996)

    Google Scholar 

  16. Kaytoue, M., Kuznetsov, S.O., Napoli, A.: Revisiting numerical pattern mining with formal concept analysis. In: International Conference on Artificial Intelligence (IJCAI 2011) (2011)

    Google Scholar 

  17. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 3–14. IEEE (1995)

    Google Scholar 

  18. Zhao, Q., Bhowmick, S.S.: Sequential pattern mining: a survey. ITechnical Report CAIS Nayang Technological University Singapore, pp. 1–26 (2003)

    Google Scholar 

  19. Jiang, C., Coenen, F., Zito, M.: A survey of frequent subgraph mining algorithms. Knowl. Eng. Rev. 28(01), 75–105 (2013)

    Article  Google Scholar 

  20. Arimura, H., Uno, T.: Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems. In: SDM (2009)

    Google Scholar 

  21. Agrawal, R., Srikant, R., et al.: Fast algorithms for mining association rules. In: Proceeding of 20th International Conference Very Large Data Bases, VLDB, vol. 1215, pp. 487–499 (1994)

    Google Scholar 

  22. Geng, L., Hamilton, H.J.: Interestingness measures for data mining: a survey. ACM Comput. Surv. (CSUR) 38(3), 9 (2006)

    Article  Google Scholar 

  23. Vreeken, J., Tatti, N.: Interesting Patterns. In: Aggarwal, C.C., Han, J. (eds.) Frequent Pattern Mining, pp. 105–134. Springer, Cham (2014). doi:10.1007/978-3-319-07821-2_5

    Google Scholar 

  24. Leman, D., Feelders, A., Knobbe, A.: Exceptional model mining. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008. LNCS, vol. 5212, pp. 1–16. Springer, Heidelberg (2008). doi:10.1007/978-3-540-87481-2_1

    Chapter  Google Scholar 

  25. Webb, G.I.: Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans. Knowl. Discov. Data (TKDD) 4(1), 3 (2010)

    Google Scholar 

  26. Tan, P.N., Kumar, V., Srivastava, J.: Selecting the right interestingness measure for association patterns. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 32–41 (2002)

    Google Scholar 

  27. Winslett, M.: Interview with Rakesh Agrawal. SIGMOD Rec. 32(3), 83–90 (2003)

    Google Scholar 

  28. Fayyad, U.M., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-03 panel: data mining: the next 10 years. ACM SIGKDD Explor. Newslett. 5(2), 191–196 (2003)

    Article  Google Scholar 

  29. Han, J., Cheng, H., Xin, D., Yan, X.: Frequent pattern mining: current status and future directions. Data Min. Knowl. Disc. 15(1), 55–86 (2007)

    Article  Google Scholar 

  30. Zaki, M.J., Parthasarathy, S., Ogihara, M., Li, W., et al.: New algorithms for fast discovery of association rules. KDD 97, 283–286 (1997)

    Google Scholar 

  31. Goethals, B.: Survey on Frequent Pattern Mining. University of Helsinki (2003)

    Google Scholar 

  32. Cerf, L., Besson, J., Robardet, C., Boulicaut, J.F.: Data peeler: contraint-based closed pattern mining in n-ary relations. In: SDM, vol. 8, pp. 37–48. SIAM (2008)

    Google Scholar 

  33. Ugarte, W., Boizumault, P., Loudni, S., Crémilleux, B.: Modeling and mining optimal patterns using dynamic CSP. In: 27th International Conference on Tools with Artificial Intelligence (ICTAI), pp. 33–40. IEEE (2015)

    Google Scholar 

  34. Fu, A., Kwong, R., Tang, J.: Mining N-most interesting itemsets. In: Raś, Z.W., Ohsuga, S. (eds.) ISMIS 2000. LNCS, vol. 1932, pp. 59–67. Springer, Heidelberg (2000). doi:10.1007/3-540-39963-1_7

    Chapter  Google Scholar 

  35. Herrera, F., Carmona, C.J., González, P., Del Jesus, M.J.: An overview on subgroup discovery: foundations and applications. Knowl. Inf. Syst. 29(3), 495–525 (2011)

    Article  Google Scholar 

  36. Soulet, A., Raïssi, C., Plantevit, M., Cremilleux, B.: Mining dominant patterns in the sky. In: 2011 IEEE 11th International Conference on Data Mining, pp. 655–664. IEEE (2011)

    Google Scholar 

  37. Calders, T., Rigotti, C., Boulicaut, J.-F.: A survey on condensed representations for frequent sets. In: Boulicaut, J.-F., Raedt, L., Mannila, H. (eds.) Constraint-Based Mining and Inductive Databases. LNCS, vol. 3848, pp. 64–80. Springer, Heidelberg (2006). doi:10.1007/11615576_4

    Chapter  Google Scholar 

  38. Hamrouni, T.: Key roles of closed sets and minimal generators in concise representations of frequent patterns. Intell. Data Anal. 16(4), 581–631 (2012)

    Google Scholar 

  39. Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1999). doi:10.1007/3-540-49257-7_25

    Chapter  Google Scholar 

  40. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer Science & Business Media, Heidelberg (2012)

    Google Scholar 

  41. Dzyuba, V., Leeuwen, M.V., Nijssen, S., De Raedt, L.: Interactive learning of pattern rankings. Int. J. Artif. Intell. Tools 23(06), 1460026 (2014)

    Article  Google Scholar 

  42. Bhuiyan, M., Mukhopadhyay, S., Hasan, M.A.: Interactive pattern mining on hidden data: a sampling-based solution. In: Proceedings of the 21st ACM International Conference on Information and Knowledge Management, pp. 95–104. ACM (2012)

    Google Scholar 

  43. Rueping, S.: Ranking interesting subgroups. In: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 913–920. ACM (2009)

    Google Scholar 

  44. Hasan, M.A., Zaki, M.J.: Output space sampling for graph patterns. PVLDB 2(1), 730–741 (2009)

    Google Scholar 

  45. Bendimerad, A.A., Plantevit, M., Robardet, C.: Unsupervised exceptional attributed sub-graph mining in urban data. In: IEEE 16th International Conference on Data Mining (ICDM), pp. 21–30. IEEE (2016)

    Google Scholar 

  46. Moens, S., Boley, M., Goethals, B.: Providing concise database covers instantly by recursive tile sampling. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds.) DS 2014. LNCS, vol. 8777, pp. 216–227. Springer, Cham (2014). doi:10.1007/978-3-319-11812-3_19

    Google Scholar 

  47. Giacometti, A., Soulet, A.: Frequent pattern outlier detection without exhaustive mining. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J.Z., Wang, R. (eds.) PAKDD 2016. LNCS, vol. 9652, pp. 196–207. Springer, Cham (2016). doi:10.1007/978-3-319-31750-2_16

    Chapter  Google Scholar 

Download references

Acknowledgments

The author would like to thank Bruno Crémilleux, Arnaud Giacometti and Marc Plantevit for many fruitful discussions. The author would also like to thank the anonymous reviewers and Patrick Marcel for their helpful comments that greatly contributed to improve the final version of the paper.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Arnaud Soulet .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Soulet, A. (2017). Two Decades of Pattern Mining: Principles and Methods. In: Marcel, P., Zimányi, E. (eds) Business Intelligence. eBISS 2016. Lecture Notes in Business Information Processing, vol 280. Springer, Cham. https://doi.org/10.1007/978-3-319-61164-8_3

Download citation

Publish with us

Policies and ethics