Parallel Bifold: Large-scale parallel pattern mining with constraints

Distributed and Parallel Databases

Abstract

When computationally feasible, mining huge databases produces tremendously large numbers of frequent patterns. In many cases it is impractical to mine those datasets because of their sheer size: the difficulty lies not only in the number of existing patterns but mainly in the magnitude of the search space. Many approaches have suggested either applying constraints to the patterns or searching for frequent patterns in parallel. So far, these approaches have not proved genuinely effective for mining extremely large datasets.

We propose a method that combines both strategies efficiently, i.e., mining the set of patterns in parallel while pushing constraints. Using this approach, we could mine significantly larger datasets, with sizes never before reported in the literature. We are able to discover frequent patterns in a database of one billion transactions using a 32-processor cluster in less than an hour and a half.

Author information

Corresponding author

Correspondence to Osmar R. Zaïane.

Additional information

Recommended by: Ahmed Elmagarmid

About this article

Cite this article

El-Hajj, M., Zaïane, O.R. Parallel Bifold: Large-scale parallel pattern mining with constraints. Distrib Parallel Databases 20, 225–243 (2006). https://doi.org/10.1007/s10619-006-0445-0

  • DOI: https://doi.org/10.1007/s10619-006-0445-0