Skip to main content
Log in

Incrementally mining high utility patterns based on pre-large concept

  • Published:
Applied Intelligence Aims and scope Submit manuscript

Abstract

In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was thus proposed to measure the utility values of purchased items for revealing high utility itemsets from a quantitative database. In the past, a two-phase high utility mining algorithm was thus proposed for efficiently discovering high utility itemsets from a quantitative database. In dynamic data mining, transactions may be inserted, deleted, or modified from a database. In this case, a batch mining procedure must rescan the whole updated database to maintain the up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm is proposed for efficiently maintaining discovered high utility itemsets based on pre-large concepts. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

Similar content being viewed by others

Abbreviations

I :

A set of m items, I={i 1,i 2,…,i j ,…,i m }, in which each item i j has its own profit value p j ;

P :

The profit table, {p 1,p 2,…,p j ,…,p m }, in which p j is the profit value of item i j ;

D :

The original quantitative database, D={T 1,T 2,…,T k ,…,T n }, in which each transaction contains several items with purchase quantities;

d :

The new transactions, d={t 1,t 2,…,t k ,…,t n }, in which each transaction contains several items with purchase quantities;

U :

The entire updated database, i.e., Dd;

TU D :

The total utility of the transactions in D;

TU d :

The total utility of the transactions in d;

TU U :

The total utility of the transactions in U;

q kj :

The quantity of item i j in transaction t k ;

u kj :

The utility of item i j in transaction t k , which is calculated as q kj ×p j ;

tu k :

The transaction utility of currently processed transaction t k ;

buf :

A buffer used to store the total utility of the last processed transactions for transaction insertion. It is set to 0 after the database is rescanned;

X :

An itemset containing several items i j ;

S u :

The upper utility threshold for large (high) transaction-weighted utilization and high utility itemsets. It is the same as the high utility threshold in traditional utility mining;

S l :

The lower utility threshold for pre-large transaction-weighted utilization and pre-large itemsets, where S u >S l ;

f :

The safety transaction utility bound for new transactions;

C r :

The set of candidate r-itemsets;

Rescan_Items :

The set of the itemsets that must be rescanned in original database;

\(\mathit{HTWU}_{r}^{D}\) :

The set of large (high) transaction-weighted utilization r-itemsets in the original database;

\(\mathit{PTWU}_{r}^{D}\) :

The set of pre-large transaction-weighted utilization r-itemsets in the original database;

HTWU D :

The set of large (high) transaction-weighted utilization itemsets in the original database;

PTWU D :

The set of pre-large transaction-weighted utilization itemsets in the original database;

\(\mathit{HTWU}_{r}^{U}\) :

The set of large (high) transaction-weighted utilization r-itemsets in the updated database;

\(\mathit{PTWU}_{r}^{U}\) :

The set of pre-large transaction-weighted utilization r-itemsets in the updated database;

HTWU U :

The set of large (high) transaction-weighted utilization itemsets in the updated database;

PTWU U :

The set of pre-large transaction-weighted utilization itemsets in the updated database;

HU U :

The set of high-utility itemsets in the updated database;

twu D(X):

The transaction-weighted utilization of itemset X in the original database;

twu d(X):

The transaction-weighted utilization of itemset X in the new transactions;

twu U(X):

The transaction-weighted utilization of itemset X in the updated database;

au D(X):

The actual utility of itemset X in the original database;

au d(X):

The actual utility of itemset X in the new transactions;

au U(X):

The actual utility of itemset X in the updated database.

References

  1. Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: The 20th international conference on very large data bases, pp 487–499

    Google Scholar 

  2. Agrawal R, Imielinski T, Swami A (1993) Database mining: a performance perspective. IEEE Trans Knowl Data Eng 5(6):914–925

    Article  Google Scholar 

  3. Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: International conference on management of data, pp 207–216

    Google Scholar 

  4. Berzal F, Cubero JC, Marín N, Serrano JM (2001) Tbar: an efficient method for association rule mining in relational databases. Data Knowl Eng 37(1):47–64

    Article  MATH  Google Scholar 

  5. Brin S, Motwani R, Ullman JD, Tsur S (1997) Dynamic itemset counting and implication rules for market basket data. SIGMOD Rec 26(2):255–264

    Article  Google Scholar 

  6. Chan R, Yang Q, Shen YD (2003) Mining high utility itemsets. In: IEEE international conference on data mining, pp 19–26

    Chapter  Google Scholar 

  7. Chen MS, Han J, Yu PS (1996) Data mining: an overview from a database perspective. IEEE Trans Knowl Data Eng 8(6):866–883

    Article  Google Scholar 

  8. Cheung DW, Jiawei H, Ng VT, Wong CY (1996) Maintenance of discovered association rules in large databases: an incremental updating technique. In: The international conference on data engineering, pp 106–114

    Google Scholar 

  9. Hong TP, Wu CH (2011) An improved weighted clustering algorithm for determination of application nodes in heterogeneous sensor networks. J Inf Hiding Multimed Signal Process 2:173–184

    Google Scholar 

  10. Hong TP, Wang CY, Tao YH (2001) A new incremental data mining algorithm using pre-large itemsets. Intell Data Anal 5:111–129

    MATH  Google Scholar 

  11. Hong TP, Lin CW, Wu YL (2008) Incrementally fast updated frequent pattern trees. Expert Syst Appl 34(4):2424–2435

    Article  Google Scholar 

  12. Hong TP, Lin CW, Yang KT, Wang SL (2013) Using TF-IDF to hide sensitive itemsets. Appl Intell 38(4):502–510

    Article  Google Scholar 

  13. Hu K, Lu Y, Zhou L, Shi C (1999) Integrating classification and association rule mining: a concept lattice framework. In: The international workshop on new directions in rough sets, data mining, and granular-soft computing, pp 443–447

    Chapter  Google Scholar 

  14. IBM quest data mining project, Quest synthetic data generation code. http://www.almaden.ibm.com/cs/quest/syndata.html

  15. Lent B, Swami A, Widom J (1997) Clustering association rules. In: The international conference on data engineering, pp 220–231

    Google Scholar 

  16. Li YC, Yeh JS, Chang CC (2005) Direct candidates generation: a novel algorithm for discovering complete share-frequent itemsets. In: Lecture notes in computer science, vol 3614, pp 551–560

    Google Scholar 

  17. Li YC, Yeh JS, Chang CC (2005) Efficient algorithms for mining share-frequent itemsets. In: The world congress of international fuzzy systems association, pp 539–543

    Google Scholar 

  18. Li YC, Yeh JS, Chang CC (2005) Fast algorithm for mining share-frequent itemsets. In: The Asia Pacific web conference, pp 417–428

    Google Scholar 

  19. Lin CW, Hong TP (2013) A survey of fuzzy web mining. Wiley Interdiscip Rev: Data Min Knowl Discov 3(3):190–199

    MathSciNet  Google Scholar 

  20. Lin CW, Lan GC, Hong TP (2012) An incremental mining algorithm for high utility itemsets. Expert Syst Appl 39(8):7173–7180

    Article  Google Scholar 

  21. Lin CW, Hong TP, Chang CC, Wang SL (2013) A greedy-based approach for hiding sensitive itemsets by transaction insertion. J Inf Hiding Multimed Signal Process 4(4):201–227

    Google Scholar 

  22. Liu YH (2013) Stream mining on univariate uncertain data. Appl Intell. doi:10.1007/s10489-012-0415-3

    Google Scholar 

  23. Liu Y, Liao W-k, Choudhary A (2005) A fast high utility itemsets mining algorithm. In: The international workshop on utility-based data mining, pp 90–99

    Chapter  Google Scholar 

  24. Liu Y, Liao W-k, Choudhary A (2005) A two-phase algorithm for fast discovery of high utility itemsets. In: Lecture notes in computer science, pp 689–695

    Google Scholar 

  25. Microsoft Example database foodmart of Microsoft analysis services. http://msdn.microsoft.com/en-us/library/aa217032(SQL.80).aspx

  26. Park JS, Chen MS, Yu PS (1995) An effective hash-based algorithm for mining association rules. SIGMOD Rec 24(2):175–186

    Article  Google Scholar 

  27. Park JS, Chen MS, Yu PS (1997) Using a hash-based method with transaction trimming for mining association rules. IEEE Trans Knowl Data Eng 9(5):813–825

    Article  Google Scholar 

  28. Sarda NL, Srinivas NV (1998) An adaptive algorithm for incremental mining of association rules. In: The international workshop on database and expert systems applications, pp 240–245

    Google Scholar 

  29. Song W, Liu Y, Li J (2013) Mining high utility itemsets by dynamically pruning the tree structure. Appl Intell. doi:10.1007/s10489-013-0443-7

    Google Scholar 

  30. Sucahyo Y, Gopalan R (2005) Building a more accurate classifier based on strong frequent patterns. In Lecture notes in computer science, pp 1036–1042

    Google Scholar 

  31. Yao H, Hamilton HJ (2006) Mining itemset utilities from transaction databases. Data Knowl Eng 59(3):603–626

    Article  Google Scholar 

  32. Yao H, Hamilton HJ, Butz CJ (2004) A foundational approach to mining itemset utilities from databases. In: The SIAM international conference on data mining, pp 211–225

    Google Scholar 

  33. Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: International conference on knowledge discovery and data mining, pp 283–286

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Tzung-Pei Hong.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Lin, CW., Hong, TP., Lan, GC. et al. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 40, 343–357 (2014). https://doi.org/10.1007/s10489-013-0467-z

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10489-013-0467-z

Keywords

Navigation