Incrementally mining high utility patterns based on pre-large concept

Lin, Chun-Wei; Hong, Tzung-Pei; Lan, Guo-Cheng; Wong, Jia-Wei; Lin, Wen-Yang

doi:10.1007/s10489-013-0467-z

Published: 27 August 2013

Applied Intelligence, Volume 40, pages 343–357 (2014)

Abstract

In traditional association rule mining, most algorithms are designed to discover frequent itemsets from a binary database. Utility mining was proposed to measure the utility values of purchased items, revealing high utility itemsets from a quantitative database. Previously, a two-phase high utility mining algorithm was proposed to discover high utility itemsets efficiently from a quantitative database. In dynamic data mining, transactions may be inserted into, deleted from, or modified in a database. In this case, a batch mining procedure must rescan the whole updated database to maintain up-to-date information. Designing an efficient approach for handling dynamic databases is thus a critical research issue in utility mining. In this paper, an incremental mining algorithm based on the pre-large concept is proposed for efficiently maintaining discovered high utility itemsets when transactions are inserted. Itemsets are first partitioned into three parts according to whether they have large (high), pre-large, or small transaction-weighted utilization in the original database and in the inserted transactions. Individual procedures are then executed for each part. Experimental results show that the proposed incremental high utility mining algorithm outperforms existing algorithms.
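
The three-way partition described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that S_u and S_l are ratios of the total utility of the database part being examined (original database D or new transactions d), and that a TWU exactly at a threshold counts as reaching it. The helper `needs_original_rescan` is a hypothetical name encoding the pre-large rule carried over from Hong, Wang and Tao (2001): among itemsets whose TWU in D is not kept (those small in D), only one that is large in d can become large in the updated database, and only after the accumulated new utility (buf plus TU^d) exceeds the safety bound f.

```python
from enum import Enum
from typing import Tuple


class Status(Enum):
    LARGE = "large"          # TWU >= S_u * total utility
    PRE_LARGE = "pre-large"  # S_l * total utility <= TWU < S_u * total utility
    SMALL = "small"          # TWU < S_l * total utility


def classify(twu_x: float, total_utility: float, s_u: float, s_l: float) -> Status:
    """Classify an itemset by its transaction-weighted utilization (TWU).

    Assumption (not from the paper): S_u and S_l are fractions of the total
    utility of the same database part, with 0 < S_l < S_u < 1.
    """
    if not 0 < s_l < s_u < 1:
        raise ValueError("expected 0 < S_l < S_u < 1")
    if total_utility <= 0:
        raise ValueError("total utility must be positive")
    # Compare TWU directly with threshold x total utility.
    if twu_x >= s_u * total_utility:
        return Status.LARGE
    if twu_x >= s_l * total_utility:
        return Status.PRE_LARGE
    return Status.SMALL


def case_of(twu_D: float, tu_D: float, twu_d: float, tu_d: float,
            s_u: float, s_l: float) -> Tuple[Status, Status]:
    """Return (status in D, status in d); the pair selects the procedure."""
    return classify(twu_D, tu_D, s_u, s_l), classify(twu_d, tu_d, s_u, s_l)


def needs_original_rescan(in_D: Status, in_d: Status,
                          buf: float, tu_d: float, f: float) -> bool:
    """Hypothetical helper: an itemset small in D (TWU in D not stored) but
    large in d may become large in U only once buf + TU^d exceeds f, so
    only then must D be rescanned for it."""
    return in_D is Status.SMALL and in_d is Status.LARGE and buf + tu_d > f


if __name__ == "__main__":
    in_D, in_d = case_of(twu_D=5, tu_D=100, twu_d=12, tu_d=20, s_u=0.5, s_l=0.3)
    print(in_D.value, in_d.value)  # small large
    print(needs_original_rescan(in_D, in_d, buf=10, tu_d=20, f=25))  # True
```

Keeping the classification separate from the rescan decision mirrors the idea that each (status in D, status in d) combination is handled by its own procedure; only one combination ever touches the original database.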

Abbreviations

\(I\) :: A set of \(m\) items, \(I=\{i_1, i_2, \ldots, i_j, \ldots, i_m\}\), in which each item \(i_j\) has its own profit value \(p_j\);
\(P\) :: The profit table, \(P=\{p_1, p_2, \ldots, p_j, \ldots, p_m\}\), in which \(p_j\) is the profit value of item \(i_j\);
\(D\) :: The original quantitative database, \(D=\{T_1, T_2, \ldots, T_k, \ldots, T_n\}\), in which each transaction contains several items with purchase quantities;
\(d\) :: The new transactions, \(d=\{t_1, t_2, \ldots, t_k, \ldots, t_n\}\), in which each transaction contains several items with purchase quantities;
\(U\) :: The entire updated database, i.e., \(U=D \cup d\);
\(\mathit{TU}^{D}\) :: The total utility of the transactions in \(D\), i.e., the sum of their transaction utilities;
\(\mathit{TU}^{d}\) :: The total utility of the transactions in \(d\);
\(\mathit{TU}^{U}\) :: The total utility of the transactions in \(U\), i.e., \(\mathit{TU}^{D}+\mathit{TU}^{d}\);
\(q_{kj}\) :: The quantity of item \(i_j\) in transaction \(t_k\);
\(u_{kj}\) :: The utility of item \(i_j\) in transaction \(t_k\), calculated as \(u_{kj}=q_{kj}\times p_j\);
\(\mathit{tu}_k\) :: The transaction utility of the currently processed transaction \(t_k\), i.e., \(\mathit{tu}_k=\sum_{i_j \in t_k} u_{kj}\);
\(\mathit{buf}\) :: A buffer that stores the accumulated total utility of the previously processed inserted transactions; it is reset to 0 after the original database is rescanned;
\(X\) :: An itemset containing several items \(i_j\);
\(S_u\) :: The upper utility threshold for large (high) transaction-weighted utilization and high utility itemsets; it is the same as the high utility threshold in traditional utility mining;
\(S_l\) :: The lower utility threshold for pre-large transaction-weighted utilization and pre-large itemsets, where \(S_u>S_l\);
\(f\) :: The safety transaction utility bound for new transactions;
\(C_r\) :: The set of candidate \(r\)-itemsets;
\(\mathit{Rescan\_Items}\) :: The set of itemsets that must be rescanned in the original database;
\(\mathit{HTWU}_{r}^{D}\) :: The set of large (high) transaction-weighted utilization \(r\)-itemsets in the original database;
\(\mathit{PTWU}_{r}^{D}\) :: The set of pre-large transaction-weighted utilization \(r\)-itemsets in the original database;
\(\mathit{HTWU}^{D}\) :: The set of large (high) transaction-weighted utilization itemsets in the original database;
\(\mathit{PTWU}^{D}\) :: The set of pre-large transaction-weighted utilization itemsets in the original database;
\(\mathit{HTWU}_{r}^{U}\) :: The set of large (high) transaction-weighted utilization \(r\)-itemsets in the updated database;
\(\mathit{PTWU}_{r}^{U}\) :: The set of pre-large transaction-weighted utilization \(r\)-itemsets in the updated database;
\(\mathit{HTWU}^{U}\) :: The set of large (high) transaction-weighted utilization itemsets in the updated database;
\(\mathit{PTWU}^{U}\) :: The set of pre-large transaction-weighted utilization itemsets in the updated database;
\(\mathit{HU}^{U}\) :: The set of high utility itemsets in the updated database;
\(\mathit{twu}^{D}(X)\) :: The transaction-weighted utilization of itemset \(X\) in the original database, i.e., the sum of the transaction utilities of the transactions in \(D\) that contain \(X\);
\(\mathit{twu}^{d}(X)\) :: The transaction-weighted utilization of itemset \(X\) in the new transactions;
\(\mathit{twu}^{U}(X)\) :: The transaction-weighted utilization of itemset \(X\) in the updated database;
\(\mathit{au}^{D}(X)\) :: The actual utility of itemset \(X\) in the original database, i.e., the sum of the utilities of the items of \(X\) over the transactions in \(D\) that contain \(X\);
\(\mathit{au}^{d}(X)\) :: The actual utility of itemset \(X\) in the new transactions;
\(\mathit{au}^{U}(X)\) :: The actual utility of itemset \(X\) in the updated database.
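
The utility quantities defined above can be computed as in the following sketch on a made-up three-transaction database (items, profits and quantities are illustrative only, not taken from the paper). It also includes a safety bound f whose formula is an assumption here: it is the utility analogue of the pre-large bound of Hong, Wang and Tao (2001). An itemset \(X\) that is small in \(D\) has \(\mathit{twu}^{D}(X)<S_l\times\mathit{TU}^{D}\), and new transactions can add at most \(\mathit{TU}^{d}\) to its TWU, so \(X\) cannot reach \(S_u\times\mathit{TU}^{U}\) while \(\mathit{TU}^{d}\le(S_u-S_l)\times\mathit{TU}^{D}/(1-S_u)\). The paper's exact statement of f may differ, and in an incremental run the check would compare \(\mathit{buf}+\mathit{TU}^{d}\) against f.

```python
from typing import Dict, FrozenSet, List

# Hypothetical toy data (not from the paper).
Transaction = Dict[str, int]  # item -> purchase quantity q_kj

PROFIT: Dict[str, int] = {"A": 3, "B": 10, "C": 1}  # profit table P: item -> p_j

D: List[Transaction] = [
    {"A": 1, "C": 10},
    {"A": 2, "B": 1},
    {"B": 2, "C": 5},
]


def item_utility(t: Transaction, item: str) -> int:
    """u_kj = q_kj * p_j."""
    return t[item] * PROFIT[item]


def transaction_utility(t: Transaction) -> int:
    """tu_k: sum of u_kj over all items in the transaction."""
    return sum(item_utility(t, item) for item in t)


def total_utility(db: List[Transaction]) -> int:
    """TU: sum of tu_k over all transactions in db."""
    return sum(transaction_utility(t) for t in db)


def twu(db: List[Transaction], x: FrozenSet[str]) -> int:
    """twu(X): sum of tu_k over the transactions that contain every item of X."""
    return sum(transaction_utility(t) for t in db if x.issubset(t))


def actual_utility(db: List[Transaction], x: FrozenSet[str]) -> int:
    """au(X): sum of the utilities of X's items over the transactions containing X."""
    return sum(sum(item_utility(t, item) for item in x) for t in db if x.issubset(t))


def safety_bound(tu_D: float, s_u: float, s_l: float) -> float:
    """Assumed safety bound f = (S_u - S_l) * TU^D / (1 - S_u)."""
    return (s_u - s_l) * tu_D / (1 - s_u)


if __name__ == "__main__":
    a = frozenset({"A"})
    print(total_utility(D))                                   # 54
    print(twu(D, a), actual_utility(D, a))                    # 29 9
    print(safety_bound(total_utility(D), s_u=0.5, s_l=0.25))  # 27.0
```

The TWU of {A} (29) exceeds its actual utility (9), which illustrates why TWU is used as an overestimate when pruning candidates before the actual utilities are computed.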

Author information

Authors and Affiliations

Innovative Information Industry Research Center (IIIRC), Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, P.R. China
Chun-Wei Lin
Shenzhen Key Laboratory of Internet Information Collaboration, School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, 518055, P.R. China
Chun-Wei Lin
Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung, 811, Taiwan, R.O.C.
Tzung-Pei Hong & Wen-Yang Lin
Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung, 804, Taiwan, R.O.C.
Tzung-Pei Hong & Jia-Wei Wong
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701, Taiwan, R.O.C.
Guo-Cheng Lan

Corresponding author

Correspondence to Tzung-Pei Hong.

About this article

Cite this article

Lin, CW., Hong, TP., Lan, GC. et al. Incrementally mining high utility patterns based on pre-large concept. Appl Intell 40, 343–357 (2014). https://doi.org/10.1007/s10489-013-0467-z

Issue Date: March 2014
DOI: https://doi.org/10.1007/s10489-013-0467-z