Abstract
We consider the problem of mining strongly closed itemsets from transactional data streams. Two characteristic features make this kind of itemset appealing for a range of applications: compactness and stability against changes in the input. Utilizing their algebraic and algorithmic properties, we propose an algorithm based on reservoir sampling for approximating strongly closed itemsets in the landmark streaming setting, prove its correctness, and show empirically that it yields a considerable speed-up over a naive algorithm without any significant loss in precision or recall. As a motivating application, we experimentally demonstrate the suitability of strongly closed itemsets for concept drift detection in transactional data streams.
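The abstract names reservoir sampling as the basis of the proposed approximation. As a rough illustration of that building block only, the following is a minimal sketch of classic reservoir sampling (often called Algorithm R) over a stream of transactions. It is not the authors' mining algorithm; the function name `reservoir_sample`, its parameters, and the representation of a transaction as a `frozenset` of items are assumptions made for this example.

```python
import random
from typing import FrozenSet, Hashable, Iterable, List, Optional

Transaction = FrozenSet[Hashable]  # assumed representation of one transaction


def reservoir_sample(
    stream: Iterable[Transaction],
    k: int,
    rng: Optional[random.Random] = None,
) -> List[Transaction]:
    """Return a uniform random sample of at most k transactions from the stream.

    Classic reservoir sampling: after t transactions have been seen, each of
    them is in the reservoir with probability k / t (for t >= k).
    """
    rng = rng or random.Random()
    reservoir: List[Transaction] = []
    for t, transaction in enumerate(stream, start=1):
        if t <= k:
            # Fill phase: the first k transactions are always kept.
            reservoir.append(transaction)
        else:
            # Replacement phase: keep the new transaction with probability k / t
            # by drawing a slot uniformly from [0, t) and replacing if it is < k.
            j = rng.randrange(t)
            if j < k:
                reservoir[j] = transaction
    return reservoir


# Hypothetical usage: sample 50 transactions from a synthetic stream.
stream = [frozenset({i, i + 1}) for i in range(1000)]
sample = reservoir_sample(stream, 50, random.Random(0))
```

In the landmark setting, where all transactions since a fixed starting point count, a fixed-size sample of this kind keeps memory bounded as the stream grows; how the sample is then used to approximate strongly closed itemsets is specific to the paper and not shown here.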
Notes
- 1. Due to space limitations we omit frequency constraints in this short version.
- 2.
- 3.
- 4. We are going to present further practical applications (e.g., computer aided product configuration) in the long version of this paper.
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Trabold, D., Horváth, T. (2017). Mining Strongly Closed Itemsets from Data Streams. In: Yamamoto, A., Kida, T., Uno, T., Kuboyama, T. (eds) Discovery Science. DS 2017. Lecture Notes in Computer Science, vol 10558. Springer, Cham. https://doi.org/10.1007/978-3-319-67786-6_18
DOI: https://doi.org/10.1007/978-3-319-67786-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67785-9
Online ISBN: 978-3-319-67786-6
eBook Packages: Computer Science, Computer Science (R0)