Abstract
Attribute selection is a key step in the process of Knowledge Discovery in Databases (KDD). Most existing attribute selection methods handle only simple attribute types, such as numerical and categorical attributes. They do not fit multivalued attributes, that is, attributes that hold multiple values simultaneously for the same instance in the dataset. This manuscript proposes an approach for selecting optimal values for the features of multivalued attributes. The proposed method adapts a pattern discovery approach based on utility mining. To evaluate it, experiments are carried out on multivalued, multiclass datasets that are submitted to the k-means clustering technique. The experiments indicate that the proposed method is effective at assessing the relevance of multivalued attributes to mining models such as clustering.
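The idea of scoring the values of a multivalued attribute on a utility scale can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify. It assumes that each instance holds a set of values for one multivalued attribute, that the weight of an instance (its "transaction weight") defaults to the number of values it holds, and that a value is retained when its share of the total weight reaches a threshold. The function names and the `min_ratio` parameter are hypothetical.

```python
from collections import defaultdict


def transaction_weighted_utility(records, weights=None):
    """Sum, for every value, the weights of the instances that contain it.

    records: list of sets, one set of attribute values per instance.
    weights: optional list of per-instance weights (assumption: when
             omitted, the weight is the number of values in the instance).
    """
    utility = defaultdict(float)
    for index, values in enumerate(records):
        weight = weights[index] if weights is not None else float(len(values))
        for value in values:
            utility[value] += weight
    return dict(utility)


def select_values(records, min_ratio, weights=None):
    """Keep the values whose weighted utility share is at least min_ratio.

    min_ratio is a hypothetical threshold in [0, 1], not a parameter
    taken from the paper.
    """
    utility = transaction_weighted_utility(records, weights)
    if weights is not None:
        total = float(sum(weights))
    else:
        total = float(sum(len(values) for values in records))
    if total == 0:
        return set()  # no weight at all, so nothing can be selected
    return {value for value, score in utility.items() if score / total >= min_ratio}


# Toy data: four instances of one multivalued attribute.
records = [{"a", "b"}, {"a"}, {"a", "b", "c"}, {"c"}]
selected = select_values(records, min_ratio=0.6)
```

On this toy data the instance weights are 2, 1, 3 and 1, so the values "a", "b" and "c" score 6, 5 and 4 out of a total of 7, and a threshold of 0.6 retains "a" and "b". The retained values could then be encoded as binary indicator features before clustering, which is again an assumption about the pipeline rather than something the abstract states.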
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Lnc Prakash, K., Anuradha, K. (2018). Optimal Feature Selection for Multivalued Attributes Using Transaction Weights as Utility Scale. In: Bhateja, V., Tavares, J., Rani, B., Prasad, V., Raju, K. (eds) Proceedings of the Second International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-10-8228-3_49
DOI: https://doi.org/10.1007/978-981-10-8228-3_49
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8227-6
Online ISBN: 978-981-10-8228-3
eBook Packages: Engineering, Engineering (R0)