Optimal Feature Selection for Multivalued Attributes Using Transaction Weights as Utility Scale

Lnc Prakash, K.; Anuradha, K.

doi:10.1007/978-981-10-8228-3_49


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 712)


Abstract

Attribute selection is a key step in the process of Knowledge Discovery in Databases (KDD). Most earlier attribute selection methods handle only simple attribute types, such as numerical and categorical attributes. They are not suited to multivalued attributes, that is, attributes that take multiple values simultaneously for the same instance in a dataset. In this manuscript, a contemporary approach for selecting optimal values for features of multivalued attributes is proposed; the method adapts a utility-mining-based pattern discovery approach. To evaluate it, experiments are carried out on multivalued and multiclass datasets that are submitted to the k-means clustering technique. The experiments show that the proposed method is well suited to assessing the relevance of multivalued attributes to mining models such as clustering.
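The abstract does not spell out the scoring procedure. Purely as an illustration, the sketch below assumes the standard transaction-weighted utility (TWU) notion from high-utility itemset mining: each instance's multivalued attribute is treated as a transaction, each value carries an assumed quantity and an assumed external weight, and values whose TWU reaches a chosen fraction of the total utility are kept. The selected values are then one-hot encoded so that instances could be fed to a k-means implementation. The functions `transaction_utility`, `transaction_weighted_utility`, `select_values`, `encode`, the parameter `min_util_ratio`, and the example data are all hypothetical; this is not the authors' algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical representation: one transaction per instance,
# mapping each value of the multivalued attribute to a quantity.
Transaction = Dict[str, int]


def transaction_utility(t: Transaction, weights: Dict[str, float]) -> float:
    """TU(T): sum of quantity * external weight over the values in T."""
    return sum(q * weights[v] for v, q in t.items())


def transaction_weighted_utility(
    transactions: List[Transaction], weights: Dict[str, float]
) -> Dict[str, float]:
    """TWU(v): sum of TU(T) over every transaction T that contains value v."""
    twu: Dict[str, float] = defaultdict(float)
    for t in transactions:
        tu = transaction_utility(t, weights)
        for v in t:
            twu[v] += tu
    return dict(twu)


def select_values(
    transactions: List[Transaction],
    weights: Dict[str, float],
    min_util_ratio: float,
) -> Set[str]:
    """Keep values whose TWU is at least min_util_ratio * total utility (assumed rule)."""
    total = sum(transaction_utility(t, weights) for t in transactions)
    threshold = min_util_ratio * total
    twu = transaction_weighted_utility(transactions, weights)
    return {v for v, u in twu.items() if u >= threshold}


def encode(transactions: List[Transaction], selected: Set[str]) -> List[List[int]]:
    """One-hot encode each instance over the selected values (sorted column order)."""
    cols = sorted(selected)
    return [[1 if v in t else 0 for v in cols] for t in transactions]


# Hypothetical toy data: three instances, three possible values.
example_transactions: List[Transaction] = [
    {"a": 1, "b": 2},
    {"b": 1, "c": 3},
    {"a": 2, "c": 1},
]
example_weights: Dict[str, float] = {"a": 2.0, "b": 1.0, "c": 0.5}

if __name__ == "__main__":
    chosen = select_values(example_transactions, example_weights, 0.6)
    print(sorted(chosen))
    print(encode(example_transactions, chosen))
```

With these toy numbers the transaction utilities are 4, 2.5 and 4.5 (total 11), giving TWU values of 8.5 for `a`, 6.5 for `b` and 7.0 for `c`; a ratio of 0.6 sets the threshold at 6.6, so `a` and `c` are kept. Note that TWU is conventionally used as a pruning upper bound in utility mining; using it directly as a relevance score here is a simplifying assumption.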



Author information

Authors and Affiliations

Department of Computer Science and Engineering, AITS, Rajampet, Andhra Pradesh, India
K. Lnc Prakash
Department of Computer Science and Engineering, GRIET, Hyderabad, Telangana, India
K. Anuradha


Corresponding author

Correspondence to K. Lnc Prakash.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Shri Ramswaroop Memorial Group of Professional Colleges, Lucknow, Uttar Pradesh, India
Vikrant Bhateja
Departamento de Engenharia Mecânica, Universidade do Porto, Porto, Portugal
João Manuel R.S. Tavares
Department of Computer Science and Engineering, JNTUH College of Engineering Hyderabad (Autonomous), Hyderabad, Telangana, India
B. Padmaja Rani
Department of Computer Science and Engineering, JNTUH College of Engineering Hyderabad (Autonomous), Hyderabad, Telangana, India
V. Kamakshi Prasad
CMR Technical Campus, Hyderabad, Telangana, India
K. Srujan Raju


About this paper

Cite this paper

Lnc Prakash, K., Anuradha, K. (2018). Optimal Feature Selection for Multivalued Attributes Using Transaction Weights as Utility Scale. In: Bhateja, V., Tavares, J., Rani, B., Prasad, V., Raju, K. (eds) Proceedings of the Second International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-10-8228-3_49


DOI: https://doi.org/10.1007/978-981-10-8228-3_49
Published: 24 July 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8227-6
Online ISBN: 978-981-10-8228-3
eBook Packages: Engineering, Engineering (R0)
