Optimal Feature Selection for Multivalued Attributes Using Transaction Weights as Utility Scale

  • Conference paper
Proceedings of the Second International Conference on Computational Intelligence and Informatics

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 712)

Abstract

Attribute selection is a key step in Knowledge Discovery in Databases (KDD). Most existing attribute selection methods handle only simple attribute types, such as numerical and categorical attributes. They do not accommodate multivalued attributes, that is, attributes that hold multiple values simultaneously for the same instance in a dataset. This manuscript proposes a new approach for selecting optimal values for the features of multivalued attributes. The proposed method adapts a utility-mining-based pattern discovery approach. To evaluate it, experiments are carried out on multivalued, multiclass datasets that are submitted to the k-means clustering technique. The experiments indicate that the proposed method is well suited to assessing the relevance of multivalued attributes to mining models such as clustering.
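
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea suggested by the title: scoring each value of a multivalued attribute by a transaction-weighted utility and keeping the values that score highly. The function names, the toy records, the weights, and the `min_share` threshold are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def transaction_weighted_utility(records, weights):
    """For each attribute value, sum the weights of the records containing it."""
    twu = defaultdict(float)
    for values, weight in zip(records, weights):
        for value in set(values):  # count a value once per record
            twu[value] += weight
    return dict(twu)


def select_values(records, weights, min_share):
    """Keep values whose utility is at least min_share of the total weight."""
    total = sum(weights)
    twu = transaction_weighted_utility(records, weights)
    return {value for value, utility in twu.items() if utility >= min_share * total}


# Hypothetical toy data: each record holds the values of one multivalued attribute,
# and each record carries an assumed weight (its "transaction weight").
records = [["ml", "db"], ["ml"], ["db", "ir"], ["ml", "ir"]]
weights = [2.0, 1.0, 1.0, 4.0]

selected = select_values(records, weights, min_share=0.5)
print(sorted(selected))
```

In this sketch the retained values could then be encoded as binary features and passed to a clustering step such as k-means; how the paper actually derives weights and thresholds is not stated in the abstract.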

Author information

Corresponding author

Correspondence to K. Lnc Prakash.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lnc Prakash, K., Anuradha, K. (2018). Optimal Feature Selection for Multivalued Attributes Using Transaction Weights as Utility Scale. In: Bhateja, V., Tavares, J., Rani, B., Prasad, V., Raju, K. (eds) Proceedings of the Second International Conference on Computational Intelligence and Informatics. Advances in Intelligent Systems and Computing, vol 712. Springer, Singapore. https://doi.org/10.1007/978-981-10-8228-3_49

  • DOI: https://doi.org/10.1007/978-981-10-8228-3_49

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8227-6

  • Online ISBN: 978-981-10-8228-3

  • eBook Packages: Engineering, Engineering (R0)
