Abstract
This chapter presents an axiomatic characterisation of feature subset selection. Two axioms are proposed: a sufficiency axiom (preservation of learning information) and a necessity axiom (minimisation of encoding length). The sufficiency axiom concerns the existing dataset and rests on the understanding that any selected feature subset should describe the training dataset without loss of information, i.e., it must be consistent with the training dataset. The necessity axiom concerns predictability and is derived from Occam's razor, which states that the simplest among competing alternatives is preferred for prediction. The two axioms are then restated concisely in terms of relevance: maximise both the relevance r(X; Y) and the relevance r(Y; X). Based on this relevance characterisation, a heuristic selection algorithm is presented and evaluated experimentally. The results support the axioms.
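The chapter's own algorithm is guided by the relevance measures r(X; Y) and r(Y; X); the sketch below is not that algorithm, only a minimal illustration of the two axioms: a consistency test captures sufficiency (the subset must not map identical feature values to different labels) and a smallest-first search captures necessity (prefer the shortest description). The dataset, function names, and brute-force search strategy are assumptions introduced for illustration.

```python
from itertools import combinations

def is_consistent(data, labels, subset):
    """Sufficiency check: a subset is consistent with the training data if no
    two instances agree on every feature in the subset yet carry different labels."""
    seen = {}
    for row, y in zip(data, labels):
        key = tuple(row[i] for i in subset)
        if key in seen:
            if seen[key] != y:
                return False
        else:
            seen[key] = y
    return True

def smallest_consistent_subset(data, labels):
    """Necessity heuristic (Occam's razor): prefer the smallest subset that is
    still consistent, examining subsets in order of increasing size."""
    n_features = len(data[0])
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if is_consistent(data, labels, subset):
                return list(subset)
    return list(range(n_features))

if __name__ == "__main__":
    # Tiny illustrative dataset: feature 0 alone determines the label.
    X = [[0, 1, 0], [0, 0, 1], [1, 1, 1], [1, 0, 0]]
    y = ["a", "a", "b", "b"]
    print(smallest_consistent_subset(X, y))  # -> [0]
```

The exhaustive search is exponential in the number of features and serves only to make the axioms concrete; a practical method, such as the heuristic presented in the chapter, would use a relevance-based ranking to guide the search instead.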
Copyright information
© 1998 Springer Science+Business Media New York
About this chapter
Cite this chapter
Wang, H., Bell, D., Murtagh, F. (1998). Relevance Approach to Feature Subset Selection. In: Liu, H., Motoda, H. (eds) Feature Extraction, Construction and Selection. The Springer International Series in Engineering and Computer Science, vol 453. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-5725-8_6
DOI: https://doi.org/10.1007/978-1-4615-5725-8_6
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7622-4
Online ISBN: 978-1-4615-5725-8