
Feature Subset Selection by Estimation of Distribution Algorithms

Chapter in: Estimation of Distribution Algorithms

Part of the book series: Genetic Algorithms and Evolutionary Computation (GENA, volume 2)

Abstract

Feature Subset Selection is a well-known task in the Machine Learning, Data Mining, Pattern Recognition and Text Learning paradigms. In this chapter, we present a set of techniques inspired by Estimation of Distribution Algorithms (EDAs) to tackle the Feature Subset Selection problem in Machine Learning and Data Mining tasks. Bayesian networks are used to factorize the probability distribution of the best solutions in small- and medium-dimensionality datasets, while simpler probabilistic models are used in higher-dimensionality domains. In a comparison with several sequential and genetic-inspired algorithms on natural and artificial datasets, the EDA-based approaches obtain encouraging accuracy results and require fewer evaluations than the genetic approaches.
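The abstract's simpler probabilistic model for high-dimensional domains can be sketched as a univariate marginal EDA (in the spirit of UMDA/PBIL) over binary feature-inclusion vectors. The following is a minimal illustration, not the chapter's actual method: the `evaluate` function is a hypothetical stand-in for a wrapper fitness measure (e.g. cross-validated classifier accuracy), and the chapter's small- and medium-dimensionality approach would learn a Bayesian network over the selected solutions instead of independent marginals.

```python
# Sketch of a univariate-marginal EDA for feature subset selection.
# Hypothetical names: `evaluate` stands in for a wrapper fitness
# function; the chapter's EDAs use Bayesian networks, while this
# sketch uses the simpler univariate model mentioned for
# high-dimensional domains.
import random

def umda_feature_selection(n_features, evaluate,
                           pop_size=50, n_select=25, generations=20):
    # Start with each feature included with probability 0.5.
    probs = [0.5] * n_features
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        # Sample a population of candidate feature subsets (bit vectors).
        pop = [[1 if random.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        scored = sorted(pop, key=evaluate, reverse=True)
        if evaluate(scored[0]) > best_fit:
            best, best_fit = scored[0], evaluate(scored[0])
        # Re-estimate each feature's marginal inclusion probability
        # from the best solutions of this generation.
        selected = scored[:n_select]
        probs = [sum(ind[i] for ind in selected) / n_select
                 for i in range(n_features)]
    return best, best_fit

# Toy fitness: reward bit vectors matching a target subset
# (first three of ten features relevant).
target = [1, 1, 1] + [0] * 7
fitness = lambda ind: sum(1 for a, b in zip(ind, target) if a == b)
subset, score = umda_feature_selection(10, fitness)
```

Replacing the independent per-feature marginals with a Bayesian network learned from the selected solutions yields the chapter's approach for lower-dimensional datasets, at the cost of a more expensive model-estimation step per generation.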




Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Inza, I., Larrañaga, P., Sierra, B. (2002). Feature Subset Selection by Estimation of Distribution Algorithms. In: Larrañaga, P., Lozano, J.A. (eds) Estimation of Distribution Algorithms. Genetic Algorithms and Evolutionary Computation, vol 2. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-1539-5_13

  • DOI: https://doi.org/10.1007/978-1-4615-1539-5_13

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5604-2

  • Online ISBN: 978-1-4615-1539-5

  • eBook Packages: Springer Book Archive
