
Sharper Bounds for the Hardness of Prototype and Feature Selection

  • Conference paper
  • In: Algorithmic Learning Theory (ALT 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1968)

Abstract

As pointed out by Blum [Blu94], “nearly all results in Machine Learning [...] deal with problems of separating relevant from irrelevant information in some way”. This paper is concerned with structural complexity issues regarding the selection of relevant Prototypes or Features. We give the first results proving that both problems can be much harder than expected in the literature for various notions of relevance. In particular, the worst-case bounds achievable by any efficient algorithm are proven to be very large, most of the time not so far from trivial bounds. We think these results give a theoretical justification for the numerous heuristic approaches found in the literature to cope with these problems.


References

  1. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties. Springer-Verlag, Berlin, 1999.

  2. S. Arora. Probabilistic checking of proofs and hardness of approximation problems. Technical Report CS-TR-476-94, Princeton University, 1994.

  3. M. Bellare. Proof checking and approximation: towards tight results. SIGACT News, 1996.

  4. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.

  5. A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, pages 245–272, 1997.

  6. A. Blum. Relevant examples and relevant features: thoughts from computational learning theory. In AAAI Fall Symposium (survey paper), 1994.

  7. P. Crescenzi and V. Kann. A Compendium of NP Optimization Problems. Available at http://www.nada.kth.se/~viggo/wwwcompendium/, 2000.

  8. T. Hancock, T. Jiang, M. Li, and J. Tromp. Lower bounds on learning decision lists and trees. In Proc. of the Symposium on Theoretical Aspects of Computer Science, 1994.

  9. L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, pages 15–17, 1976.

  10. G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129, 1994.

  11. D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, pages 256–278, 1974.

  12. V. Kann, S. Khanna, J. Lagergren, and A. Panconesi. On the hardness of approximating MAX k-CUT and its dual. Chicago Journal of Theoretical Computer Science, 2, 1997.

  13. M. J. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the 28th Annual ACM Symposium on the Theory of Computing, pages 459–468, 1996.

  14. R. Kohavi. Feature subset selection as search with probabilistic estimates. In AAAI Fall Symposium on Relevance, 1994.

  15. R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: overfitting and dynamic search space topology. In Proc. of the 1st International Conference on Knowledge Discovery and Data Mining, 1995.

  16. D. Koller and M. Sahami. Toward optimal feature selection. In Proc. of the 13th International Conference on Machine Learning, 1996.

  17. M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

  18. T. Mitchell. Machine Learning. McGraw-Hill, 1997.

  19. R. Nock and O. Gascuel. On learning decision committees. In Proc. of the 12th International Conference on Machine Learning, pages 413–420, 1995.

  20. R. Nock and P. Jappy. Function-free Horn clauses are hard to approximate. In Proc. of the 8th International Conference on Inductive Logic Programming, pages 195–204, 1998.

  21. R. Nock and P. Jappy. On the power of decision lists. In Proc. of the 15th International Conference on Machine Learning, pages 413–420, 1998.

  22. R. Nock, P. Jappy, and J. Sallantin. Generalized graph colorability and compressibility of Boolean formulae. In Proc. of the 9th International Symposium on Algorithms and Computation, pages 237–246, 1998.

  23. R. Nock. Learning logical formulae having limited size: theoretical aspects, methods and results. PhD thesis, Université Montpellier II, 1998. Also available as technical report RR-LIRMM-98014.

  24. K. Pillaipakkamnatt and V. Raghavan. On the limits of proper learnability of subclasses of DNF formulae. In Proc. of the 7th International Conference on Computational Learning Theory, pages 118–129, 1994.

  25. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1994.

  26. D. B. Skalak. Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In Proc. of the 11th International Conference on Machine Learning, pages 293–301, 1994.

  27. M. Sebban and R. Nock. Combining feature and prototype pruning by uncertainty minimization. In Proc. of the 16th International Conference on Uncertainty in Artificial Intelligence, 2000. To appear.

  28. M. Sebban and R. Nock. Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning, 2000. To appear.

  29. R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proc. of the 11th Annual ACM Conference on Computational Learning Theory, pages 80–91, 1998.

  30. D. Wilson and T. Martinez. Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411, 1997.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nock, R., Sebban, M. (2000). Sharper Bounds for the Hardness of Prototype and Feature Selection. In: Arimura, H., Jain, S., Sharma, A. (eds) Algorithmic Learning Theory. ALT 2000. Lecture Notes in Computer Science, vol. 1968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40992-0_17


  • DOI: https://doi.org/10.1007/3-540-40992-0_17


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41237-3

  • Online ISBN: 978-3-540-40992-2

