
Sharper Bounds for the Hardness of Prototype and Feature Selection

  • Conference paper
  • In: Algorithmic Learning Theory (ALT 2000)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 1968)

Abstract

As pointed out by Blum [Blu94], “nearly all results in Machine Learning [...] deal with problems of separating relevant from irrelevant information in some way”. This paper is concerned with structural complexity issues regarding the selection of relevant Prototypes or Features. We give the first results proving that both problems can be much harder than expected in the literature for various notions of relevance. In particular, the worst-case bounds achievable by any efficient algorithm are proven to be very large, most of the time not so far from trivial bounds. We think these results give a theoretical justification for the numerous heuristic approaches found in the literature to cope with these problems.


References

  1. G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation: Combinatorial Optimization Problems and their Approximability Properties. Springer-Verlag, Berlin, 1999.

  2. S. Arora. Probabilistic checking of proofs and hardness of approximation problems. Technical Report CS-TR-476-94, Princeton University, 1994.

  3. M. Bellare. Proof checking and approximation: towards tight results. SIGACT News, 1996.

  4. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees. Wadsworth, 1984.

  5. A. Blum and P. Langley. Selection of relevant features and examples in machine learning. Artificial Intelligence, pages 245–272, 1997.

  6. A. Blum. Relevant examples and relevant features: thoughts from computational learning theory. In AAAI Fall Symposium (survey paper), 1994.

  7. P. Crescenzi and V. Kann. A Compendium of NP Optimization Problems. Available at http://www.nada.kth.se/~viggo/wwwcompendium/, 2000.

  8. T. Hancock, T. Jiang, M. Li, and J. Tromp. Lower bounds on learning decision lists and trees. In Proc. of the Symposium on Theoretical Aspects of Computer Science, 1994.

  9. L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, pages 15–17, 1976.

  10. G. H. John, R. Kohavi, and K. Pfleger. Irrelevant features and the subset selection problem. In Proc. of the 11th International Conference on Machine Learning, pages 121–129, 1994.

  11. D. S. Johnson. Approximation algorithms for combinatorial problems. Journal of Computer and System Sciences, pages 256–278, 1974.

  12. V. Kann, S. Khanna, J. Lagergren, and A. Panconesi. On the hardness of approximating MAX k-CUT and its dual. Chicago Journal of Theoretical Computer Science, 2, 1997.

  13. M. J. Kearns and Y. Mansour. On the boosting ability of top-down decision tree learning algorithms. In Proc. of the 28th Annual ACM Symposium on the Theory of Computing, pages 459–468, 1996.

  14. R. Kohavi. Feature subset selection as search with probabilistic estimates. In AAAI Fall Symposium on Relevance, 1994.

  15. R. Kohavi and D. Sommerfield. Feature subset selection using the wrapper model: overfitting and dynamic search space topology. In Proc. of the 1st International Conference on Knowledge Discovery and Data Mining, 1995.

  16. D. Koller and M. Sahami. Toward optimal feature selection. In Proc. of the 13th International Conference on Machine Learning, 1996.

  17. M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, 1994.

  18. T. Mitchell. Machine Learning. McGraw-Hill, 1997.

  19. R. Nock and O. Gascuel. On learning decision committees. In Proc. of the 12th International Conference on Machine Learning, pages 413–420, 1995.

  20. R. Nock and P. Jappy. Function-free Horn clauses are hard to approximate. In Proc. of the 8th International Conference on Inductive Logic Programming, pages 195–204, 1998.

  21. R. Nock and P. Jappy. On the power of decision lists. In Proc. of the 15th International Conference on Machine Learning, pages 413–420, 1998.

  22. R. Nock, P. Jappy, and J. Sallantin. Generalized graph colorability and compressibility of Boolean formulae. In Proc. of the 9th International Symposium on Algorithms and Computation, pages 237–246, 1998.

  23. R. Nock. Learning logical formulae having limited size: theoretical aspects, methods and results. PhD thesis, Université Montpellier II, 1998. Also available as technical report RR-LIRMM-98014.

  24. K. Pillaipakkamnatt and V. Raghavan. On the limits of proper learnability of subclasses of DNF formulae. In Proc. of the 7th International Conference on Computational Learning Theory, pages 118–129, 1994.

  25. J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1994.

  26. D. B. Skalak. Prototype and feature selection by sampling and random mutation hill-climbing algorithms. In Proc. of the 11th International Conference on Machine Learning, pages 293–301, 1994.

  27. M. Sebban and R. Nock. Combining feature and prototype pruning by uncertainty minimization. In Proc. of the 16th International Conference on Uncertainty in Artificial Intelligence, 2000. To appear.

  28. M. Sebban and R. Nock. Prototype selection as an information-preserving problem. In Proc. of the 17th International Conference on Machine Learning, 2000. To appear.

  29. R. E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. In Proc. of the 11th Annual ACM Conference on Computational Learning Theory, pages 80–91, 1998.

  30. D. Wilson and T. Martinez. Instance pruning techniques. In Proc. of the 14th International Conference on Machine Learning, pages 404–411, 1997.


Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nock, R., Sebban, M. (2000). Sharper Bounds for the Hardness of Prototype and Feature Selection. In: Arimura, H., Jain, S., Sharma, A. (eds) Algorithmic Learning Theory. ALT 2000. Lecture Notes in Computer Science, vol. 1968. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-40992-0_17


  • DOI: https://doi.org/10.1007/3-540-40992-0_17


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41237-3

  • Online ISBN: 978-3-540-40992-2

