A Blocking Strategy for Ranking Features According to Probabilistic Relevance

Bontempi, Gianluca

doi:10.1007/978-3-319-51469-7_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10122)

Included in the following conference series:

International Workshop on Machine Learning, Optimization, and Big Data

Abstract

The paper presents an algorithm for ranking features in “small number of samples, large dimensionality” problems according to probabilistic feature relevance, a novel definition of feature relevance. Probabilistic feature relevance, defined as expected weak relevance, is introduced to address the difficulty of estimating conventional feature relevance when the number of samples is much smaller than the number of features. The resulting ranking algorithm relies on a blocking approach for estimation: it creates a large number of identical configurations in which the conditional information of each feature is measured in a paired manner. Its implementation can be made embarrassingly parallel in the case of very large n. A number of experiments on simulated and real data confirm the interest of the approach.
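The abstract does not give the estimator itself, so the following is only a minimal sketch of the blocking idea and not the author's implementation. It assumes a linear-Gaussian plug-in estimate of conditional information, and the function names, the `context_size` and `n_blocks` parameters, and the use of random feature subsets as "configurations" are all hypothetical choices made for illustration. Each block draws one random context that is shared by every feature, so the features are scored in a paired manner, and the per-block scores are averaged to approximate an expected weak relevance.

```python
import numpy as np


def residual_variance(X, y):
    """In-sample residual variance of y after least squares on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return max(resid.var(), 1e-12)  # guard against log(0) on perfect fits


def blocked_relevance_ranking(X, y, n_blocks=100, context_size=3, seed=0):
    """Rank features by average conditional information over shared random contexts.

    Assumption (not from the paper): conditional information is approximated by
    0.5 * log(var(y | context) / var(y | context, feature)) under a linear-Gaussian model.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    scores = np.zeros((n_blocks, n_features))
    for b in range(n_blocks):
        # One context per block, reused for every feature (the "blocking" step).
        context = rng.choice(n_features, size=context_size, replace=False)
        for i in range(n_features):
            ctx = context[context != i]  # never condition a feature on itself
            base = residual_variance(X[:, ctx], y)
            full = residual_variance(X[:, np.append(ctx, i)], y)
            scores[b, i] = 0.5 * np.log(base / full)
    mean_scores = scores.mean(axis=0)
    ranking = np.argsort(-mean_scores)  # most relevant feature first
    return ranking, mean_scores
```

Because the blocks are independent of one another, the outer loop could be distributed across workers, which is one way to read the abstract's remark about an embarrassingly parallel implementation. The in-sample plug-in estimate used here is optimistically biased when samples are few; a cross-validated error difference would be a natural substitute.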

Notes

1. Boldface denotes random variables.
2. All details on the datasets (number of samples, number of variables, number of classes) are available at https://github.com/ramhiser/datamicroarray/blob/master/README.md.

Acknowledgements

The author acknowledges the support of the “BruFence: Scalable machine learning for automating defense system” project (RBC/14 PFS-ICT 5), funded by the Institute for the encouragement of Scientific Research and Innovation of Brussels (INNOVIRIS, Brussels Region, Belgium).

Author information

Authors and Affiliations

Machine Learning Group, Computer Science Department, Interuniversity Institute of Bioinformatics in Brussels (IB)², ULB, Université Libre de Bruxelles, Bruxelles, Belgium
Gianluca Bontempi

Corresponding author

Correspondence to Gianluca Bontempi.

Editor information

Editors and Affiliations

Department of Industrial and Systems Engineering, University of Florida, Gainesville, Florida, USA
Panos M. Pardalos
Semantic Technology Laboratory, National Research Council (CNR), Catania, Italy
Piero Conca
Dipartimento di Sociologia e Metodi della Ricerca Sociale, Università di Catania, Catania, Italy
Giovanni Giuffrida
Department of Mathematics and Computer Science, University of Catania, Catania, Italy
Giuseppe Nicosia

About this paper

Cite this paper

Bontempi, G. (2016). A Blocking Strategy for Ranking Features According to Probabilistic Relevance. In: Pardalos, P., Conca, P., Giuffrida, G., Nicosia, G. (eds) Machine Learning, Optimization, and Big Data. MOD 2016. Lecture Notes in Computer Science, vol 10122. Springer, Cham. https://doi.org/10.1007/978-3-319-51469-7_5

DOI: https://doi.org/10.1007/978-3-319-51469-7_5
Published: 25 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51468-0
Online ISBN: 978-3-319-51469-7
eBook Packages: Computer Science, Computer Science (R0)
