Abstract
In machine learning, feature ranking (FR) algorithms are used to rank features by relevance to the class variable. FR algorithms have mostly been investigated for the feature selection problem and are less studied for the problem of ranking. This paper focuses on the latter. In FR terms, the ranking problem raises the following question: since different FR criteria estimate the relationship between a feature and the class variable differently on a given data set, can we determine which criterion better captures the “true” feature-to-class relationship and thus generates the most “correct” ordering of individual features? We term this the “correctness” problem. It requires a reference ordering against which the ranks assigned to features by an FR algorithm are directly compared. The reference ranking is generally unknown for real-life data. In this paper, we show through theoretical and empirical analysis that for two-class classification tasks represented with binary data, the ordering of binary features based on their individual predictive powers can be used as a benchmark. This allows us to test how correct the ordering produced by an FR algorithm is. Based on these ideas, an evaluation method termed the FR evaluation strategy (FRES) is proposed. Rankings of three different FR criteria (relief, mutual information, and the diff-criterion) are investigated on five artificially generated and four real-life binary data sets. The results indicate that FRES works equally well on synthetic and real-life data and that the diff-criterion generates the most correct orderings for binary data.
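The idea of checking an FR criterion against a reference ordering can be illustrated with a small sketch. It is not the FRES procedure itself, which the abstract does not spell out. Two assumptions are made here: "individual predictive power" is approximated by the training accuracy of the best classifier that uses a single binary feature, and agreement between orderings is measured with Kendall's tau. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def predictive_power(feature, labels):
    """Assumed proxy: training accuracy of the best single-feature rule."""
    correct = 0
    for value in (0, 1):
        counts = Counter(y for x, y in zip(feature, labels) if x == value)
        if counts:
            # Predict the majority class among samples with this feature value.
            correct += max(counts.values())
    return correct / len(labels)


def mutual_information(feature, labels):
    """Empirical mutual information (in bits) between a feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px, py = Counter(feature), Counter(labels)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def ranking(scores):
    """Feature indices from highest to lowest score (ties broken by index)."""
    return sorted(range(len(scores)), key=lambda j: (-scores[j], j))


def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (no ties)."""
    pos_b = {item: rank for rank, item in enumerate(order_b)}
    n = len(order_a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # +1 if the pair appears in the same order in both lists, else -1.
            score += 1 if pos_b[order_a[i]] < pos_b[order_a[j]] else -1
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: 8 samples, 2 classes, 3 binary features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = [
    [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly predictive
    [0, 0, 0, 1, 1, 1, 1, 0],  # partly predictive
    [0, 1, 0, 1, 0, 1, 0, 1],  # uninformative
]

reference = ranking([predictive_power(f, labels) for f in features])
candidate = ranking([mutual_information(f, labels) for f in features])
tau = kendall_tau(reference, candidate)
print(reference, candidate, tau)
```

A tau close to 1 means the criterion orders the features much as the reference does, and a value close to -1 means the order is nearly reversed. Any other criterion, such as relief or the diff-criterion, could be scored the same way by replacing `mutual_information` in the sketch.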
Notes
From here onwards, the discussion is from a machine learning perspective unless stated otherwise.
Also known as variables or attributes.
Also known as examples, observations or samples.
Cite this article
Javed, K., Saeed, M. & Babri, H.A. The correctness problem: evaluating the ordering of binary features in rankings. Knowl Inf Syst 39, 543–563 (2014). https://doi.org/10.1007/s10115-013-0631-0
DOI: https://doi.org/10.1007/s10115-013-0631-0