Abstract
Random feature subspace selection can produce diverse classifiers and help with Co-training, as shown by the RASCO algorithm of Wang et al. (2008). For data sets with many irrelevant or noisy features, however, RASCO may end up with inaccurate classifiers. In order to remedy this problem, we introduce two algorithms that select relevant and non-redundant feature subspaces for Co-training. The first algorithm, Rel-RASCO (Relevant Random Subspaces for Co-training), produces subspaces by drawing features with probabilities proportional to their relevances. We also modify a successful feature selection algorithm, mRMR (Minimum Redundancy Maximum Relevance), for random feature subset selection and introduce Prob-mRMR (Probabilistic-mRMR). Experiments on 5 datasets demonstrate that the proposed algorithms outperform both RASCO and Co-training in terms of accuracy achieved at the end of Co-training. Theoretical analysis of the proposed algorithms is also provided.
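The core sampling step of Rel-RASCO described above — drawing features with probabilities proportional to their relevance scores — can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `rel_rasco_subspaces` and the toy relevance scores are hypothetical, and in the paper the relevance scores would come from a measure such as mutual information between each feature and the class label.

```python
import numpy as np

def rel_rasco_subspaces(relevances, n_subspaces, subspace_dim, rng=None):
    """Draw random feature subspaces, sampling each feature index
    without replacement with probability proportional to its
    relevance score (hypothetical helper, not the authors' code)."""
    rng = np.random.default_rng(rng)
    relevances = np.asarray(relevances, dtype=float)
    probs = relevances / relevances.sum()  # normalize to a distribution
    return [
        rng.choice(len(relevances), size=subspace_dim, replace=False, p=probs)
        for _ in range(n_subspaces)
    ]

# Toy example: 10 features, feature 0 much more relevant than the rest,
# so it appears in most of the sampled subspaces.
scores = np.array([5.0, 1, 1, 1, 1, 1, 1, 1, 1, 1])
subspaces = rel_rasco_subspaces(scores, n_subspaces=4, subspace_dim=3, rng=0)
```

Each returned subspace is a set of distinct feature indices on which one Co-training view's classifier would be trained; relevant features are favored but, unlike deterministic selection, diversity across subspaces is preserved by the randomness.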
References
Roli, F.: Semi-supervised multiple classifier systems: Background and research directions. In: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (eds.) MCS 2005. LNCS, vol. 3541, pp. 1–11. Springer, Heidelberg (2005)
Blum, A., Mitchell, T.: Combining labeled and unlabeled data with co-training. In: Proc. of the 11th Annual Conference on Computational Learning Theory (COLT 1998), pp. 92–100 (1998)
Wang, J., Luo, S.W., Zeng, X.H.: A random subspace method for co-training. In: International Joint Conference on Neural Networks (IJCNN 2008), pp. 195–200 (2008)
Li, M., Zhou, Z.H.: Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man and Cybernetics 6, 1088–1098 (2007)
Didaci, L., Roli, F.: Using co-training and self-training in semi-supervised multiple classifier systems. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F., de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 522–530. Springer, Heidelberg (2006)
Hady, M.F.A., Schwenker, F.: Co-training by committee: A new semi-supervised learning framework. In: IEEE International Conference on Data Mining Workshops, pp. 563–572 (2008)
Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, Hoboken (2004)
Peng, H., Long, F., Ding, C.: Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 1226–1238 (2005)
Asuncion, A., Newman, D.: UCI machine learning repository (2007)
Boley, D., Gini, M., Gross, R., Han, E., Hastings, K., Karypis, G., Kumar, V., Mobasher, B., Moore, J.: Partitioning-based clustering for web document categorization. Decision Support Systems 27, 329–341 (1999)
Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing 10(5), 293–302 (2002)
Duin, R.: PRTools: A Matlab Toolbox for Pattern Recognition (2004)
Moerchen, F., Ultsch, A., Thies, M., Loehken, I.: Modelling timbre distance with temporal statistics from polyphonic music. IEEE Transactions on Speech and Audio Processing 14, 81–90 (2006)
© 2009 Springer-Verlag Berlin Heidelberg
Cite this paper
Yaslan, Y., Cataltepe, Z. (2009). Random Relevant and Non-redundant Feature Subspaces for Co-training. In: Corchado, E., Yin, H. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2009. IDEAL 2009. Lecture Notes in Computer Science, vol 5788. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04394-9_83
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04393-2
Online ISBN: 978-3-642-04394-9