Abstract
An increase in dataset dimensionality and size implies high computational cost and possible estimation errors. In this context, data reduction methods construct a new, more compact data subset that maintains the most representative information while removing redundant, irrelevant, and/or noisy information. The inherent uncertainty of multiple instance learning (MIL) makes the data reduction process more difficult: each positive bag is composed of several instances, of which only some approximate the positive concept, and information on which instances are positive is not available. In this chapter, we first provide an introduction to data reduction. Next, two main strategies to reduce MIL data are considered. Section 8.2 describes the main concepts of feature selection as well as methods that reduce the number of features in MIL problems. Section 8.3 considers bag prototype selection and analyzes the corresponding multi-instance methods.
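As an illustrative sketch (not taken from the chapter), the two reduction strategies can be exercised on multi-instance data represented as a list of bags, where each bag is a list of instance feature vectors. The variance-based feature scoring and the greedy farthest-point bag selection below are simple stand-ins for the families of methods discussed in Sects. 8.2 and 8.3; the function names and the choice of scoring criteria are assumptions for illustration only.

```python
# Illustrative sketch only: simple stand-ins for MIL feature selection
# (Sect. 8.2) and bag prototype selection (Sect. 8.3).
import math


def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def select_features(bags, k):
    """Keep the k features with the highest variance over all instances,
    pooled across every bag. Returns the reduced bags and the kept indices."""
    instances = [inst for bag in bags for inst in bag]
    n_feat = len(instances[0])
    scores = [variance([inst[j] for inst in instances]) for j in range(n_feat)]
    keep = sorted(sorted(range(n_feat), key=lambda j: -scores[j])[:k])
    reduced = [[[inst[j] for j in keep] for inst in bag] for bag in bags]
    return reduced, keep


def bag_mean(bag):
    """Mean instance of a bag (componentwise average of its instances)."""
    n_feat = len(bag[0])
    return [sum(inst[j] for inst in bag) / len(bag) for j in range(n_feat)]


def select_prototype_bags(bags, m):
    """Greedy farthest-point selection of m representative bags, comparing
    bags by the Euclidean distance between their mean instances."""
    means = [bag_mean(b) for b in bags]
    chosen = [0]
    while len(chosen) < m:
        best = max(
            (i for i in range(len(bags)) if i not in chosen),
            key=lambda i: min(math.dist(means[i], means[c]) for c in chosen),
        )
        chosen.append(best)
    return sorted(chosen)
```

For example, on three bags whose instances differ mostly in the first feature, `select_features(bags, 2)` discards the constant feature, and `select_prototype_bags(bags, 2)` keeps the two most mutually distant bags. Real MIL reducers would use label-aware criteria rather than raw variance, since the positive concept is carried by only some instances of each positive bag.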
Copyright information
© 2016 Springer International Publishing AG
Cite this chapter
Herrera, F. et al. (2016). Data Reduction. In: Multiple Instance Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-47759-6_8
Print ISBN: 978-3-319-47758-9
Online ISBN: 978-3-319-47759-6