
Data Reduction

Chapter in: Multiple Instance Learning

Abstract

An increase in dataset dimensionality and size implies a higher computational cost and possible estimation errors. In this context, data reduction methods construct a new and more compact data subset. This subset should retain the most representative information while removing redundant, irrelevant, and/or noisy information. The inherent uncertainty of MIL makes the data reduction process more difficult: each positive bag is composed of several instances, of which only some approximate the positive concept, and information on which instances are positive is not available. In this chapter, we first provide an introduction to data reduction. Next, two main strategies to reduce MIL data are considered. Section 8.2 describes the main concepts of feature selection as well as methods that reduce the number of features in MIL problems. Section 8.3 considers bag prototype selection and analyzes the corresponding multi-instance methods.
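
To make the two reduction strategies concrete, the sketch below (our illustration, not the chapter's algorithms) represents a MIL dataset as labeled bags of instance matrices and applies two deliberately simple reduction steps: a variance-based feature filter computed over all instances, and a bag prototype selection that keeps, per class, the bags whose mean instance lies closest to the class centroid. The toy data generator, the keep_ratio and per_class parameters, and the helper names are hypothetical; the dedicated MIL feature selection and bag prototype selection methods covered in Sects. 8.2 and 8.3 are considerably more refined.

    # Minimal sketch (assumed representation, not the chapter's methods):
    # a MIL dataset as labeled bags, plus two illustrative reduction steps.
    import numpy as np

    rng = np.random.default_rng(0)

    # Each bag is (label, instances) with instances of shape (n_instances, n_features).
    def make_toy_bags(n_bags=20, n_features=10):
        bags = []
        for i in range(n_bags):
            label = i % 2                      # 0 = negative, 1 = positive
            n_inst = rng.integers(3, 8)
            X = rng.normal(size=(n_inst, n_features))
            if label == 1:
                X[0, :3] += 3.0                # only one instance approximates the concept
            bags.append((label, X))
        return bags

    def variance_feature_filter(bags, keep_ratio=0.5):
        """Keep the features with the highest variance over all instances of all bags."""
        all_inst = np.vstack([X for _, X in bags])
        order = np.argsort(all_inst.var(axis=0))[::-1]
        keep = np.sort(order[: max(1, int(keep_ratio * all_inst.shape[1]))])
        return [(y, X[:, keep]) for y, X in bags], keep

    def select_bag_prototypes(bags, per_class=3):
        """Keep, per class, the bags whose mean instance is closest to the class centroid."""
        selected = []
        for c in {y for y, _ in bags}:
            class_bags = [(y, X) for y, X in bags if y == c]
            means = np.array([X.mean(axis=0) for _, X in class_bags])
            dist = np.linalg.norm(means - means.mean(axis=0), axis=1)
            for idx in np.argsort(dist)[:per_class]:
                selected.append(class_bags[idx])
        return selected

    bags = make_toy_bags()
    reduced_bags, kept_features = variance_feature_filter(bags, keep_ratio=0.5)
    prototypes = select_bag_prototypes(reduced_bags, per_class=3)
    print(f"kept features: {kept_features.tolist()}, prototype bags: {len(prototypes)}")

Running the sketch reduces the toy dataset from 10 features to 5 and from 20 bags to 6, which is the kind of joint shrinkage in dimensionality and size that the two strategies in this chapter aim for.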



Author information

Correspondence to Francisco Herrera.


Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Herrera, F. et al. (2016). Data Reduction. In: Multiple Instance Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-47759-6_8

  • DOI: https://doi.org/10.1007/978-3-319-47759-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47758-9

  • Online ISBN: 978-3-319-47759-6

  • eBook Packages: Computer Science (R0)
