
Data Reduction

Chapter in: Multiple Instance Learning

Abstract

An increase in dataset dimensionality and size implies a higher computational cost and possible estimation errors. In this context, data reduction methods construct a new and more compact data subset. This subset should retain the most representative information while removing redundant, irrelevant, and/or noisy information. The inherent uncertainty of MIL makes the data reduction process more difficult: each positive bag is composed of several instances, of which only some approximate the positive concept, and information on which instances are positive is not available. In this chapter, we first provide an introduction to data reduction. Next, two main strategies to reduce MIL data are considered. Section 8.2 describes the main concepts of feature selection as well as methods that reduce the number of features in MIL problems. Section 8.3 considers bag prototype selection and analyzes the corresponding multi-instance methods.
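
To make the two reduction strategies concrete, the sketch below (our illustration, not the chapter's algorithms) represents a MIL dataset as labeled bags of instance matrices and applies two deliberately simple reduction steps: a variance-based feature filter computed over all instances, and a bag prototype selection that keeps, per class, the bags whose mean instance lies closest to the class centroid. The toy data generator, the keep_ratio and per_class parameters, and the helper names are hypothetical; the dedicated MIL feature selection and bag prototype selection methods covered in Sects. 8.2 and 8.3 are considerably more refined.

    # Minimal sketch (assumed representation, not the chapter's methods):
    # a MIL dataset as labeled bags, plus two illustrative reduction steps.
    import numpy as np

    rng = np.random.default_rng(0)

    # Each bag is (label, instances) with instances of shape (n_instances, n_features).
    def make_toy_bags(n_bags=20, n_features=10):
        bags = []
        for i in range(n_bags):
            label = i % 2                      # 0 = negative, 1 = positive
            n_inst = rng.integers(3, 8)
            X = rng.normal(size=(n_inst, n_features))
            if label == 1:
                X[0, :3] += 3.0                # only one instance approximates the concept
            bags.append((label, X))
        return bags

    def variance_feature_filter(bags, keep_ratio=0.5):
        """Keep the features with the highest variance over all instances of all bags."""
        all_inst = np.vstack([X for _, X in bags])
        order = np.argsort(all_inst.var(axis=0))[::-1]
        keep = np.sort(order[: max(1, int(keep_ratio * all_inst.shape[1]))])
        return [(y, X[:, keep]) for y, X in bags], keep

    def select_bag_prototypes(bags, per_class=3):
        """Keep, per class, the bags whose mean instance is closest to the class centroid."""
        selected = []
        for c in {y for y, _ in bags}:
            class_bags = [(y, X) for y, X in bags if y == c]
            means = np.array([X.mean(axis=0) for _, X in class_bags])
            dist = np.linalg.norm(means - means.mean(axis=0), axis=1)
            for idx in np.argsort(dist)[:per_class]:
                selected.append(class_bags[idx])
        return selected

    bags = make_toy_bags()
    reduced_bags, kept_features = variance_feature_filter(bags, keep_ratio=0.5)
    prototypes = select_bag_prototypes(reduced_bags, per_class=3)
    print(f"kept features: {kept_features.tolist()}, prototype bags: {len(prototypes)}")

Running the sketch reduces the toy dataset from 10 features to 5 and from 20 bags to 6, which is the kind of joint shrinkage in dimensionality and size that the two strategies in this chapter aim for.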



Author information

Correspondence to Francisco Herrera.


Copyright information

© 2016 Springer International Publishing AG

About this chapter

Cite this chapter

Herrera, F. et al. (2016). Data Reduction. In: Multiple Instance Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-47759-6_8

  • DOI: https://doi.org/10.1007/978-3-319-47759-6_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-47758-9

  • Online ISBN: 978-3-319-47759-6

  • eBook Packages: Computer Science (R0)
