Dynamically Adaptive Genetic Algorithm to Select Training Data for SVMs

Kawulok, Michal; Nalepa, Jakub

doi:10.1007/978-3-319-12027-0_20

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8864)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence

Abstract

This paper addresses an important problem of training set selection for support vector machines (SVMs). Selection is a critical step for large and noisy data sets because of the high time and memory complexity of SVM training. Several methods have been proposed so far, most of them based on analyzing data geometry in either the input or the kernel space. Here, we propose a new dynamically adaptive genetic algorithm (DAGA) to select valuable training sets. We demonstrate that DAGA not only selects the training data quickly, but also dynamically determines the desired training set size without any prior information. We analyze the impact of the support vectors ratio, defined as the percentage of support vectors in the training set, on DAGA performance. We also investigate and discuss the possibility of incorporating reduced SVMs into the proposed algorithm. An extensive experimental study shows that DAGA offers fast and effective training set optimization that is independent of the entire training set size.
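
To make two of the ideas in the abstract concrete, the following is a minimal sketch and not the authors' DAGA. It shows the support vectors ratio as defined above, and a plain genetic algorithm that evolves fixed-size subsets of training indices. The function names, the parameters (`pop_size`, `generations`, `mutation_rate`), the truncation selection, and the union-based crossover are all assumptions made for illustration. The `fitness` argument is a caller-supplied score, for example the validation accuracy of an SVM trained on the subset. The dynamic adaptation of the training set size that DAGA performs is not implemented here.

```python
import random
from typing import Callable, List


def support_vector_ratio(n_support_vectors: int, n_training: int) -> float:
    """Percentage of support vectors in the training set."""
    if n_training <= 0:
        raise ValueError("training set must be non-empty")
    return 100.0 * n_support_vectors / n_training


def select_subset(
    n_total: int,
    subset_size: int,
    fitness: Callable[[List[int]], float],
    pop_size: int = 20,          # hypothetical default
    generations: int = 30,       # hypothetical default
    mutation_rate: float = 0.1,  # hypothetical default
    seed: int = 0,
) -> List[int]:
    """Evolve a subset of indices in range(n_total) that maximizes `fitness`.

    Illustrative genetic algorithm only; the subset size stays fixed.
    """
    rng = random.Random(seed)
    # Each individual is a list of distinct indices into the full training set.
    population = [rng.sample(range(n_total), subset_size) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Truncation selection: the fitter half of the population become parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Crossover: draw the child from the union of both parents' indices.
            pool = list(set(a) | set(b))
            child = rng.sample(pool, subset_size)
            # Mutation: swap in a random index if it is not already present.
            for i in range(subset_size):
                if rng.random() < mutation_rate:
                    candidate = rng.randrange(n_total)
                    if candidate not in child:
                        child[i] = candidate
            children.append(child)
        population = children
        generation_best = max(population, key=fitness)
        if fitness(generation_best) > fitness(best):
            best = generation_best
    return best
```

In practice `fitness` would train and score a classifier on the selected samples, which dominates the running time, so caching fitness values per individual would be a natural refinement of this sketch.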

Author information

Authors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Michal Kawulok & Jakub Nalepa

Corresponding author

Correspondence to Jakub Nalepa.

Editor information

Editors and Affiliations

Universidade Federal do Rio Grande do Sul, Porto Alegre, Brazil
Ana L.C. Bazzan
Pontificia Universidad Católica (PUC), Santiago de Chile, Chile
Karim Pichara

About this paper

Cite this paper

Kawulok, M., Nalepa, J. (2014). Dynamically Adaptive Genetic Algorithm to Select Training Data for SVMs. In: Bazzan, A., Pichara, K. (eds) Advances in Artificial Intelligence -- IBERAMIA 2014. Lecture Notes in Computer Science, vol 8864. Springer, Cham. https://doi.org/10.1007/978-3-319-12027-0_20

DOI: https://doi.org/10.1007/978-3-319-12027-0_20
Published: 12 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12026-3
Online ISBN: 978-3-319-12027-0
eBook Packages: Computer Science, Computer Science (R0)
