Abstract
In the context of support vector machines, identifying the support vectors is a key issue when dealing with large data sets. Camelo et al. (Ann Oper Res 235:85–101, 2015) present a promising approach to finding or approximating most of the support vectors through a procedure based on sub-sampling and on enriching the support vector sets with nearest neighbors. This method has been shown to improve the computational efficiency of support vector machines on large data sets with low or intermediate feature space dimension. In the present article we discuss ways of adapting the nearest neighbor enriching methodology to very high dimensional data, such as text data, for which nearest neighbor queries involve, in principle, a high computational cost. Our approach incorporates the proximity preserving order search algorithm of Chavez et al. (MICAI 2005: advances in artificial intelligence, Springer, Berlin, pp 405–414, 2005) into the nearest neighbor enriching method of Camelo et al. (2015), in order to adapt this procedure to the high dimensional setting. For the required set of pivots, both random pivots and the base prototype pivot set of Micó et al. (Pattern Recogn Lett 15:9–17, 2015) are considered. The proposed methodology is evaluated on real data sets.
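The idea behind a proximity preserving order can be illustrated with a short sketch. It is not the implementation evaluated in the article. It assumes Euclidean distance, randomly chosen pivots, and Spearman's footrule (Diaconis and Graham 1977) as the measure of disagreement between pivot orderings. All function names (`choose_random_pivots`, `pivot_permutation`, `footrule`, `build_index`, `approximate_neighbors`) and the `candidates` parameter are hypothetical and introduced only for illustration. Each point is represented by the order in which it "sees" the pivots; a query is compared with the data through these orderings, and true distances are computed only for a short list of candidates.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def choose_random_pivots(data, m, seed=0):
    """Pick m pivots uniformly at random from the data (one of the two
    pivot choices mentioned in the abstract)."""
    return random.Random(seed).sample(data, m)


def pivot_permutation(point, pivots):
    """Return ranks, where ranks[j] is the position of pivot j once the
    pivots are sorted by increasing distance to `point`."""
    order = sorted(range(len(pivots)), key=lambda j: euclidean(point, pivots[j]))
    ranks = [0] * len(pivots)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def footrule(ranks_a, ranks_b):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(ranks_a, ranks_b))


def build_index(data, pivots):
    """Precompute the pivot permutation of every data point."""
    return [pivot_permutation(p, pivots) for p in data]


def approximate_neighbors(query, data, pivots, index, k, candidates):
    """Approximate k nearest neighbors of `query`.

    Points are first ordered by the footrule between their pivot
    permutation and that of the query; only the first `candidates`
    points are then compared with the query by true distance.
    """
    q_ranks = pivot_permutation(query, pivots)
    by_footrule = sorted(range(len(data)), key=lambda i: footrule(q_ranks, index[i]))
    shortlist = by_footrule[:candidates]
    shortlist.sort(key=lambda i: euclidean(query, data[i]))
    return shortlist[:k]
```

With `m` pivots and `n` data points, a query in this sketch costs `m` distance evaluations to build its permutation plus `candidates` evaluations for the short list, instead of `n`. In a nearest neighbor enriching step, such a routine would be called once per current support vector, and the returned indices added to the training subsample. Setting `candidates` to `n` recovers the exact nearest neighbors, at full cost.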
Notes
Features with zero values in every entry were removed.
References
Camelo, S., Gonzalez-Lima, M., Quiroz, A.J.: Nearest neighbors methods for support vector machines. Ann. Oper. Res. 235, 85–101 (2015)
Chavez, E., Figueroa, K., Navarro, G.: Proximity searching in high dimensional spaces with a proximity preserving order. In: MICAI 2005: Advances in Artificial Intelligence, pp. 405–414. Springer, Berlin (2005)
Chavez, E., Navarro, G.: An effective clustering algorithm to index high dimensional metric spaces. In: SPIRE 2000. Proceedings of the Seventh International Symposium on String Processing and Information Retrieval, pp. 75–86. IEEE, Computer Science (2000)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
Diaconis, P., Graham, R.L.: Spearman’s footrule as a measure of disarray. J. R. Stat. Soc. Ser. B (Methodol.) 39, 262–268 (1977)
Freund, R., Osuna, E., Girosi, F.: An improved training algorithm for support vector machines. In: Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Workshop, pp. 276–285 (1997)
Gieseke, F., Airola, A., Pahikkala, T., Kramer, O.: Fast and simple gradient-based optimization for semi-supervised support vector machines. Neurocomputing 123, 23–32 (2014)
Hart, P., Duda, R., Stork, D.: Pattern Classification. Wiley, Hoboken (2000)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2008)
Kim, D., Der, M., Saul, L.: A Gaussian latent variable model for large margin classification of labeled and unlabeled data. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. W&CP, JMLR, vol. 33, pp. 484–492 (2014)
Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest neighbours approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recogn. Lett. 15, 9–17 (2015)
Mangasarian, O., Musicant, D.: Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10, 1032–1037 (1999)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 41–65. MIT Press, Cambridge (1998)
Shin, H., Cho, S.: Neighborhood property based pattern selection for support vector machines. Neural Comput. 19, 816–855 (2007)
Sindhwani, V., Keerthi, S.S.: Large scale semi-supervised linear SVMs. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 477–484. ACM (2006)
Sindhwani, V., Keerthi, S.S.: Newton methods for fast solution of semi-supervised linear SVMs. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 155–174. MIT Press (2007)
Suykens, J.A.K., van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing Co., Hackensack (2002)
Teo, C.H., Vishwanathan, S.V.N., Smola, A.J., Le, Q.V.: Bundle methods for regularized risk minimization. J. Mach. Learn. Res. 11(Jan), 311–365 (2010)
Zhang, X., Saha, A., Vishwanathan, S.V.N.: Smoothing multivariate performance measures. J. Mach. Learn. Res. 13(Dec), 3623–3680 (2012)
Cite this article
Montañés, D.C., Quiroz, A.J., Dulce Rubio, M. et al. Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces. Optim Lett 15, 391–404 (2021). https://doi.org/10.1007/s11590-020-01616-w
DOI: https://doi.org/10.1007/s11590-020-01616-w