Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces

Montañés, Diana C.; Quiroz, Adolfo J.; Dulce Rubio, Mateo; Riascos Villegas, Alvaro J.

doi:10.1007/s11590-020-01616-w

Original Paper
Published: 13 July 2020

Volume 15, pages 391–404, (2021)
Optimization Letters

Diana C. Montañés¹,
Adolfo J. Quiroz¹,
Mateo Dulce Rubio² &
Alvaro J. Riascos Villegas¹,² (ORCID: orcid.org/0000-0002-6325-5559)

Abstract

In the context of support vector machines, identifying the support vectors is a key issue when dealing with large data sets. In Camelo et al. (Ann Oper Res 235:85–101, 2015), the authors present a promising approach to finding or approximating most of the support vectors through a procedure based on sub-sampling and enriching the support vector sets with their nearest neighbors. This method has been shown to improve the computational efficiency of support vector machines on large data sets with low or intermediate feature space dimension. In the present article we discuss ways of adapting the nearest neighbor enriching methodology to very high dimensional data, such as text data, for which nearest neighbor queries involve, in principle, a high computational cost. Our approach incorporates the proximity preserving order search algorithm of Chavez et al. (MICAI 2005: advances in artificial intelligence, Springer, Berlin, pp 405–414, 2005) into the nearest neighbor enriching method of Camelo et al. (2015), in order to adapt this procedure to the high dimension setting. For the required set of pivots, both random pivots and the base prototype pivot set of Micó et al. (Pattern Recogn Lett 15:9–17, 1994) are considered. The proposed methodology is evaluated on real data sets.
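The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean distance, random pivots, and Spearman's footrule (Diaconis and Graham 1977, cited in the reference list) as the dissimilarity between pivot orderings. The function names and the `fraction` parameter, which sets the share of the data actually compared with the query, are hypothetical. Each point is represented by the order in which it "sees" the pivots; candidates whose ordering is closest to that of the query are checked first, and an approximate support vector set is then enriched with the neighbors found this way.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pivot_ranks(point, pivots):
    """ranks[i] is the position of pivot i when pivots are sorted by distance to point."""
    order = sorted(range(len(pivots)), key=lambda i: euclidean(point, pivots[i]))
    ranks = [0] * len(pivots)
    for position, pivot_id in enumerate(order):
        ranks[pivot_id] = position
    return ranks


def footrule(r1, r2):
    """Spearman's footrule: sum of absolute rank differences."""
    return sum(abs(a - b) for a, b in zip(r1, r2))


def build_index(data, n_pivots, seed=0):
    """Pick random pivots (an assumption of this sketch) and store each point's pivot ranks."""
    rng = random.Random(seed)
    pivots = rng.sample(data, n_pivots)
    return pivots, [pivot_ranks(p, pivots) for p in data]


def approx_knn(query, data, pivots, ranks, k, fraction=0.2):
    """Approximate k nearest neighbors of query, as indices into data.

    Only the first `fraction` of the data, ordered by footrule distance
    between pivot rankings, is compared with the query directly.
    """
    q_ranks = pivot_ranks(query, pivots)
    candidates = sorted(range(len(data)), key=lambda i: footrule(q_ranks, ranks[i]))
    n_check = max(k, int(fraction * len(data)))
    checked = sorted(candidates[:n_check], key=lambda i: euclidean(query, data[i]))
    return checked[:k]


def enrich(support_idx, data, pivots, ranks, k, fraction=0.2):
    """Add the approximate k nearest neighbors of each candidate support vector."""
    enriched = set(support_idx)
    for i in support_idx:
        enriched.update(approx_knn(data[i], data, pivots, ranks, k, fraction))
    return sorted(enriched)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [tuple(rng.random() for _ in range(50)) for _ in range(200)]
    pivots, ranks = build_index(data, n_pivots=16)
    # Indices 0, 5 and 9 stand in for support vectors found on a sub-sample.
    print(enrich([0, 5, 9], data, pivots, ranks, k=3))
```

With `fraction=1.0` every point is checked and the search becomes exact; smaller values trade accuracy for fewer distance evaluations, which is where the savings in high dimension would come from. In a full pipeline the enriched index set would be passed to an SVM solver for retraining, a step omitted here.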

Notes

https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#real-sim.
Features with zero values in every entry were removed.
https://archive.ics.uci.edu/ml/datasets/Arcene.
http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/.

References

Camelo, S., Gonzalez-Lima, M., Quiroz, A.J.: Nearest neighbors methods for support vector machines. Ann. Oper. Res. 235, 85–101 (2015)
Chavez, E., Figueroa, K., Navarro, G.: Proximity searching in high dimensional spaces with a proximity preserving order. In: MICAI 2005: Advances in Artificial Intelligence, pp. 405–414. Springer, Berlin (2005)
Chavez, E., Navarro, G.: An effective clustering algorithm to index high dimensional metric spaces. In: SPIRE 2000. Proceedings of the Seventh International Symposium on String Processing and Information Retrieval, pp. 75–86. IEEE Computer Society (2000)
Cortes, C., Vapnik, V.N.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press, Cambridge (2000)
Diaconis, P., Graham, R.L.: Spearman’s footrule as a measure of disarray. J. R. Stat. Soc. Ser. B (Methodol.) 39, 262–268 (1977)
Freund, R., Osuna, E., Girosi, F.: An improved training algorithm for support vector machines. In: Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Workshop, pp. 276–285 (1997)
Gieseke, F., Airola, A., Pahikkala, T., Kramer, O.: Fast and simple gradient-based optimization for semi-supervised support vector machines. Neurocomputing 123, 23–32 (2014)
Hart, P., Duda, R., Stork, D.: Pattern Classification. Wiley, Hoboken (2000)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, New York (2008)
Kim, D., Der, M., Saul, L.: A Gaussian latent variable model for large margin classification of labeled and unlabeled data. In: Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 2014, Reykjavik, Iceland. W&CP, JMLR, vol. 33, pp. 484–492 (2014)
Micó, M.L., Oncina, J., Vidal, E.: A new version of the nearest neighbours approximating and eliminating search algorithm (AESA) with linear preprocessing time and memory requirements. Pattern Recogn. Lett. 15, 9–17 (1994)
Mangasarian, O., Musicant, D.: Successive overrelaxation for support vector machines. IEEE Trans. Neural Netw. 10, 1032–1037 (1999)
Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 41–65. MIT Press, Cambridge (1998)
Shin, H., Cho, S.: Neighborhood property based pattern selection for support vector machines. Neural Comput. 19, 816–855 (2007)
Sindhwani, V., Keerthi, S.S.: Large scale semi-supervised linear SVMs. In: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 477–484. ACM (2006)
Sindhwani, V., Keerthi, S.S.: Newton methods for fast solution of semi-supervised linear SVMs. In: Bottou, L., Chapelle, O., DeCoste, D., Weston, J. (eds.) Large Scale Kernel Machines, pp. 155–174. MIT Press (2007)
Suykens, J.A.K., van Gestel, T., De Brabanter, J., De Moor, B., Vandewalle, J.: Least Squares Support Vector Machines. World Scientific Publishing Co., Hackensack (2002)
Teo, C.H., Vishwanathan, S.V.N., Smola, A.J., Le, Q.V.: Bundle methods for regularized risk minimization. J. Mach. Learn. Res. 11(Jan), 311–365 (2010)
Zhang, X., Saha, A., Vishwanathan, S.V.N.: Smoothing multivariate performance measures. J. Mach. Learn. Res. 13(Dec), 3623–3680 (2012)

Author information

Authors and Affiliations

Universidad de los Andes, Bogotá, Colombia
Diana C. Montañés, Adolfo J. Quiroz & Alvaro J. Riascos Villegas
Quantil, Bogotá, Colombia
Mateo Dulce Rubio & Alvaro J. Riascos Villegas

Corresponding author

Correspondence to Alvaro J. Riascos Villegas.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Montañés, D.C., Quiroz, A.J., Dulce Rubio, M. et al. Efficient nearest neighbors methods for support vector machines in high dimensional feature spaces. Optim Lett 15, 391–404 (2021). https://doi.org/10.1007/s11590-020-01616-w

Received: 06 September 2019
Accepted: 30 June 2020
Published: 13 July 2020
Issue Date: March 2021
DOI: https://doi.org/10.1007/s11590-020-01616-w
