
A two stages sparse SVM training


Abstract

A small number of support vectors is crucial for an SVM to handle very large-scale problems quickly. This paper fits each class of data with a plane through a new model, which captures the separability information between classes and can be solved by fast core-set methods. Training on the core sets of the fitting planes then yields a very sparse SVM classifier. The computational complexity of the proposed algorithm is upper bounded by \( {\text{O}}(1/\varepsilon ) \). Experimental results show that the new algorithm trains faster on average than both CVM and SVMperf, with comparable generalization performance.
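
To make the two-stage idea concrete, the following is a minimal Python sketch of the pipeline, not the authors' implementation: a placeholder (plain subsampling) stands in for the core-set fitting-plane stage of the paper, and scikit-learn's `SVC` stands in for the final SVM training on the selected core points.

```python
# A minimal sketch of the two-stage idea in the abstract, not the authors' code.
# Stage 1 is represented by a placeholder that merely subsamples each class;
# the paper instead selects a small core set by solving a fitting-plane model
# with a fast core-set method. Stage 2 trains a kernel SVM on the core points.
import numpy as np
from sklearn.svm import SVC

def stage1_core_set(X_class, core_size, rng):
    """Placeholder for the paper's core-set plane fitting (subsampling only)."""
    core_size = min(core_size, len(X_class))
    return rng.choice(len(X_class), size=core_size, replace=False)

def two_stage_sparse_svm(X, y, core_size=100, seed=0, **svm_kwargs):
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)          # points of one class
        keep.append(idx[stage1_core_set(X[idx], core_size, rng)])
    keep = np.concatenate(keep)
    # Stage 2: an ordinary kernel SVM trained only on the core points,
    # which is what keeps the final classifier sparse.
    clf = SVC(**svm_kwargs).fit(X[keep], y[keep])
    return clf, keep
```

Replacing the placeholder with the paper's core-set solver is what keeps the stage-2 training set, and hence the number of support vectors, small.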


References

  1. Bach FR, Jordan MI (2005) Predictive low-rank decomposition for kernel methods. In: 22nd international conference on machine learning, Bonn. ICML 2005. Association for Computing Machinery, pp 33–40

  2. Badoiu M, Clarkson KL (2008) Optimal core-sets for balls. Comput Geom Theory Appl 40(1):14–22. doi:10.1016/j.comgeo.2007.04.002


  3. Burges CJC (1996) Simplified support vector decision rules. In: Proceedings of 13th international conference on machine learning, p 7

  4. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297


  5. Downs T, Gates KE, Masters A (2002) Exact simplification of support vector solutions. J Mach Learn Res 2(2):293–297. doi:10.1162/15324430260185637


  6. Fan R-E, Chen P-H, Lin C-J (2005) Working set selection using second order information for training support vector machines. J Mach Learn Res 6:1889–1918


  7. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910. doi:10.1109/tpami.2007.1068

  8. Joachims T (1998) Making large scale SVM learning practical. Advances in kernel methods—support vector learning

  9. Joachims T, Yu CNJ (2009) Sparse kernel SVMs via cutting-plane training. Mach Learn 76(2–3):179–193. doi:10.1007/s10994-009-5126-6


  10. Keerthi SS, Chapelle O, DeCoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515


  11. Lee YJ, Huang SY (2007) Reduced support vector machines: a statistical theory. IEEE Trans Neural Netw 18(1):1–13. doi:10.1109/tnn.2006.883722


  12. Liang X, Chen RC, Guo XY (2008) Pruning support vector machines without altering performances. IEEE Trans Neural Netw 19(10):1792–1803. doi:10.1109/tnn.2008.2002696


  13. Jiao L, Bo L, Wang L (2007) Fast sparse approximation for least squares support vector machine. IEEE Trans Neural Netw 18(3):685–697


  14. Lin KM, Lin CJ (2003) A study on reduced support vector machines. IEEE Trans Neural Netw 14(6):1449–1459. doi:10.1109/tnn.2003.820828


  15. Peng XJ (2011) Building sparse twin support vector machine classifiers in primal space. Inf Sci 181(18):3967–3980. doi:10.1016/j.ins.2011.05.004


  16. Smola A, Schölkopf B (2000) Sparse greedy matrix approximation for machine learning. Paper presented at the ICML

  17. Sun P, Yao X (2010) Sparse approximation through boosting for learning large scale kernel machines. IEEE Trans Neural Netw 21(6):883–894. doi:10.1109/tnn.2010.2044244


  18. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300


  19. Tsang IWH, Kwok JTY, Zurada JM (2006) Generalized core vector machines. IEEE Trans Neural Netw 17(5):1126–1140. doi:10.1109/tnn.2006.878123


  20. Wu M, Schölkopf B, Bakir G (2005) Building sparse large margin classifiers. In: 22nd international conference on machine learning, Bonn. ICML 2005. Association for Computing Machinery, pp 1001–1008

  21. Khemchandani R, Karpatne A, Chandra S (2013) Twin support vector regression for the simultaneous learning of a function and its derivatives. Int J Mach Learn Cybern 4(1):51–63


  22. Wang X, Shu-Xia L, Zhai J-H (2008) Fast fuzzy multi-category SVM based on support vector domain description. Int J Pattern Recognit Artif Intell 22(1):109–120


Acknowledgments

This work was supported by the Scientific Research Fund of the Sichuan Provincial Education Department under Grant No. 12ZA112 and the National Natural Science Foundation of China (No. 61202256).

Author information


Corresponding author

Correspondence to Ziqiang Li.

Appendix

Proof of Theorem 1

From the definition of \( \bar{K} \), given any vector \( \bar{\alpha} = \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right]^{T} \ne 0 \) and writing \( C^{\prime} = m_{1}C_{n}/C \), one has

$$ \begin{aligned} \bar{\alpha}^{T}\bar{K}\bar{\alpha} &= \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right] \bar{K} \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right]^{T} \\ &= \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right] \left[ \begin{array}{c} K_{AA}\alpha + \tfrac{1}{2}E_{AA}\alpha + C^{\prime}I_{AA}\alpha - K_{AA}\alpha^{*} - \tfrac{1}{2}E_{AA}\alpha^{*} + C^{\prime}I_{AA}\alpha^{*} \\ -K_{AA}\alpha - \tfrac{1}{2}E_{AA}\alpha + C^{\prime}I_{AA}\alpha + K_{AA}\alpha^{*} + \tfrac{1}{2}E_{AA}\alpha^{*} + C^{\prime}I_{AA}\alpha^{*} \end{array} \right] \\ &= \alpha^{T}K_{AA}\alpha + \tfrac{1}{2}\alpha^{T}E_{AA}\alpha + C^{\prime}\alpha^{T}I_{AA}\alpha \\ &\quad - 2\alpha^{*T}K_{AA}\alpha - \alpha^{*T}E_{AA}\alpha + 2C^{\prime}\alpha^{*T}I_{AA}\alpha \\ &\quad + \alpha^{*T}K_{AA}\alpha^{*} + \tfrac{1}{2}\alpha^{*T}E_{AA}\alpha^{*} + C^{\prime}\alpha^{*T}I_{AA}\alpha^{*} \\ &= (\alpha - \alpha^{*})^{T}K_{AA}(\alpha - \alpha^{*}) + \tfrac{1}{2}(\alpha - \alpha^{*})^{T}E_{AA}(\alpha - \alpha^{*}) \\ &\quad + C^{\prime}(\alpha + \alpha^{*})^{T}I_{AA}(\alpha + \alpha^{*}) \end{aligned} $$

Because the matrix \( E_{AA} \) is positive semidefinite and \( I_{AA} \) is positive definite, \( \bar{K} \) is positive semidefinite whenever \( K_{AA} \) is positive semidefinite. Furthermore, for the new learning algorithm \( \alpha \ne -\alpha^{*} \), \( \alpha \succ 0 \) and \( \alpha^{*} \succ 0 \), so \( \bar{\alpha}^{T}\bar{K}\bar{\alpha} \) is always positive. Thus \( \bar{K} \) is in fact strictly positive definite on the feasible set. \(\square\)
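
The identity above is easy to check numerically. The sketch below assumes the block structure of \( \bar{K} \) implied by the expansion of \( \bar{K}\bar{\alpha} \) (diagonal blocks \( K_{AA} + \tfrac{1}{2}E_{AA} + C^{\prime}I_{AA} \), off-diagonal blocks \( -K_{AA} - \tfrac{1}{2}E_{AA} + C^{\prime}I_{AA} \)); the all-ones matrix and the identity are used only as stand-ins for \( E_{AA} \) and \( I_{AA} \), since the proof requires nothing more than positive semidefiniteness and positive definiteness of them.

```python
# Numerical check of the identity above, assuming the block structure of
# K_bar implied by the expansion of K_bar * alpha_bar: diagonal blocks
# K_AA + (1/2)E_AA + C'I_AA and off-diagonal blocks -K_AA - (1/2)E_AA + C'I_AA.
import numpy as np

rng = np.random.default_rng(0)
m = 6
A = rng.standard_normal((m, m))
K = A @ A.T                # a positive semidefinite stand-in for K_AA
E = np.ones((m, m))        # a PSD stand-in for E_AA (the proof only needs PSD)
I = np.eye(m)              # a positive definite stand-in for I_AA
Cp = 0.5                   # C' = m_1 * C_n / C, any positive value

M = K + 0.5 * E + Cp * I   # diagonal block
N = -K - 0.5 * E + Cp * I  # off-diagonal block
K_bar = np.block([[M, N], [N, M]])

a, a_star = rng.random(m), rng.random(m)   # nonnegative alpha, alpha*
ab = np.concatenate([a, a_star])

lhs = ab @ K_bar @ ab
rhs = ((a - a_star) @ (K + 0.5 * E) @ (a - a_star)
       + Cp * (a + a_star) @ I @ (a + a_star))
assert np.isclose(lhs, rhs)                      # the quadratic-form identity
assert np.linalg.eigvalsh(K_bar).min() > -1e-10  # K_bar is positive semidefinite
```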

Proof of Theorem 2

For the CCMEB problem (8), as in (10), one has

$$ R = \sqrt{ -\bar{\alpha}^{T}\bar{K}\bar{\alpha} + \bar{\alpha}^{T}\left(\operatorname{diag}(\bar{K}) + \Delta\right) },\quad c = \sum\limits_{i = 1}^{2m_{1}} \bar{\alpha}_{i}\,\bar{\varphi}(x_{i}) $$

Combining it with (16) and the definition of Δ yields

$$ R^{2} = 2\rho_{+}/C + \left\| c \right\|^{2} + \eta \qquad (20) $$

From (13), (15), (20), (19) and the definition of \( \bar{K} \), the squared distance between the center and any point \( \bar{\varphi }(x_{j} ) \) is

$$ \begin{aligned} \bigl(dist(j)\bigr)^{2} &= \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + \left[ \begin{array}{c} -2\bar{p} \\ 2\bar{p} \end{array} \right]_{j} \\ &= \begin{cases} \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta - 2\bar{p}_{j}, & j < m_{1} \\ \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + 2\bar{p}_{j}, & m_{1} \le j < 2m_{1} \end{cases} \end{aligned} $$

For \( j < m_{1} \),

$$ \begin{aligned} \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta - 2\bar{p}_{j} &= \left\| c \right\|^{2} + \eta - 2\bar{p}_{j} - 2\left( \sum\limits_{i = 1}^{m_{1}} \left\langle \bar{\varphi}(x_{i}),\bar{\varphi}(x_{j}) \right\rangle \bar{\alpha}_{i} + \sum\limits_{i = m_{1}+1}^{2m_{1}} \left\langle \bar{\varphi}(x_{i}),\bar{\varphi}(x_{j}) \right\rangle \bar{\alpha}_{i} \right) \\ &= \left\| c \right\|^{2} + \eta - 2\bar{p}_{j} - 2\left( \sum\limits_{i = 1}^{m_{1}} \left( \left\langle \varphi(x_{i}),\varphi(x_{j}) \right\rangle + \tfrac{1}{2} + \delta_{i}\delta_{j} \right)\bar{\alpha}_{i} + \sum\limits_{i = m_{1}+1}^{2m_{1}} \left( -\left\langle \varphi(x_{i-m_{1}}),\varphi(x_{j}) \right\rangle - \tfrac{1}{2} + \delta_{i}\delta_{j} \right)\bar{\alpha}_{i} + \frac{C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{m_{1}+j}\bigr) \right) \\ &= \left\| c \right\|^{2} + \eta - 2\left( \frac{1}{C}\bigl( -w_{+}^{T}\varphi(x_{j}) - b_{+} \bigr) + \frac{C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \right) \\ &= \left\| c \right\|^{2} + \eta + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &= R^{2} - 2\rho_{+}/C + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \end{aligned} $$

Likewise, for \( m_{1} \le j < 2m_{1} \),

$$ \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + 2\bar{p}_{j} = R^{2} - 2\rho_{+}/C - \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) $$

So, for each point \( x_{j} \in CS_{P}^{t} \) with a nonzero Lagrange multiplier, one has:

if \( j < m_{1} \), then

$$ \begin{aligned} R^{2} &= R^{2} - 2\rho_{+}/C + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\Rightarrow\; w_{+}^{T}\varphi(x_{j}) + b_{+} = \rho_{+} + C_{n}m_{1}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\therefore\; w_{+}^{T}\varphi(x_{j}) + b_{+} > \rho_{+} \end{aligned} $$

if \( m_{1} \le j < 2m_{1} \), then

$$ \begin{aligned} R^{2} &= R^{2} - 2\rho_{+}/C - \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\Rightarrow\; -\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) = \rho_{+} + C_{n}m_{1}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\therefore\; w_{+}^{T}\varphi(x_{j}) + b_{+} < -\rho_{+} \end{aligned} $$

Thus every point \( x_{j} \in CS_{P}^{t} \) with a nonzero Lagrange multiplier must lie outside the slab.

One can also prove, in the same way, that each point \( x_{j} \in CS_{P}^{t} \) with a zero Lagrange multiplier that lies on the ball \( B(c^{t} ,r^{t} ) \) must lie on a bounding plane of the slab, and that each point \( x_{j} \in CS_{P}^{t} \) inside the ball \( B(c^{t} ,r^{t} ) \) must lie inside the slab. Details are omitted to save space. \(\square \)
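
As a small illustration of the trichotomy just stated (outside the slab, on a bounding plane, inside the slab), the helper below classifies core-set points from their precomputed scores \( w_{+}^{T}\varphi(x_{j}) + b_{+} \). The slab \( |w_{+}^{T}\varphi(x) + b_{+}| \le \rho_{+} \) is read off the inequalities above; the function is only illustrative and not part of the training algorithm.

```python
# Illustration of the trichotomy stated above: where a core-set point sits
# relative to the slab |w_+^T phi(x) + b_+| <= rho_plus. The scores
# w_+^T phi(x_j) + b_+ are assumed to be precomputed by the caller.
def slab_position(scores, rho_plus, tol=1e-8):
    labels = []
    for s in scores:
        if abs(s) > rho_plus + tol:
            labels.append("outside slab")       # where nonzero-multiplier points end up
        elif abs(abs(s) - rho_plus) <= tol:
            labels.append("on bounding plane")  # where on-ball, zero-multiplier points end up
        else:
            labels.append("inside slab")        # where points strictly inside the ball end up
    return labels

# Example: with rho_plus = 1.0, scores 1.5 and -2.0 fall outside the slab,
# 1.0 lies on a bounding plane, and 0.3 lies strictly inside.
print(slab_position([1.5, -2.0, 1.0, 0.3], rho_plus=1.0))
```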

Proof of Theorem 3

The proof is similar to that of Theorem 2. \(\square \)

Proof of Theorem 4

From (6) and (7), we have \( \frac{1}{m_{1}}\xi_{+}^{T} e_{m_{1}} = C_{n} \). Hence the parameter \( C_{n} \) represents the average extent to which the points fall outside the slab. \(\square \)
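
Read as code, this interpretation is just an average of slab violations. In the sketch below the slack of each point is taken to be \( \max(0, |w_{+}^{T}\varphi(x_{j}) + b_{+}| - \rho_{+}) \), an assumption about the slack form in model (6)-(7) consistent with the slab used in Theorem 2; Theorem 4 then says this average equals \( C_{n} \) at the optimum.

```python
# Sketch of the interpretation of C_n in Theorem 4: if xi_plus collects how far
# each point of the class falls outside the slab (taken here as
# max(0, |w_+^T phi(x_j) + b_+| - rho_plus), an assumed slack form), then the
# average of these slacks equals C_n at the optimum.
import numpy as np

def average_slab_violation(scores, rho_plus):
    xi_plus = np.maximum(0.0, np.abs(np.asarray(scores, dtype=float)) - rho_plus)
    return xi_plus.mean()   # Theorem 4: equals C_n at the optimum

# e.g. scores of m_1 = 4 points with rho_plus = 1: violations are 0.5, 1, 0, 0,
# so the average violation is 0.375.
print(average_slab_violation([1.5, -2.0, 1.0, 0.3], rho_plus=1.0))
```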

Cite this article

Li, Z., Zhou, M., Lin, H. et al. A two stages sparse SVM training. Int. J. Mach. Learn. & Cyber. 5, 425–434 (2014). https://doi.org/10.1007/s13042-013-0181-5
