
A two stages sparse SVM training


Abstract

A small number of support vectors is crucial for an SVM to handle very large-scale problems quickly. This paper fits each class of data with a plane through a new model, which captures the separability information between classes and can be solved by fast core-set methods. Training on the core sets of the fitting planes then yields a very sparse SVM classifier. The computational complexity of the proposed algorithm is upper bounded by \( {\text{O}}(1/\varepsilon ) \). Experimental results show that the new algorithm trains faster on average than both CVM and SVMperf, with comparable generalization performance.
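
To make the two-stage idea concrete, the following is a minimal Python sketch of the pipeline, not the authors' implementation: a placeholder (plain subsampling) stands in for the core-set fitting-plane stage of the paper, and scikit-learn's `SVC` stands in for the final SVM training on the selected core points.

```python
# A minimal sketch of the two-stage idea in the abstract, not the authors' code.
# Stage 1 is represented by a placeholder that merely subsamples each class;
# the paper instead selects a small core set by solving a fitting-plane model
# with a fast core-set method. Stage 2 trains a kernel SVM on the core points.
import numpy as np
from sklearn.svm import SVC

def stage1_core_set(X_class, core_size, rng):
    """Placeholder for the paper's core-set plane fitting (subsampling only)."""
    core_size = min(core_size, len(X_class))
    return rng.choice(len(X_class), size=core_size, replace=False)

def two_stage_sparse_svm(X, y, core_size=100, seed=0, **svm_kwargs):
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)          # points of one class
        keep.append(idx[stage1_core_set(X[idx], core_size, rng)])
    keep = np.concatenate(keep)
    # Stage 2: an ordinary kernel SVM trained only on the core points,
    # which is what keeps the final classifier sparse.
    clf = SVC(**svm_kwargs).fit(X[keep], y[keep])
    return clf, keep
```

Replacing the placeholder with the paper's core-set solver is what keeps the stage-2 training set, and hence the number of support vectors, small.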


References

  1. Bach FR, Jordan MI (2005) Predictive low-rank decomposition for kernel methods. In: 22nd international conference on machine learning, Bonn. ICML 2005. Association for Computing Machinery, pp 33–40

  2. Badoiu M, Clarkson KL (2008) Optimal core-sets for balls. Comput Geom Theory Appl 40(1):14–22. doi:10.1016/j.comgeo.2007.04.002


  3. Burges CJC (1996) Simplified support vector decision rules. In: Proceedings of 13th international conference on machine learning, p 7

  4. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297


  5. Downs T, Gates KE, Masters A (2002) Exact simplification of support vector solutions. J Mach Learn Res 2(2):293–297. doi:10.1162/15324430260185637


  6. Fan R-E, Chen P-H, Lin C-J (2005) Working set selection using second order information for training support vector machines. J Mach Learn Res 6:1889–1918


  7. Jayadeva, Khemchandani R, Chandra S (2007) Twin support vector machines for pattern classification. IEEE Trans Pattern Anal Mach Intell 29(5):905–910. doi:10.1109/tpami.2007.1068

  8. Joachims T (1998) Making large scale SVM learning practical. Advances in kernel methods—support vector learning

  9. Joachims T, Yu CNJ (2009) Sparse kernel SVMs via cutting-plane training. Mach Learn 76(2–3):179–193. doi:10.1007/s10994-009-5126-6


  10. Keerthi SS, Chapelle O, DeCoste D (2006) Building support vector machines with reduced classifier complexity. J Mach Learn Res 7:1493–1515


  11. Lee YJ, Huang SY (2007) Reduced support vector machines: a statistical theory. IEEE Trans Neural Netw 18(1):1–13. doi:10.1109/tnn.2006.883722


  12. Liang X, Chen RC, Guo XY (2008) Pruning support vector machines without altering performances. IEEE Trans Neural Netw 19(10):1792–1803. doi:10.1109/tnn.2008.2002696


  13. Jiao L, Bo L, Wang L (2007) Fast sparse approximation for least squares support vector machine. IEEE Trans Neural Netw 18(3):685–697


  14. Lin KM, Lin CJ (2003) A study on reduced support vector machines. IEEE Trans Neural Netw 14(6):1449–1459. doi:10.1109/tnn.2003.820828


  15. Peng XJ (2011) Building sparse twin support vector machine classifiers in primal space. Inf Sci 181(18):3967–3980. doi:10.1016/j.ins.2011.05.004


  16. Smola A, Schölkopf B (2000) Sparse greedy matrix approximation for machine learning. Paper presented at the ICML

  17. Sun P, Yao X (2010) Sparse approximation through boosting for learning large scale kernel machines. IEEE Trans Neural Netw 21(6):883–894. doi:10.1109/tnn.2010.2044244


  18. Suykens JAK, Vandewalle J (1999) Least squares support vector machine classifiers. Neural Process Lett 9(3):293–300


  19. Tsang IWH, Kwok JTY, Zurada JM (2006) Generalized core vector machines. IEEE Trans Neural Netw 17(5):1126–1140. doi:10.1109/tnn.2006.878123


  20. Wu M, Schölkopf B, Bakir G (2005) Building sparse large margin classifiers. In: 22nd international conference on machine learning, Bonn. ICML 2005. Association for Computing Machinery, pp 1001–1008

  21. Khemchandani R, Karpatne A, Chandra S (2013) Twin support vector regression for the simultaneous learning of a function and its derivatives. Int J Mach Learn Cybern 4(1):51–63


  22. Wang X, Shu-Xia L, Zhai J-H (2008) Fast fuzzy multi-category SVM based on support vector domain description. Int J Pattern Recognit Artif Intell 22(1):109–120


Acknowledgments

This work was supported by the Scientific Research Fund of the Sichuan Provincial Education Department under Grant No. 12ZA112 and the National Natural Science Foundation of China (No. 61202256).

Author information


Corresponding author

Correspondence to Ziqiang Li.

Appendix

Proof of Theorem 1

From the definition of \( \bar{K} \), given any vector \( \bar{\alpha} = \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right]^{T} \ne 0 \) and writing \( C^{\prime} = m_{1}C_{n}/C \), one has

$$ \begin{aligned} \bar{\alpha}^{T}\bar{K}\bar{\alpha} &= \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right] \bar{K} \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right]^{T} \\ &= \left[ \begin{array}{cc} \alpha^{T} & \alpha^{*T} \end{array} \right] \left[ \begin{array}{c} K_{AA}\alpha + \tfrac{1}{2}E_{AA}\alpha + C^{\prime}I_{AA}\alpha - K_{AA}\alpha^{*} - \tfrac{1}{2}E_{AA}\alpha^{*} + C^{\prime}I_{AA}\alpha^{*} \\ -K_{AA}\alpha - \tfrac{1}{2}E_{AA}\alpha + C^{\prime}I_{AA}\alpha + K_{AA}\alpha^{*} + \tfrac{1}{2}E_{AA}\alpha^{*} + C^{\prime}I_{AA}\alpha^{*} \end{array} \right] \\ &= \alpha^{T}K_{AA}\alpha + \tfrac{1}{2}\alpha^{T}E_{AA}\alpha + C^{\prime}\alpha^{T}I_{AA}\alpha \\ &\quad - 2\alpha^{*T}K_{AA}\alpha - \alpha^{*T}E_{AA}\alpha + 2C^{\prime}\alpha^{*T}I_{AA}\alpha \\ &\quad + \alpha^{*T}K_{AA}\alpha^{*} + \tfrac{1}{2}\alpha^{*T}E_{AA}\alpha^{*} + C^{\prime}\alpha^{*T}I_{AA}\alpha^{*} \\ &= (\alpha - \alpha^{*})^{T}K_{AA}(\alpha - \alpha^{*}) + \tfrac{1}{2}(\alpha - \alpha^{*})^{T}E_{AA}(\alpha - \alpha^{*}) \\ &\quad + C^{\prime}(\alpha + \alpha^{*})^{T}I_{AA}(\alpha + \alpha^{*}) \end{aligned} $$

Because the matrix \( E_{AA} \) is positive semidefinite and \( I_{AA} \) is positive definite, \( \bar{K} \) is positive semidefinite whenever \( K_{AA} \) is positive semidefinite. Furthermore, for the new learning algorithm \( \alpha \ne -\alpha^{*} \), \( \alpha \succ 0 \) and \( \alpha^{*} \succ 0 \), so \( \bar{\alpha}^{T}\bar{K}\bar{\alpha} \) is always positive. Thus \( \bar{K} \) is in fact strictly positive definite on the feasible set. \(\square\)
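
The identity above is easy to check numerically. The sketch below assumes the block structure of \( \bar{K} \) implied by the expansion of \( \bar{K}\bar{\alpha} \) (diagonal blocks \( K_{AA} + \tfrac{1}{2}E_{AA} + C^{\prime}I_{AA} \), off-diagonal blocks \( -K_{AA} - \tfrac{1}{2}E_{AA} + C^{\prime}I_{AA} \)); the all-ones matrix and the identity are used only as stand-ins for \( E_{AA} \) and \( I_{AA} \), since the proof requires nothing more than positive semidefiniteness and positive definiteness of them.

```python
# Numerical check of the identity above, assuming the block structure of
# K_bar implied by the expansion of K_bar * alpha_bar: diagonal blocks
# K_AA + (1/2)E_AA + C'I_AA and off-diagonal blocks -K_AA - (1/2)E_AA + C'I_AA.
import numpy as np

rng = np.random.default_rng(0)
m = 6
A = rng.standard_normal((m, m))
K = A @ A.T                # a positive semidefinite stand-in for K_AA
E = np.ones((m, m))        # a PSD stand-in for E_AA (the proof only needs PSD)
I = np.eye(m)              # a positive definite stand-in for I_AA
Cp = 0.5                   # C' = m_1 * C_n / C, any positive value

M = K + 0.5 * E + Cp * I   # diagonal block
N = -K - 0.5 * E + Cp * I  # off-diagonal block
K_bar = np.block([[M, N], [N, M]])

a, a_star = rng.random(m), rng.random(m)   # nonnegative alpha, alpha*
ab = np.concatenate([a, a_star])

lhs = ab @ K_bar @ ab
rhs = ((a - a_star) @ (K + 0.5 * E) @ (a - a_star)
       + Cp * (a + a_star) @ I @ (a + a_star))
assert np.isclose(lhs, rhs)                      # the quadratic-form identity
assert np.linalg.eigvalsh(K_bar).min() > -1e-10  # K_bar is positive semidefinite
```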

Proof of Theorem 2

For the CCMEB problem (8), as in (10), one has

$$ R = \sqrt{ -\bar{\alpha}^{T}\bar{K}\bar{\alpha} + \bar{\alpha}^{T}\left(\operatorname{diag}(\bar{K}) + \Delta\right) },\quad c = \sum\limits_{i = 1}^{2m_{1}} \bar{\alpha}_{i}\,\bar{\varphi}(x_{i}) $$

Combining it with (16) and the definition of Δ yields

$$ R^{2} = 2\rho_{+}/C + \left\| c \right\|^{2} + \eta \qquad (20) $$

From (13), (15), (20), (19) and the definition of \( \bar{K} \), the squared distance between the center and any point \( \bar{\varphi }(x_{j} ) \) is

$$ \begin{aligned} \bigl(dist(j)\bigr)^{2} &= \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + \left[ \begin{array}{c} -2\bar{p} \\ 2\bar{p} \end{array} \right]_{j} \\ &= \begin{cases} \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta - 2\bar{p}_{j}, & j < m_{1} \\ \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + 2\bar{p}_{j}, & m_{1} \le j < 2m_{1} \end{cases} \end{aligned} $$

For \( j < m_{1} \),

$$ \begin{aligned} \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta - 2\bar{p}_{j} &= \left\| c \right\|^{2} + \eta - 2\bar{p}_{j} - 2\left( \sum\limits_{i = 1}^{m_{1}} \left\langle \bar{\varphi}(x_{i}),\bar{\varphi}(x_{j}) \right\rangle \bar{\alpha}_{i} + \sum\limits_{i = m_{1}+1}^{2m_{1}} \left\langle \bar{\varphi}(x_{i}),\bar{\varphi}(x_{j}) \right\rangle \bar{\alpha}_{i} \right) \\ &= \left\| c \right\|^{2} + \eta - 2\bar{p}_{j} - 2\left( \sum\limits_{i = 1}^{m_{1}} \left( \left\langle \varphi(x_{i}),\varphi(x_{j}) \right\rangle + \tfrac{1}{2} + \delta_{i}\delta_{j} \right)\bar{\alpha}_{i} + \sum\limits_{i = m_{1}+1}^{2m_{1}} \left( -\left\langle \varphi(x_{i-m_{1}}),\varphi(x_{j}) \right\rangle - \tfrac{1}{2} + \delta_{i}\delta_{j} \right)\bar{\alpha}_{i} + \frac{C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{m_{1}+j}\bigr) \right) \\ &= \left\| c \right\|^{2} + \eta - 2\left( \frac{1}{C}\bigl( -w_{+}^{T}\varphi(x_{j}) - b_{+} \bigr) + \frac{C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \right) \\ &= \left\| c \right\|^{2} + \eta + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &= R^{2} - 2\rho_{+}/C + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \end{aligned} $$

Likewise, for \( m_{1} \le j < 2m_{1} \),

$$ \left\| c \right\|^{2} - 2(\bar{K}\bar{\alpha})_{j} + \eta + 2\bar{p}_{j} = R^{2} - 2\rho_{+}/C - \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) $$

So, for each point \( x_{j} \in CS_{P}^{t} \) with a nonzero Lagrange multiplier, one has:

if \( j < m_{1} \), then

$$ \begin{aligned} R^{2} &= R^{2} - 2\rho_{+}/C + \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\Rightarrow\; w_{+}^{T}\varphi(x_{j}) + b_{+} = \rho_{+} + C_{n}m_{1}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\therefore\; w_{+}^{T}\varphi(x_{j}) + b_{+} > \rho_{+} \end{aligned} $$

if \( m_{1} \le j < 2m_{1} \), then

$$ \begin{aligned} R^{2} &= R^{2} - 2\rho_{+}/C - \frac{2}{C}\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) - \frac{2C_{n}m_{1}}{C}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\Rightarrow\; -\bigl( w_{+}^{T}\varphi(x_{j}) + b_{+} \bigr) = \rho_{+} + C_{n}m_{1}\bigl(\bar{\alpha}_{j} + \bar{\alpha}_{j}^{*}\bigr) \\ &\therefore\; w_{+}^{T}\varphi(x_{j}) + b_{+} < -\rho_{+} \end{aligned} $$

Thus every point \( x_{j} \in CS_{P}^{t} \) with a nonzero Lagrange multiplier must lie outside the slab.

One can also prove, in the same way, that each point \( x_{j} \in CS_{P}^{t} \) with a zero Lagrange multiplier that lies on the ball \( B(c^{t} ,r^{t} ) \) must lie on a bounding plane of the slab, and that each point \( x_{j} \in CS_{P}^{t} \) inside the ball \( B(c^{t} ,r^{t} ) \) must lie inside the slab. Details are omitted to save space. \(\square \)
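
As a small illustration of the trichotomy just stated (outside the slab, on a bounding plane, inside the slab), the helper below classifies core-set points from their precomputed scores \( w_{+}^{T}\varphi(x_{j}) + b_{+} \). The slab \( |w_{+}^{T}\varphi(x) + b_{+}| \le \rho_{+} \) is read off the inequalities above; the function is only illustrative and not part of the training algorithm.

```python
# Illustration of the trichotomy stated above: where a core-set point sits
# relative to the slab |w_+^T phi(x) + b_+| <= rho_plus. The scores
# w_+^T phi(x_j) + b_+ are assumed to be precomputed by the caller.
def slab_position(scores, rho_plus, tol=1e-8):
    labels = []
    for s in scores:
        if abs(s) > rho_plus + tol:
            labels.append("outside slab")       # where nonzero-multiplier points end up
        elif abs(abs(s) - rho_plus) <= tol:
            labels.append("on bounding plane")  # where on-ball, zero-multiplier points end up
        else:
            labels.append("inside slab")        # where points strictly inside the ball end up
    return labels

# Example: with rho_plus = 1.0, scores 1.5 and -2.0 fall outside the slab,
# 1.0 lies on a bounding plane, and 0.3 lies strictly inside.
print(slab_position([1.5, -2.0, 1.0, 0.3], rho_plus=1.0))
```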

Proof of Theorem 3

The proof is similar to that of Theorem 2. \(\square \)

Proof of Theorem 4

From (6) and (7), we have \( \frac{1}{m_{1}}\xi_{+}^{T} e_{m_{1}} = C_{n} \). Hence the parameter \( C_{n} \) represents the average extent to which the points fall outside the slab. \(\square \)
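
Read as code, this interpretation is just an average of slab violations. In the sketch below the slack of each point is taken to be \( \max(0, |w_{+}^{T}\varphi(x_{j}) + b_{+}| - \rho_{+}) \), an assumption about the slack form in model (6)-(7) consistent with the slab used in Theorem 2; Theorem 4 then says this average equals \( C_{n} \) at the optimum.

```python
# Sketch of the interpretation of C_n in Theorem 4: if xi_plus collects how far
# each point of the class falls outside the slab (taken here as
# max(0, |w_+^T phi(x_j) + b_+| - rho_plus), an assumed slack form), then the
# average of these slacks equals C_n at the optimum.
import numpy as np

def average_slab_violation(scores, rho_plus):
    xi_plus = np.maximum(0.0, np.abs(np.asarray(scores, dtype=float)) - rho_plus)
    return xi_plus.mean()   # Theorem 4: equals C_n at the optimum

# e.g. scores of m_1 = 4 points with rho_plus = 1: violations are 0.5, 1, 0, 0,
# so the average violation is 0.375.
print(average_slab_violation([1.5, -2.0, 1.0, 0.3], rho_plus=1.0))
```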

Cite this article

Li, Z., Zhou, M., Lin, H. et al. A two stages sparse SVM training. Int. J. Mach. Learn. & Cyber. 5, 425–434 (2014). https://doi.org/10.1007/s13042-013-0181-5
