Massive Classification with Support Vector Machines

Chapter in: Transactions on Computational Collective Intelligence XVIII

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9240)

Abstract

The new boosting algorithms based on Least-Squares SVM (LS-SVM), Proximal SVM (PSVM) and Newton SVM (NSVM) presented here aim at classifying very large datasets on standard personal computers (PCs). We extend PSVM, LS-SVM and NSVM in several ways to efficiently classify large datasets. We develop a row-incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we develop new algorithms that process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting, including AdaBoost and Arcx4, to these algorithms, we obtain classification algorithms for massive, very-high-dimensional datasets. Numerical results on large datasets from the UCI repository show that our algorithms are often significantly faster and/or more accurate than the state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.
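To make the linear algebra behind these extensions concrete, the sketch below illustrates the two regimes described above for the proximal SVM of Fung and Mangasarian [19]: a row-incremental solve of the small (d+1) x (d+1) normal system when there are many data points, and a Sherman-Morrison-Woodbury rewrite that solves only an n x n system when there are few points but very many dimensions. This is an illustrative NumPy reconstruction under one reading of the abstract, not the authors' implementation (the chapter's experiments use C++ with LAPACK++ [35]); the function names, the regularization parameter nu and the chunks iterable are assumptions made for the example.

# Illustrative sketch (not the authors' code): linear Proximal SVM in the two
# regimes discussed in the abstract. Function names, the parameter nu and the
# chunks iterable are assumptions made for this example.
import numpy as np

def psvm_incremental(chunks, nu=1.0):
    # Row-incremental PSVM [19]: accumulate the (d+1)x(d+1) matrix E^T E and the
    # right-hand side E^T y chunk by chunk, so only one block of rows is ever in
    # memory; then solve (I/nu + E^T E) [w; gamma] = E^T y.
    EtE, Ety = None, None
    for X, y in chunks:                                  # X: (m, d), y in {-1, +1}
        E = np.hstack([X, -np.ones((X.shape[0], 1))])    # E = [A  -e]
        if EtE is None:
            EtE = np.zeros((E.shape[1], E.shape[1]))
            Ety = np.zeros(E.shape[1])
        EtE += E.T @ E
        Ety += E.T @ y
    z = np.linalg.solve(np.eye(EtE.shape[0]) / nu + EtE, Ety)
    return z[:-1], z[-1]                                 # w, gamma

def psvm_smw(X, y, nu=1.0):
    # Few rows, very many columns: apply the Sherman-Morrison-Woodbury identity
    # (I/nu + E^T E)^{-1} = nu (I - E^T (I/nu + E E^T)^{-1} E), so only an
    # (n x n) system is solved instead of a (d+1) x (d+1) one.
    n = X.shape[0]
    E = np.hstack([X, -np.ones((n, 1))])
    u = E.T @ y
    v = np.linalg.solve(np.eye(n) / nu + E @ E.T, E @ u)
    z = nu * (u - E.T @ v)
    return z[:-1], z[-1]

def predict(X, w, gamma):
    # PSVM decision rule: sign(x.w - gamma).
    return np.sign(X @ w - gamma)

A boosted variant in the spirit of AdaBoost or Arcx4 [24, 25] would repeatedly train such a classifier on reweighted or resampled rows and combine the resulting votes; that outer loop is omitted from this sketch.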


References

  1. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

  2. Guyon, I.: Web page on SVM applications (1999). http://www.clopinet.com/isabelle/Projects/-SVM/app-list.html

  3. Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-03 panel - data mining: the next 10 years. SIGKDD Explor. 5(2), 191–196 (2004)

  4. Lyman, P., Varian, H.R., Swearingen, K., Charles, P., Good, N., Jordan, L., Pal, J.: How much information (2003). http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

  5. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)

  6. Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Giles, L., Morgan, N., Wilson, E. (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997)

  7. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)

  8. Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. Adv. Neural Inf. Process. Syst. 13, 409–415 (2001)

  9. Do, T.N., Poulet, F.: Incremental SVM and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003)

  10. Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of PSVM and visualization tools. In: Proceedings of 6th International Conference on Enterprise Information Systems, pp. 36–41 (2004)

  11. Do, T.N., Poulet, F.: Classifying one billion data with a new distributed SVM algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006)

  12. Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002)

  13. Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M., et al. (eds.) Enterprise Information Systems V, pp. 177–184. Kluwer Academic Publishers, Dordrecht (2004)

  14. Do, T.N., Le-Thi, H.A.: Classifying large datasets with SVM. In: Proceedings of 4th International Conference on Computational Management Science (2007)

  15. Syed, N., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the ACM SIGKDD International Conference on KDD. ACM (1999)

  16. Do, T.N., Poulet, F.: Mining very large datasets with SVM and visualization. In: Proceedings of 7th International Conference on Enterprise Information Systems, pp. 127–134 (2005)

  17. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000)

  18. Suykens, J., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)

  19. Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001)

  20. Mangasarian, O.: A finite Newton method for classification problems. Technical report 01-11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001)

  21. Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)

  22. Golub, G., Loan, C.V.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  23. Rao, C.: Linear Statistical Inference and Its Applications. Wiley, New York (1965)

  24. Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999)

  25. Breiman, L.: Arcing classifiers. Ann. Stat. 26(3), 801–849 (1998)

  26. Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)

  27. Frank, A., Asuncion, A.: UCI machine learning repository (2010). http://www.ics.uci.edu/~mlearn/MLRepository.html

  28. Lewis, D.: Reuters-21578 text classification test collection (1997). http://www.daviddlewis.com/resources/testcollections/reuters21578/

  29. Chang, C.C., Lin, C.J.: LIBSVM - a library for support vector machines (2001). http://www.csie.ntu.edu.tw/~cjlin/libsvm

  30. Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)

  31. Joachims, T.: Training linear SVMs in linear time. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 217–226. ACM (2006)

  32. Yu, H., Yang, J., Han, J.: Classifying large data sets using SVMs with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003)

  33. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)

  34. Reyzin, L., Schapire, R.: How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 753–760. ACM (2006)

  35. Dongarra, J., Pozo, R., Walker, D.: LAPACK++: a design overview of object-oriented extensions for high performance linear algebra. In: Proceedings of Supercomputing, pp. 162–171 (1993)

  36. McCallum, A.: Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering (1998). http://www-2.cs.cmu.edu/~mccallum/bow

Author information

Corresponding author

Correspondence to Hoai An Le Thi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Do, T.N., Le Thi, H.A. (2015). Massive Classification with Support Vector Machines. In: Nguyen, N. (eds) Transactions on Computational Collective Intelligence XVIII. Lecture Notes in Computer Science, vol 9240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48145-5_8

  • DOI: https://doi.org/10.1007/978-3-662-48145-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48144-8

  • Online ISBN: 978-3-662-48145-5

  • eBook Packages: Computer Science (R0)
