Massive Classification with Support Vector Machines

Chapter in: Transactions on Computational Collective Intelligence XVIII

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 9240)

Abstract

The new boosting algorithms based on Least-Squares SVM (LS-SVM), Proximal SVM (PSVM) and Newton SVM (NSVM) presented here aim at classifying very large datasets on standard personal computers (PCs). We extend PSVM, LS-SVM and NSVM in several ways to efficiently classify large datasets. We develop a row-incremental version for datasets with billions of data points. By adding a Tikhonov regularization term and using the Sherman-Morrison-Woodbury formula, we develop new algorithms that process datasets with a small number of data points but very high dimensionality. Finally, by applying boosting, including AdaBoost and Arcx4, to these algorithms, we obtain classification algorithms for massive, very-high-dimensional datasets. Numerical results on large datasets from the UCI repository show that our algorithms are often significantly faster and/or more accurate than the state-of-the-art algorithms LibSVM, CB-SVM, SVM-perf and LIBLINEAR.
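To make the linear algebra behind these extensions concrete, the sketch below illustrates the two regimes described above for the proximal SVM of Fung and Mangasarian [19]: a row-incremental solve of the small (d+1) x (d+1) normal system when there are many data points, and a Sherman-Morrison-Woodbury rewrite that solves only an n x n system when there are few points but very many dimensions. This is an illustrative NumPy reconstruction under one reading of the abstract, not the authors' implementation (the chapter's experiments use C++ with LAPACK++ [35]); the function names, the regularization parameter nu and the chunks iterable are assumptions made for the example.

# Illustrative sketch (not the authors' code): linear Proximal SVM in the two
# regimes discussed in the abstract. Function names, the parameter nu and the
# chunks iterable are assumptions made for this example.
import numpy as np

def psvm_incremental(chunks, nu=1.0):
    # Row-incremental PSVM [19]: accumulate the (d+1)x(d+1) matrix E^T E and the
    # right-hand side E^T y chunk by chunk, so only one block of rows is ever in
    # memory; then solve (I/nu + E^T E) [w; gamma] = E^T y.
    EtE, Ety = None, None
    for X, y in chunks:                                  # X: (m, d), y in {-1, +1}
        E = np.hstack([X, -np.ones((X.shape[0], 1))])    # E = [A  -e]
        if EtE is None:
            EtE = np.zeros((E.shape[1], E.shape[1]))
            Ety = np.zeros(E.shape[1])
        EtE += E.T @ E
        Ety += E.T @ y
    z = np.linalg.solve(np.eye(EtE.shape[0]) / nu + EtE, Ety)
    return z[:-1], z[-1]                                 # w, gamma

def psvm_smw(X, y, nu=1.0):
    # Few rows, very many columns: apply the Sherman-Morrison-Woodbury identity
    # (I/nu + E^T E)^{-1} = nu (I - E^T (I/nu + E E^T)^{-1} E), so only an
    # (n x n) system is solved instead of a (d+1) x (d+1) one.
    n = X.shape[0]
    E = np.hstack([X, -np.ones((n, 1))])
    u = E.T @ y
    v = np.linalg.solve(np.eye(n) / nu + E @ E.T, E @ u)
    z = nu * (u - E.T @ v)
    return z[:-1], z[-1]

def predict(X, w, gamma):
    # PSVM decision rule: sign(x.w - gamma).
    return np.sign(X @ w - gamma)

A boosted variant in the spirit of AdaBoost or Arcx4 [24, 25] would repeatedly train such a classifier on reweighted or resampled rows and combine the resulting votes; that outer loop is omitted from this sketch.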


References

  1. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, New York (1995)

  2. Guyon, I.: Web page on SVM applications (1999). http://www.clopinet.com/isabelle/Projects/-SVM/app-list.html

  3. Fayyad, U., Piatetsky-Shapiro, G., Uthurusamy, R.: Summary from the KDD-03 panel - data mining: the next 10 years. SIGKDD Explor. 5(2), 191–196 (2004)

  4. Lyman, P., Varian, H.R., Swearingen, K., Charles, P., Good, N., Jordan, L., Pal, J.: How much information (2003). http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

  5. Boser, B., Guyon, I., Vapnik, V.: A training algorithm for optimal margin classifiers. In: Proceedings of the 5th ACM Annual Workshop on Computational Learning Theory, pp. 144–152. ACM (1992)

  6. Osuna, E., Freund, R., Girosi, F.: An improved training algorithm for support vector machines. In: Principe, J., Giles, L., Morgan, N., Wilson, E. (eds.) Neural Networks for Signal Processing VII, pp. 276–285 (1997)

  7. Platt, J.: Fast training of support vector machines using sequential minimal optimization. In: Schölkopf, B., Burges, C., Smola, A. (eds.) Advances in Kernel Methods - Support Vector Learning, pp. 185–208. MIT Press, Cambridge (1999)

  8. Cauwenberghs, G., Poggio, T.: Incremental and decremental support vector machine learning. Adv. Neural Inf. Process. Syst. 13, 409–415 (2001)

  9. Do, T.N., Poulet, F.: Incremental SVM and visualization tools for bio-medical data mining. In: Proceedings of Workshop on Data Mining and Text Mining in Bioinformatics, pp. 14–19 (2003)

  10. Do, T.N., Poulet, F.: Towards high dimensional data mining with boosting of PSVM and visualization tools. In: Proceedings of 6th International Conference on Enterprise Information Systems, pp. 36–41 (2004)

  11. Do, T.N., Poulet, F.: Classifying one billion data with a new distributed SVM algorithm. In: Proceedings of 4th IEEE International Conference on Computer Science, Research, Innovation and Vision for the Future, pp. 59–66. IEEE Press (2006)

  12. Fung, G., Mangasarian, O.: Incremental support vector machine classification. In: Proceedings of the 2nd SIAM International Conference on Data Mining (2002)

  13. Poulet, F., Do, T.N.: Mining very large datasets with support vector machine algorithms. In: Camp, O., Filipe, J., Hammoudi, S., Piattini, M., et al. (eds.) Enterprise Information Systems V, pp. 177–184. Kluwer Academic Publishers, Dordrecht (2004)

  14. Do, T.N., Le-Thi, H.A.: Classifying large datasets with SVM. In: Proceedings of 4th International Conference on Computational Management Science (2007)

  15. Syed, N., Liu, H., Sung, K.: Incremental learning with support vector machines. In: Proceedings of the ACM SIGKDD International Conference on KDD. ACM (1999)

  16. Do, T.N., Poulet, F.: Mining very large datasets with SVM and visualization. In: Proceedings of 7th International Conference on Enterprise Information Systems, pp. 127–134 (2005)

  17. Tong, S., Koller, D.: Support vector machine active learning with applications to text classification. In: Proceedings of the 17th International Conference on Machine Learning, pp. 999–1006. ACM (2000)

  18. Suykens, J., Vandewalle, J.: Least squares support vector machine classifiers. Neural Process. Lett. 9(3), 293–300 (1999)

  19. Fung, G., Mangasarian, O.: Proximal support vector classifiers. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 77–86. ACM (2001)

  20. Mangasarian, O.: A finite Newton method for classification problems. Technical report 01-11, Data Mining Institute, Computer Sciences Department, University of Wisconsin (2001)

  21. Tikhonov, A.N.: On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)

  22. Golub, G., Loan, C.V.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  23. Rao, C.: Linear Statistical Inference and Its Applications. Wiley, New York (1965)

  24. Freund, Y., Schapire, R.: A short introduction to boosting. J. Japan. Soc. Artif. Intell. 14(5), 771–780 (1999)

  25. Breiman, L.: Arcing classifiers. Ann. Stat. 26(3), 801–849 (1998)

  26. Mangasarian, O., Musicant, D.: Lagrangian support vector machines. J. Mach. Learn. Res. 1, 161–177 (2001)

  27. Frank, A., Asuncion, A.: UCI machine learning repository (2010). http://www.ics.uci.edu/~mlearn/MLRepository.html

  28. Lewis, D.: Reuters-21578 text classification test collection (1997). http://www.daviddlewis.com/resources/testcollections/reuters21578/

  29. Chang, C.C., Lin, C.J.: LIBSVM - a library for support vector machines (2001). http://www.csie.ntu.edu.tw/~cjlin/libsvm

  30. Fan, R., Chang, K., Hsieh, C., Wang, X., Lin, C.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9(4), 1871–1874 (2008)

  31. Joachims, T.: Training linear SVMs in linear time. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 217–226. ACM (2006)

  32. Yu, H., Yang, J., Han, J.: Classifying large data sets using SVMs with hierarchical clusters. In: Proceedings of the ACM SIGKDD International Conference on KDD, pp. 306–315. ACM (2003)

  33. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, Cambridge (2000)

  34. Reyzin, L., Schapire, R.: How boosting the margin can also boost classifier complexity. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 753–760. ACM (2006)

  35. Dongarra, J., Pozo, R., Walker, D.: LAPACK++: a design overview of object-oriented extensions for high performance linear algebra. In: Proceedings of Supercomputing, pp. 162–171 (1993)

  36. McCallum, A.: Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering (1998). http://www-2.cs.cmu.edu/~mccallum/bow

Author information

Corresponding author

Correspondence to Hoai An Le Thi.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Do, T.N., Le Thi, H.A. (2015). Massive Classification with Support Vector Machines. In: Nguyen, N. (eds) Transactions on Computational Collective Intelligence XVIII. Lecture Notes in Computer Science, vol 9240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48145-5_8

  • DOI: https://doi.org/10.1007/978-3-662-48145-5_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48144-8

  • Online ISBN: 978-3-662-48145-5

  • eBook Packages: Computer Science (R0)
