
Visual Methods for Examining SVM Classifiers

Visual Data Mining

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4404)

Abstract

Support vector machines (SVMs) offer a theoretically well-founded approach to the automated learning of pattern classifiers. They have been shown to give highly accurate results in complex classification problems such as gene expression analysis. The SVM algorithm is also quite intuitive, with only a few inputs to vary in the fitting process and several outputs that are interesting to study. For many data mining tasks (e.g., cancer prediction), finding classifiers with good predictive accuracy is important, but understanding the classifier is equally important. By studying the classifier outputs we may be able to produce a simpler classifier, learn which variables are the important discriminators between classes, and find the samples that are problematic for the classification. Visual methods for exploratory data analysis can help us study these outputs and complement automated classification algorithms in data mining. We present the use of tour-based methods to plot aspects of the SVM classifier. This approach provides insights into the cluster structure in the data, the nature of the boundaries between clusters, and problematic outliers. Furthermore, tours can be used to assess variable importance. We show how visual methods can be used as a complement to cross-validation methods in order to find good SVM input parameters for a particular data set.
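As a rough illustration of what a tour-based view involves: a tour displays a sequence of low-dimensional projections of high-dimensional data. The sketch below, in plain Python, draws one random 2-D projection plane and projects a small data set onto it, which corresponds to a single frame of such a sequence. The toy data, the function names `random_projection_basis` and `project`, and the use of Gram-Schmidt orthogonalisation are assumptions made for illustration; this is not the chapter's own software or procedure, and it omits the smooth interpolation between successive planes that an actual tour performs.

```python
import math
import random


def random_projection_basis(p, rng):
    """Return two orthonormal p-dimensional vectors spanning a random 2-D plane."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    # Gaussian draws give a direction that is uniform on the unit sphere.
    a = normalize([rng.gauss(0.0, 1.0) for _ in range(p)])
    b = [rng.gauss(0.0, 1.0) for _ in range(p)]
    # Gram-Schmidt step: remove the component of b that lies along a.
    dot_ab = sum(x * y for x, y in zip(a, b))
    b = normalize([y - dot_ab * x for x, y in zip(a, b)])
    return a, b


def project(points, basis):
    """Project each p-dimensional point onto the plane spanned by the basis."""
    a, b = basis
    return [
        (sum(x * u for x, u in zip(pt, a)), sum(x * v for x, v in zip(pt, b)))
        for pt in points
    ]


rng = random.Random(0)
# Hypothetical toy data: two classes in 5 dimensions that differ only in
# the mean of the first variable.
points = [[rng.gauss(-2.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
          for _ in range(20)]
points += [[rng.gauss(2.0, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(4)]
           for _ in range(20)]

basis = random_projection_basis(5, rng)
frame = project(points, basis)  # 40 (x, y) pairs, ready for a scatterplot
```

In this toy setting the two classes separate clearly only in frames whose plane has a large coefficient on the first variable, which gives an informal sense of how watching many projections can hint at which variables matter.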




Editor information

Simeon J. Simoff, Michael H. Böhlen, Arturas Mazeika


Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Caragea, D., Cook, D., Wickham, H., Honavar, V. (2008). Visual Methods for Examining SVM Classifiers. In: Simoff, S.J., Böhlen, M.H., Mazeika, A. (eds) Visual Data Mining. Lecture Notes in Computer Science, vol 4404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71080-6_10


  • DOI: https://doi.org/10.1007/978-3-540-71080-6_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71079-0

  • Online ISBN: 978-3-540-71080-6

  • eBook Packages: Computer Science, Computer Science (R0)
