Abstract
Support vector machines (SVMs) offer a theoretically well-founded approach to automated learning of pattern classifiers. They have been shown to give highly accurate results in complex classification problems, for example, gene expression analysis. The SVM algorithm is also quite intuitive, with a few inputs to vary in the fitting process and several outputs that are interesting to study. For many data mining tasks (e.g., cancer prediction), finding classifiers with good predictive accuracy is important, but understanding the classifier is equally important. By studying the classifier outputs we may be able to produce a simpler classifier, learn which variables are the important discriminators between classes, and find the samples that are problematic for the classification. Visual methods for exploratory data analysis can help us study the outputs and complement automated classification algorithms in data mining. We present the use of tour-based methods to plot aspects of the SVM classifier. This approach provides insights about the cluster structure in the data, the nature of boundaries between clusters, and problematic outliers. Furthermore, tours can be used to assess variable importance. We show how visual methods can be used as a complement to cross-validation methods in order to find good SVM input parameters for a particular data set.
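To make the "inputs" and "outputs" of an SVM concrete, the following is a minimal sketch and not the method used in the chapter. It assumes a linear soft-margin SVM fitted by plain batch subgradient descent. The toy data, the function names, and the settings `C`, `lr`, and `epochs` are all hypothetical choices made for illustration. It exposes two outputs of the kind the abstract mentions: the weight vector, as a rough indicator of variable importance, and the per-sample margins, which point to the samples closest to the boundary.

```python
# Hypothetical toy data: variable 0 separates the classes, variable 1 is noise.
POSITIVE = [(2.0, 0.1), (3.0, -0.2), (2.5, 0.3)]
X = POSITIVE + [(-a, -b) for a, b in POSITIVE]
Y = [1.0] * len(POSITIVE) + [-1.0] * len(POSITIVE)


def decision_value(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, Y, C=1.0, lr=0.01, epochs=500):
    """Minimise 0.5*||w||^2 + C * sum(hinge losses) by batch subgradient descent.

    C is the input that trades margin width against training errors.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for x, y in zip(X, Y):
            if y * decision_value(w, b, x) < 1.0:  # hinge loss is active
                for j, xj in enumerate(x):
                    grad_w[j] -= C * y * xj
                grad_b -= C * y
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


w, b = train_linear_svm(X, Y)

# Output 1: variables ranked by absolute weight (a crude importance measure).
importance = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)

# Output 2: per-sample margins; small or negative values flag difficult samples.
margins = [y * decision_value(w, b, x) for x, y in zip(X, Y)]
closest = min(range(len(X)), key=lambda i: margins[i])

print("weights:", w, "bias:", b)
print("variables by importance:", importance)
print("sample closest to the boundary:", closest, X[closest])
```

In practice one would refit over a grid of `C` values (and kernel parameters for nonlinear SVMs), compare cross-validated error, and then inspect the weights and margins of the chosen fit, which is the kind of output the visual methods described here are meant to examine.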
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Caragea, D., Cook, D., Wickham, H., Honavar, V. (2008). Visual Methods for Examining SVM Classifiers. In: Simoff, S.J., Böhlen, M.H., Mazeika, A. (eds) Visual Data Mining. Lecture Notes in Computer Science, vol 4404. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71080-6_10
DOI: https://doi.org/10.1007/978-3-540-71080-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71079-0
Online ISBN: 978-3-540-71080-6
eBook Packages: Computer Science (R0)