Abstract
Classical methods for obtaining classifiers for structured objects (e.g., sequences, images) are based on generative models and adopt a standard generative Bayesian framework. To use discriminative approaches (namely, support vector machines), the objects must be mapped (embedded) into a Hilbert space; one proposed way of carrying out such an embedding is via generative models (possibly learned from data). This type of hybrid discriminative/generative approach has recently been shown to outperform classifiers obtained directly from the generative model upon which the embedding is built.
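To make the idea of a generative embedding concrete, here is a minimal sketch. It assumes first-order Markov chains over a small finite alphabet as the generative models, a deliberately simple stand-in for the hidden Markov models used in, e.g., Bicego et al. (2004). One model is fitted per class, and each sequence, whatever its length, is mapped to the vector of its log-likelihoods under those models. The function names, the smoothing parameter `alpha`, and the toy data are hypothetical choices for illustration, not the chapter's method.

```python
import numpy as np

def fit_markov_chain(sequences, n_symbols, alpha=1.0):
    """Estimate log initial and log transition probabilities of a
    first-order Markov chain, with additive (Laplace) smoothing alpha."""
    init = np.full(n_symbols, alpha)
    trans = np.full((n_symbols, n_symbols), alpha)
    for seq in sequences:
        init[seq[0]] += 1
        for a, b in zip(seq[:-1], seq[1:]):
            trans[a, b] += 1
    init /= init.sum()
    trans /= trans.sum(axis=1, keepdims=True)
    return np.log(init), np.log(trans)

def log_likelihood(seq, model):
    """log p(seq | model) for a first-order Markov chain."""
    log_init, log_trans = model
    ll = log_init[seq[0]]
    for a, b in zip(seq[:-1], seq[1:]):
        ll += log_trans[a, b]
    return ll

def generative_embedding(seq, models):
    """Map a variable-length sequence to a fixed-length vector holding
    its log-likelihood under each reference generative model."""
    return np.array([log_likelihood(seq, m) for m in models])

# Toy data over the alphabet {0, 1}: class A alternates, class B repeats.
class_a = [[0, 1, 0, 1, 0], [1, 0, 1, 0]]
class_b = [[0, 0, 0, 1, 1], [1, 1, 1, 0]]
models = [fit_markov_chain(s, n_symbols=2) for s in (class_a, class_b)]
x = generative_embedding([0, 1, 0, 1], models)  # x[0] > x[1]
```

The resulting fixed-length vectors can be fed to any vector-based discriminative classifier; richer embeddings (e.g., Fisher scores or posterior quantities) follow the same pattern of reading features off a fitted generative model.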
Discriminative approaches based on generative embeddings involve two key components: a generative model, used to define the embedding, and a discriminative learning algorithm, used to obtain a (possibly kernel-based) classifier. The literature on generative embeddings focuses essentially on defining the embedding, and usually adopts standard off-the-shelf kernels and learning algorithms. We have recently proposed a different approach that exploits the probabilistic nature of generative embeddings by using information-theoretic kernels defined on probability distributions. In this chapter, we review this approach and its building blocks, and we illustrate its performance on two medical applications.
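The abstract does not fix a particular kernel. As one illustrative instance, the sketch below assumes the Jensen-Shannon kernel k(p, q) = ln 2 - JS(p, q) discussed by Martins et al. (2009), applied to generative embeddings that are already discrete probability vectors (for instance, topic proportions). The helper names and the toy matrix `P` are hypothetical.

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def js_divergence(p, q):
    """Equal-weight Jensen-Shannon divergence: H((p+q)/2) - (H(p)+H(q))/2."""
    m = 0.5 * (np.asarray(p, dtype=float) + np.asarray(q, dtype=float))
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))

def js_kernel(p, q):
    """Jensen-Shannon kernel k(p, q) = ln 2 - JS(p, q), which lies in [0, ln 2]."""
    return np.log(2.0) - js_divergence(p, q)

def gram_matrix(P):
    """Symmetric kernel matrix over the rows of P (each row a distribution)."""
    n = len(P)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(i, n):
            K[i, j] = K[j, i] = js_kernel(P[i], P[j])
    return K

# Hypothetical embeddings, e.g. topic proportions produced by a topic model.
P = np.array([[0.7, 0.2, 0.1],
              [0.6, 0.3, 0.1],
              [0.1, 0.1, 0.8]])
K = gram_matrix(P)
```

The Gram matrix `K` can be handed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. Its diagonal equals ln 2 because JS(p, p) = 0, and nearby distributions (rows 0 and 1) receive a larger kernel value than distant ones (rows 0 and 2).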
Notes
- 1.
We use the term document to refer to a finite sequence of objects from some finite set, simply because LSA and pLSA have their roots in the field of natural language processing (NLP). pLSA has recently been used not only in NLP but also in other areas, such as computer vision, bioinformatics, and image analysis [10, 12, 28]. In image analysis problems, the idea is to use pLSA to model the occurrence of image features (visual words) [12, 28].
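For readers unfamiliar with pLSA (Hofmann, 2001), a minimal EM sketch follows. It assumes a small dense document-by-word count matrix and a fixed number of topics; the function name `plsa`, the random initialization, the iteration count, and the toy corpus are illustrative assumptions. The model is P(w|d) = sum_z P(w|z) P(z|d), and each row of the returned P(z|d) is a probability distribution over topics that can serve as a probabilistic representation of the corresponding document.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=100, seed=0):
    """Fit pLSA, P(w|d) = sum_z P(w|z) P(z|d), by EM on a dense
    document-by-word count matrix. Returns (P(z|d), P(w|z))."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    # Random normalized initialization breaks the symmetry between topics.
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # E-step: P(z|d,w) is proportional to P(z|d) P(w|z); shape (D, Z, W).
        joint = p_z_d[:, :, None] * p_w_z[None, :, :]
        # The small constant guards against 0/0 for words never observed.
        post = joint / (joint.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: weight the posteriors by the observed counts n(d, w).
        weighted = counts[:, None, :] * post
        p_w_z = weighted.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True)
        p_z_d = weighted.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    return p_z_d, p_w_z

# Toy corpus: 4 documents over a 4-word vocabulary.
counts = np.array([[5, 4, 0, 0],
                   [4, 5, 1, 0],
                   [0, 0, 5, 6],
                   [1, 0, 4, 5]], dtype=float)
theta, phi = plsa(counts, n_topics=2)
```

The E-step tensor has D x Z x W entries, so this dense version suits only toy data; a sparse implementation would iterate over the nonzero counts instead.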
References
Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 841–848. MIT Press, Cambridge (2002)
Rubinstein, Y.D., Hastie, T.: Discriminative vs informative learning. In: International Conference on Knowledge Discovery and Data Mining, KDD’1997, pp. 49–53. AAAI Press, Menlo Park (1997)
Ripley, B.: Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS), vol. 11, pp. 487–493. MIT Press, Cambridge (1998)
Lasserre, J., Bishop, C., Minka, T.: Principled hybrids of generative and discriminative models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 87–94 (2006)
Bicego, M., Murino, V., Figueiredo, M.: Similarity-based classification of sequences using hidden Markov models. Pattern Recognit. 37(12), 2281–2291 (2004)
Bosch, A., Zisserman, A., Munoz, X.: Scene classification via pLSA. In: European Conference on Computer Vision (ECCV), pp. 517–530 (2006)
Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: A hybrid generative/discriminative classification framework based on free-energy terms. In: IEEE International Conference on Computer Vision (ICCV), pp. 2058–2065 (2009)
Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free energy score space. In: Advances in Neural Information Processing Systems (NIPS), vol. 22, pp. 1428–1436. MIT Press, Cambridge (2009)
Chandalia, G., Beal, M.J.: Using Fisher kernels from topic models for dimensionality reduction. In: NIPS Workshop on Novel Applications of Dimensionality Reduction (2006)
Chappelier, J.-C., Eckard, E.: PLSI: The true Fisher kernel and beyond. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 195–210 (2009)
Figueiredo, M., Aguiar, P., Martins, A., Murino, V., Bicego, M.: Information theoretical kernels for generative embeddings based on hidden Markov models. In: Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition—S+SSPR’2010, Izmir, Turkey (2010)
Bicego, M., Perina, A., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Combining free energy score spaces with information theoretic kernels: application to scene classification. In: IEEE International Conference on Image Processing—ICIP’2010, Hong Kong (2010)
Bicego, M., Ulaş, A., Schüffler, P., Castellani, U., Mirtuono, P., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Renal cancer cell classification using generative embeddings and information theoretic kernels. In: International Conference on Pattern Recognition in Bioinformatics (PRIB) (2011)
Martins, A., Smith, N., Xing, E., Aguiar, P., Figueiredo, M.: Nonextensive information theoretic kernels on measures. J. Mach. Learn. Res. 10, 935–975 (2009)
Cuturi, M., Vert, J.-P.: Semigroup kernels on finite sets. In: Advances in Neural Information Processing Systems (NIPS), pp. 329–336. MIT Press, Cambridge (2005)
Cuturi, M., Fukumizu, K., Vert, J.-P.: Semigroup kernels on measures. J. Mach. Learn. Res. 6, 1169–1198 (2005)
Moreno, P., Ho, P., Vasconcelos, N.: Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2003)
Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.-R.: A new discriminative kernel from probabilistic models. Neural Comput. 14, 2397–2414 (2002)
Smith, N., Gales, M.: Speech recognition using SVMs. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 1197–1204. MIT Press, Cambridge (2002)
Li, X., Lee, T.S., Liu, Y.: Hybrid generative-discriminative classification using posterior divergence. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2713–2720 (2011)
Bicego, M., Pekalska, E., Tax, D.M.J., Duin, R.P.W.: Component-based discriminative classification for hidden Markov models. Pattern Recognit. 42, 2637–2648 (2009)
Krishnapuram, B., Carin, L., Figueiredo, M.A.T., Hartemink, A.J.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)
Bicego, M., Lovato, P., Oliboni, B., Perina, A.: Expression microarray classification using topic models. In: ACM Symposium on Applied Computing, pp. 1516–1520 (2010)
Castellani, U., Perina, A., Murino, V., Bellani, M., Rambaldelli, G., Tansella, M., Brambilla, P.: Brain morphometry by probabilistic latent semantic analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 177–184 (2010)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)
Hofmann, T.: Learning the similarity of documents: an information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems (NIPS), pp. 914–920. MIT Press, Cambridge (2000)
Smith, N., Gales, M.: Using SVMs to classify variable length speech patterns. Technical Report CUED/F-INFENG/TR–412, Cambridge University Engineering Department (2002)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Suyari, H.: Generalization of Shannon–Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy. IEEE Trans. Inf. Theory 50(8) (2004)
Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (1991)
Tsallis, C.: Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52, 479–487 (1988)
Burbea, J., Rao, C.: On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 28(3) (1982)
Lin, J.: Divergence measures based on Shannon entropy. IEEE Trans. Inf. Theory 37 (1991)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Schüffler, P., Fuchs, T., Ong, C.S., Roth, V., Buhmann, J.: Computational TMA analysis and cell nucleus classification of renal cell carcinoma. In: 32nd DAGM Conference on Pattern Recognition, pp. 202–211. Springer, Berlin (2010)
Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: 6th ACM International Conference on Image and Video Retrieval (CIVR), pp. 401–408 (2007)
Rogers, S., Girolami, M., Campbell, C., Breitling, R.: The latent process decomposition of cDNA microarray data sets. IEEE/ACM Trans. Comput. Biol. Bioinform. 2(2), 143–156 (2005)
Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96(12), 6745–6750 (1999)
Ulaş, A., Schüffler, P., Bicego, M., Castellani, U., Murino, V.: Hybrid generative-discriminative nucleus classification of renal cell carcinoma. In: Pelillo, M., Hancock, E. (eds.) International Workshop on Similarity-Based Pattern Analysis (SIMBAD). LNCS, vol. 7005, pp. 77–88. Springer, Berlin (2011)
Deegalla, S., Bostrom, H.: Fusion of dimensionality reduction methods: a case study in microarray classification. In: Proc. Int. Conf. on Information Fusion, pp. 460–465 (2009)
Geman, D., Afsari, B., Tan, A.C., Naiman, D.Q.: Microarray classification from several two-gene expression comparisons. In: Proc. Int. Conf. on Machine Learning and Applications, pp. 583–585 (2008)
Liu, H., Liu, L., Zhang, H.: Ensemble gene selection by grouping for microarray data classification. J. Biomed. Inform. 43(1), 81–87 (2010)
Wang, L., Zhu, J., Zou, H.: Hybrid Huberized support vector machines for microarray classification and gene selection. Bioinformatics 24(3), 412–419 (2008)
Acknowledgements
We acknowledge support from the FET programme (EU FP7), under the SIMBAD project (contract 213250).
Copyright information
© 2013 Springer-Verlag London
About this chapter
Cite this chapter
Aguiar, P.M.Q. et al. (2013). On the Combination of Information-Theoretic Kernels with Generative Embeddings. In: Pelillo, M. (eds) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_4
DOI: https://doi.org/10.1007/978-1-4471-5628-4_4
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5627-7
Online ISBN: 978-1-4471-5628-4
eBook Packages: Computer Science, Computer Science (R0)