On the Combination of Information-Theoretic Kernels with Generative Embeddings

Aguiar, Pedro M. Q.; Bicego, Manuele; Castellani, Umberto; Figueiredo, Mário A. T.; Martins, André T.; Murino, Vittorio; Perina, Alessandro; Ulaş, Aydın

doi:10.1007/978-1-4471-5628-4_4

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Classical methods for building classifiers for structured objects (e.g., sequences, images) rely on generative models within a Bayesian framework. To use discriminative approaches (namely, support vector machines), the objects must be mapped (embedded) into a Hilbert space; one proposed way to carry out such an embedding is via generative models, possibly learned from data. This type of hybrid discriminative/generative approach has recently been shown to outperform classifiers obtained directly from the generative model on which the embedding is built.

Discriminative approaches based on generative embeddings involve two key components: a generative model used to define the embedding, and a discriminative learning algorithm used to obtain a (possibly kernel-based) classifier. The literature on generative embeddings focuses mainly on defining the embedding, and a standard off-the-shelf kernel and learning algorithm are usually adopted. Recently, we have proposed a different approach that exploits the probabilistic nature of generative embeddings by using information-theoretic kernels defined on probability distributions. In this chapter, we review this approach and its building blocks, and illustrate its performance on two medical applications.
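
As a rough illustration of the second component, the sketch below computes a Jensen-Shannon kernel, one information-theoretic kernel defined on probability distributions, between embedded objects. It assumes the embeddings are already normalized probability vectors. The function names and the toy vectors are hypothetical and are not taken from the chapter, whose exact kernels are not shown in this preview.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions of equal length."""
    mix = [(a + b) / 2.0 for a, b in zip(p, q)]
    return entropy(mix) - (entropy(p) + entropy(q)) / 2.0


def js_kernel(p, q):
    """Jensen-Shannon kernel: ln(2) minus the Jensen-Shannon divergence."""
    return math.log(2.0) - js_divergence(p, q)


def gram_matrix(embeddings):
    """Pairwise kernel values for a list of probability vectors."""
    return [[js_kernel(p, q) for q in embeddings] for p in embeddings]


# Hypothetical generative embeddings: each row is a probability vector
# (for example, posterior probabilities over three latent states).
embeddings = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
K = gram_matrix(embeddings)
```

The resulting Gram matrix `K` could then be handed to any kernel classifier that accepts a precomputed kernel, such as a support vector machine.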

Notes

1.
We use the term document to refer to a finite sequence of objects from some finite set, simply because LSA and pLSA have their roots in the field of natural language processing (NLP). pLSA has recently been used not only in NLP but also in other areas, such as computer vision, bioinformatics, and image analysis [10, 12, 28]. In image analysis problems, the idea is to use pLSA to model the occurrence of image features (visual words) [12, 28].
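
To make the notion of a document concrete, the following minimal sketch turns a finite sequence of symbols from a finite set into word counts and then into an empirical distribution. This is only the count representation that topic models such as pLSA take as input, not pLSA itself. The vocabulary, the example sequence, and the function name are hypothetical.

```python
from collections import Counter


def word_distribution(document, vocabulary):
    """Relative frequency of each vocabulary item in a document."""
    if not document:
        raise ValueError("document must contain at least one item")
    counts = Counter(document)
    total = len(document)
    return [counts[w] / total for w in vocabulary]


# Hypothetical visual words extracted from one image.
vocabulary = ["edge", "corner", "blob"]
document = ["edge", "blob", "edge", "corner", "edge"]
dist = word_distribution(document, vocabulary)
```

Here the order of the items in `document` is discarded; only the number of occurrences of each vocabulary item is retained.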

References

Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 841–848. MIT Press, Cambridge (2002)
Dan Rubinstein, Y., Hastie, T.: Discriminative vs informative learning. In: International Conference on Knowledge Discovery and Data Mining, KDD’1997, pp. 49–53. AAAI Press, Menlo Park (1997)
Ripley, B.: Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996)
Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)
Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS), vol. 11, pp. 487–493. MIT Press, Cambridge (1998)
Lasserre, J., Bishop, C., Minka, T.: Principled hybrids of generative and discriminative models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 87–94 (2006)
Bicego, M., Murino, V., Figueiredo, M.: Similarity-based classification of sequences using hidden Markov models. Pattern Recognit. 37(12), 2281–2291 (2004)
Bosch, A., Zisserman, A., Munoz, X.: Scene classification via pLSA. In: European Conference on Computer Vision (ECCV), pp. 517–530 (2006)
Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: A hybrid generative/discriminative classification framework based on free-energy terms. In: IEEE International Conference on Computer Vision (ICCV), pp. 2058–2065 (2009)
Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free energy score space. In: Advances in Neural Information Processing Systems (NIPS), vol. 22, pp. 1428–1436. MIT Press, Cambridge (2009)
Chandalia, G., Beal, M.J.: Using Fisher kernels from topic models for dimensionality reduction. In: NIPS Workshop on Novel Applications of Dimensionality Reduction (2006)
Chappelier, J.-C., Eckard, E.: PLSI: The true Fisher kernel and beyond. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 195–210 (2009)
Figueiredo, M., Aguiar, P., Martins, A., Murino, V., Bicego, M.: Information theoretical kernels for generative embeddings based on hidden Markov models. In: Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition—S+SSPR’2010, Izmir, Turkey (2010)
Bicego, M., Perina, A., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Combining free energy score spaces with information theoretic kernels: application to scene classification. In: IEEE International Conference on Image Processing—ICIP’2010, Hong Kong (2010)
Bicego, M., Ulaş, A., Schüffler, P., Castellani, U., Mirtuono, P., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Renal cancer cell classification using generative embeddings and information theoretic kernels. In: International Conference on Pattern Recognition in Bioinformatics (PRIB) (2011)
Martins, A., Smith, N., Xing, E., Aguiar, P., Figueiredo, M.: Nonextensive information theoretic kernels on measures. J. Mach. Learn. Res. 10, 935–975 (2009)
Cuturi, M., Vert, J.-P.: Semigroup kernels on finite sets. In: Advances in Neural Information Processing Systems (NIPS), pp. 329–336. MIT Press, Cambridge (2005)
Cuturi, M., Fukumizu, K., Vert, J.-P.: Semigroup kernels on measures. J. Mach. Learn. Res. 6, 1169–1198 (2005)
Moreno, P., Ho, P., Vasconcelos, N.: Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2003)
Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.-R.: A new discriminative kernel from probabilistic models. Neural Comput. 14, 2397–2414 (2002)
Smith, N., Gales, M.: Speech recognition using SVMs. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 1197–1204. MIT Press, Cambridge (2002)
Li, X., Lee, T.S., Liu, Y.: Hybrid generative-discriminative classification using posterior divergence. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2713–2720 (2011)
Bicego, M., Pekalska, E., Tax, D.M.J., Duin, R.P.W.: Component-based discriminative classification for hidden Markov models. Pattern Recognit. 42, 2637–2648 (2009)
Krishnapuram, B., Carin, L., Figueiredo, M.A.T., Hartemink, A.J.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)
Bicego, M., Lovato, P., Oliboni, B., Perina, A.: Expression microarray classification using topic models. In: ACM Symposium on Applied Computing, pp. 1516–1520 (2010)
Castellani, U., Perina, A., Murino, V., Bellani, M., Rambaldelli, G., Tansella, M., Brambilla, P.: Brain morphometry by probabilistic latent semantic analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 177–184 (2010)
Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)
Hofmann, T.: Learning the similarity of documents: an information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems (NIPS), pp. 914–920. MIT Press, Cambridge (2000)
Smith, N., Gales, M.: Using SVMs to classify variable length speech patterns. Technical Report CUED/F-INFENG/TR–412, Cambridge University Engineering Department (2002)
Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)
Suyari, H.: Generalization of Shannon–Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy. IEEE Trans. Inf. Theory 50(8) (2004)
Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (1991)
Tsallis, C.: Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52, 479–487 (1988)
Burbea, J., Rao, C.: On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 28(3) (1982)
Lin, J.: Divergence measures based on Shannon entropy. IEEE Trans. Inf. Theory 37 (1991)
Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)
Schüffler, P., Fuchs, T., Ong, C.S., Roth, V., Buhmann, J.: Computational TMA analysis and cell nucleus classification of renal cell carcinoma. In: 32nd DAGM Conference on Pattern Recognition, pp. 202–211. Springer, Berlin (2010)
Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: 6th ACM International Conference on Image and Video Retrieval (CIVR), pp. 401–408 (2007)
Rogers, S., Girolami, M., Campbell, C., Breitling, R.: The latent process decomposition of cDNA microarray data sets. IEEE/ACM Trans. Comput. Biol. Bioinform. 2(2), 143–156 (2005)
Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96(12), 6745–6750 (1999)
Ulaş, A., Schüffler, P., Bicego, M., Castellani, U., Murino, V.: Hybrid generative-discriminative nucleus classification of renal cell carcinoma. In: Pelillo, M., Hancock, E. (eds.) International Workshop on Similarity-Based Pattern Analysis (SIMBAD). LNCS, vol. 7005, pp. 77–88. Springer, Berlin (2011)
Deegalla, S., Bostrom, H.: Fusion of dimensionality reduction methods: a case study in microarray classification. In: Proc. Int. Conf. on Information Fusion, pp. 460–465 (2009)
German, D., Afsari, B., Choon, T.A., Naiman, D.Q.: Microarray classification from several two-gene expression comparisons. In: Proc. Int. Conf. on Machine Learning and Applications, pp. 583–585 (2008)
Liu, H., Liu, L., Zhang, H.: Ensemble gene selection by grouping for microarray data classification. J. Biomed. Inform. 43(1), 81–87 (2010)
Wang, L., Zhu, J., Zou, H.: Hybrid Huberized support vector machines for microarray classification and gene selection. Bioinformatics 24(3), 412–419 (2008)
Article Google Scholar

Acknowledgements

We acknowledge support from the FET programme (EU FP7), under the SIMBAD project (contract 213250).

Author information

Authors and Affiliations

Instituto de Sistemas e Robótica, Instituto Superior Técnico, Lisboa, Portugal
Pedro M. Q. Aguiar
Dipartimento di Informatica, University of Verona, Verona, Italy
Manuele Bicego, Umberto Castellani, Vittorio Murino & Aydın Ulaş
Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
Mário A. T. Figueiredo & André T. Martins
Microsoft Research, Redmond, WA, USA
Alessandro Perina

Corresponding author

Correspondence to Mário A. T. Figueiredo.

Editor information

Editors and Affiliations

DAIS, Ca' Foscari University of Venice, Venezia Mestre, Italy
Marcello Pelillo

About this chapter

Cite this chapter

Aguiar, P.M.Q. et al. (2013). On the Combination of Information-Theoretic Kernels with Generative Embeddings. In: Pelillo, M. (eds) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_4

DOI: https://doi.org/10.1007/978-1-4471-5628-4_4
Publisher Name: Springer, London
Print ISBN: 978-1-4471-5627-7
Online ISBN: 978-1-4471-5628-4
eBook Packages: Computer Science, Computer Science (R0)
