On the Combination of Information-Theoretic Kernels with Generative Embeddings

Similarity-Based Pattern Analysis and Recognition

Abstract

Classical methods for building classifiers for structured objects (e.g., sequences, images) rely on generative models within a Bayesian framework. To use discriminative approaches (namely, support vector machines), the objects must be mapped (embedded) into a Hilbert space; one way that has been proposed to carry out such an embedding is via generative models, possibly learned from data. This type of hybrid discriminative/generative approach has recently been shown to outperform classifiers obtained directly from the generative model upon which the embedding is built.

Discriminative approaches based on generative embeddings involve two key components: a generative model used to define the embedding, and a discriminative learning algorithm used to obtain a (possibly kernel-based) classifier. The literature on generative embeddings focuses mainly on defining the embedding, and a standard off-the-shelf kernel and learning algorithm are usually adopted. Recently, we proposed a different approach that exploits the probabilistic nature of generative embeddings by using information-theoretic kernels defined on probability distributions. In this chapter, we review this approach and its building blocks, and we illustrate its performance on two medical applications.
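
As a rough illustration of what a kernel defined on probability distributions can look like, the sketch below computes a Jensen-Shannon kernel, k(p, q) = ln 2 - JS(p, q), between two discrete distributions. The abstract does not say which kernels the chapter uses, so this choice is an assumption; it is one member of the family studied in [18]. The function names, the `normalize` step, and the toy vectors are hypothetical and stand in for whatever nonnegative generative embedding is available (for instance, topic proportions from a model such as pLSA).

```python
import math


def shannon_entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def jensen_shannon_divergence(p, q):
    """JS(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    mixture = [(a + b) / 2.0 for a, b in zip(p, q)]
    return shannon_entropy(mixture) - (shannon_entropy(p) + shannon_entropy(q)) / 2.0


def js_kernel(p, q):
    """Jensen-Shannon kernel: ln 2 minus the JS divergence."""
    return math.log(2.0) - jensen_shannon_divergence(p, q)


def normalize(scores):
    """Hypothetical helper: rescale nonnegative scores so they sum to one."""
    total = sum(scores)
    if total <= 0.0:
        raise ValueError("scores must have a positive sum")
    return [s / total for s in scores]


def gram_matrix(embeddings):
    """Kernel matrix over a list of embedded objects (each a distribution)."""
    return [[js_kernel(p, q) for q in embeddings] for p in embeddings]


# Toy embeddings (made up for illustration, not taken from the chapter).
embedded = [normalize(v) for v in ([3.0, 1.0, 0.0], [1.0, 1.0, 2.0], [0.0, 0.0, 5.0])]
K = gram_matrix(embedded)
```

A matrix such as `K` could then be handed to any kernel classifier that accepts a precomputed Gram matrix, which is how the two components named above (embedding and discriminative learner) would be combined in practice.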

Notes

  1. We use the term document to refer to a finite sequence of objects from some finite set, simply because LSA and pLSA have their roots in the field of natural language processing (NLP). Recently, pLSA has been used not only in NLP but also in other areas, such as computer vision, bioinformatics, and image analysis [10, 12, 28]. In image analysis problems, the idea is to use pLSA to model the occurrence of image features (visual words) [12, 28].

References

  1. Ng, A.Y., Jordan, M.I.: On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 841–848. MIT Press, Cambridge (2002)

  2. Dan Rubinstein, Y., Hastie, T.: Discriminative vs informative learning. In: International Conference on Knowledge Discovery and Data Mining, KDD’1997, pp. 49–53. AAAI Press, Menlo Park (1997)

  3. Ripley, B.: Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge (1996)

  4. Vapnik, V.: The Nature of Statistical Learning Theory. Springer, Berlin (1995)

  5. Schölkopf, B., Smola, A.: Learning with Kernels. MIT Press, Cambridge (2002)

  6. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

  7. Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. In: Advances in Neural Information Processing Systems (NIPS), vol. 11, pp. 487–493. MIT Press, Cambridge (1998)

  8. Lasserre, J., Bishop, C., Minka, T.: Principled hybrids of generative and discriminative models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 87–94 (2006)

  9. Bicego, M., Murino, V., Figueiredo, M.: Similarity-based classification of sequences using hidden Markov models. Pattern Recognit. 37(12), 2281–2291 (2004)

  10. Bosch, A., Zisserman, A., Munoz, X.: Scene classification via pLSA. In: European Conference on Computer Vision (ECCV), pp. 517–530 (2006)

  11. Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: A hybrid generative/discriminative classification framework based on free-energy terms. In: IEEE International Conference on Computer Vision (ICCV), pp. 2058–2065 (2009)

  12. Perina, A., Cristani, M., Castellani, U., Murino, V., Jojic, N.: Free energy score space. In: Advances in Neural Information Processing Systems (NIPS), vol. 22, pp. 1428–1436. MIT Press, Cambridge (2009)

  13. Chandalia, G., Beal, M.J.: Using Fisher kernels from topic models for dimensionality reduction. In: NIPS Workshop on Novel Applications of Dimensionality Reduction (2006)

  14. Chappelier, J.-C., Eckard, E.: PLSI: The true Fisher kernel and beyond. In: European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), pp. 195–210 (2009)

  15. Figueiredo, M., Aguiar, P., Martins, A., Murino, V., Bicego, M.: Information theoretical kernels for generative embeddings based on hidden Markov models. In: Joint IAPR International Workshops on Structural, Syntactic, and Statistical Pattern Recognition—S+SSPR’2010, Izmir, Turkey (2010)

  16. Bicego, M., Perina, A., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Combining free energy score spaces with information theoretic kernels: application to scene classification. In: IEEE International Conference on Image Processing—ICIP’2010, Hong Kong (2010)

  17. Bicego, M., Ulaş, A., Schüffler, P., Castellani, U., Mirtuono, P., Murino, V., Martins, A., Aguiar, P., Figueiredo, M.: Renal cancer cell classification using generative embeddings and information theoretic kernels. In: International Conference on Pattern Recognition in Bioinformatics (PRIB) (2011)

  18. Martins, A., Smith, N., Xing, E., Aguiar, P., Figueiredo, M.: Nonextensive information theoretic kernels on measures. J. Mach. Learn. Res. 10, 935–975 (2009)

  19. Cuturi, M., Vert, J.-P.: Semigroup kernels on finite sets. In: Advances in Neural Information Processing Systems (NIPS), pp. 329–336. MIT Press, Cambridge (2005)

  20. Cuturi, M., Fukumizu, K., Vert, J.-P.: Semigroup kernels on measures. J. Mach. Learn. Res. 6, 1169–1198 (2005)

  21. Moreno, P., Ho, P., Vasconcelos, N.: Kullback–Leibler divergence based kernel for SVM classification in multimedia applications. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2003)

  22. Tsuda, K., Kawanabe, M., Rätsch, G., Sonnenburg, S., Müller, K.-R.: A new discriminative kernel from probabilistic models. Neural Comput. 14, 2397–2414 (2002)

  23. Smith, N., Gales, M.: Speech recognition using SVMs. In: Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 1197–1204. MIT Press, Cambridge (2002)

  24. Li, X., Lee, T.S., Liu, Y.: Hybrid generative-discriminative classification using posterior divergence. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2713–2720 (2011)

  25. Bicego, M., Pekalska, E., Tax, D.M.J., Duin, R.P.W.: Component-based discriminative classification for hidden Markov models. Pattern Recognit. 42, 2637–2648 (2009)

  26. Krishnapuram, B., Carin, L., Figueiredo, M.A.T., Hartemink, A.J.: Sparse multinomial logistic regression: fast algorithms and generalization bounds. IEEE Trans. Pattern Anal. Mach. Intell. 27, 957–968 (2005)

  27. Bicego, M., Lovato, P., Oliboni, B., Perina, A.: Expression microarray classification using topic models. In: ACM Symposium on Applied Computing, pp. 1516–1520 (2010)

  28. Castellani, U., Perina, A., Murino, V., Bellani, M., Rambaldelli, G., Tansella, M., Brambilla, P.: Brain morphometry by probabilistic latent semantic analysis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), pp. 177–184 (2010)

  29. Hofmann, T.: Unsupervised learning by probabilistic latent semantic analysis. Mach. Learn. 42(1–2), 177–196 (2001)

  30. Hofmann, T.: Learning the similarity of documents: an information-geometric approach to document retrieval and categorization. In: Advances in Neural Information Processing Systems (NIPS), pp. 914–920. MIT Press, Cambridge (2000)

  31. Smith, N., Gales, M.: Using SVMs to classify variable length speech patterns. Technical Report CUED/F-INFENG/TR–412, Cambridge University Engineering Department (2002)

  32. Schwarz, G.: Estimating the dimension of a model. Ann. Stat. 6, 461–464 (1978)

  33. Suyari, H.: Generalization of Shannon–Khinchin axioms to nonextensive systems and the uniqueness theorem for the nonextensive entropy. IEEE Trans. Inf. Theory 50(8) (2004)

  34. Cover, T., Thomas, J.: Elements of Information Theory. Wiley, New York (1991)

  35. Tsallis, C.: Possible generalization of Boltzmann–Gibbs statistics. J. Stat. Phys. 52, 479–487 (1988)

  36. Burbea, J., Rao, C.: On the convexity of some divergence measures based on entropy functions. IEEE Trans. Inf. Theory 28(3) (1982)

  37. Lin, J.: Divergence measures based on Shannon entropy. IEEE Trans. Inf. Theory 37 (1991)

  38. Shawe-Taylor, J., Cristianini, N.: Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge (2004)

  39. Schüffler, P., Fuchs, T., Ong, C.S., Roth, V., Buhmann, J.: Computational TMA analysis and cell nucleus classification of renal cell carcinoma. In: 32nd DAGM Conference on Pattern Recognition, pp. 202–211. Springer, Berlin (2010)

  40. Bosch, A., Zisserman, A., Munoz, X.: Representing shape with a spatial pyramid kernel. In: 6th ACM International Conference on Image and Video Retrieval (CIVR), pp. 401–408 (2007)

  41. Rogers, S., Girolami, M., Campbell, C., Breitling, R.: The latent process decomposition of cDNA microarray data sets. IEEE/ACM Trans. Comput. Biol. Bioinform. 2(2), 143–156 (2005)

  42. Alon, U., Barkai, N., Notterman, D., Gish, K., Ybarra, S., Mack, D., Levine, A.: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. 96(12), 6745–6750 (1999)

  43. Ulaş, A., Schüffler, P., Bicego, M., Castellani, U., Murino, V.: Hybrid generative-discriminative nucleus classification of renal cell carcinoma. In: Pelillo, M., Hancock, E. (eds.) International Workshop on Similarity-Based Pattern Analysis (SIMBAD). LNCS, vol. 7005, pp. 77–88. Springer, Berlin (2011)

  44. Deegalla, S., Bostrom, H.: Fusion of dimensionality reduction methods: a case study in microarray classification. In: Proc. Int. Conf. on Information Fusion, pp. 460–465 (2009)

  45. Geman, D., Afsari, B., Tan, A.C., Naiman, D.Q.: Microarray classification from several two-gene expression comparisons. In: Proc. Int. Conf. on Machine Learning and Applications, pp. 583–585 (2008)

  46. Liu, H., Liu, L., Zhang, H.: Ensemble gene selection by grouping for microarray data classification. J. Biomed. Inform. 43(1), 81–87 (2010)

  47. Wang, L., Zhu, J., Zou, H.: Hybrid Huberized support vector machines for microarray classification and gene selection. Bioinformatics 24(3), 412–419 (2008)

Acknowledgements

We acknowledge support from the FET programme (EU FP7), under the SIMBAD project (contract 213250).

Author information

Corresponding author

Correspondence to Mário A. T. Figueiredo.

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Aguiar, P.M.Q. et al. (2013). On the Combination of Information-Theoretic Kernels with Generative Embeddings. In: Pelillo, M. (ed.) Similarity-Based Pattern Analysis and Recognition. Advances in Computer Vision and Pattern Recognition. Springer, London. https://doi.org/10.1007/978-1-4471-5628-4_4

  • DOI: https://doi.org/10.1007/978-1-4471-5628-4_4

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5627-7

  • Online ISBN: 978-1-4471-5628-4

  • eBook Packages: Computer Science (R0)
