Abstract

Curve clustering is a fundamental problem in biomedical applications such as clustering protein sequences or cell shapes in microscopy images. Existing model-based clustering techniques rely on simple probability models that are not generally valid for analyzing shapes of curves. In this chapter, we describe an efficient Bayesian method for clustering curve data using a carefully chosen metric on the shape space. Rather than modeling the infinite-dimensional curves directly, we model a summary statistic: the inner product matrix obtained from the data. This inner product matrix is modeled using a Wishart distribution with carefully chosen hyperparameters, which induce clustering and allow automatic inference on the number of clusters. The posterior is sampled through an efficient Markov chain Monte Carlo procedure based on the Chinese restaurant process. The method is demonstrated on a variety of synthetic data sets and on real data examples from protein structure analysis.
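As a rough illustration of the Chinese restaurant process mentioned above, the following Python sketch draws a random partition from a CRP prior. It is a generic textbook construction, not the chapter's sampler: the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions introduced here, and the Wishart likelihood on the inner product matrix is not included.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process prior.

    Hypothetical helper for illustration only; alpha > 0 is the concentration
    parameter (larger values tend to produce more clusters).
    """
    rng = random.Random(seed)
    labels = []  # labels[i] = cluster index of item i
    sizes = []   # sizes[k] = number of items currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight sizes[k];
        # a brand-new cluster is opened with weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = sample_crp_partition(20, alpha=1.0, seed=0)
    print(labels, "clusters:", max(labels) + 1)
```

Because the number of occupied clusters is itself random under this prior, a sampler built on it does not need the number of clusters fixed in advance, which is the property the abstract refers to as automatic inference on the number of clusters.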

Notes

  1. Since only one observation changes its cluster index, one can explicitly calculate the difference between the old and new values of (3.5) and (3.6) in O(1) steps.

Author information

Corresponding author

Correspondence to Debdeep Pati.

Copyright information

© 2015 Springer International Publishing Switzerland

About this chapter

Cite this chapter

Zhang, Z., Pati, D., Srivastava, A. (2015). Bayesian Shape Clustering. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_3
