Bayesian Shape Clustering

Zhang, Zhengwu; Pati, Debdeep; Srivastava, Anuj

doi:10.1007/978-3-319-19518-6_3

Part of the book series: Frontiers in Probability and the Statistical Sciences (FROPROSTAS)

Abstract

Curve clustering is a fundamental problem in biomedical applications, such as clustering protein sequences or cell shapes in microscopy images. Existing model-based clustering techniques rely on simple probability models that are not generally valid for analyzing shapes of curves. In this chapter, we describe an efficient Bayesian method for clustering curve data using a carefully chosen metric on the shape space. Rather than modeling the infinite-dimensional curves directly, we model a summary statistic: the inner-product matrix obtained from the data. This matrix is modeled using a Wishart distribution with carefully chosen hyperparameters, which induce clustering and allow automatic inference on the number of clusters. The posterior is sampled through an efficient Markov chain Monte Carlo procedure based on the Chinese restaurant process. The method is demonstrated on a variety of synthetic data and on real-data examples from protein structure analysis.
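To give a feel for how a Chinese restaurant process lets the number of clusters vary rather than being fixed in advance, the following minimal sketch draws a random partition from a CRP prior. It illustrates the prior over partitions only and is not the chapter's posterior sampler: the function name `sample_crp_partition` and the concentration parameter `alpha` are assumptions made for this example, and the Wishart likelihood on the inner-product matrix is left out entirely.

```python
import random


def sample_crp_partition(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Hypothetical helper for illustration: `alpha` is the CRP concentration
    parameter (larger values tend to produce more clusters).
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be non-negative and alpha must be positive")
    rng = random.Random(seed)
    labels = []   # labels[i] = cluster index assigned to item i
    counts = []   # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is opened with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)      # open a new cluster
        else:
            counts[k] += 1        # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = sample_crp_partition(20, alpha=1.0, seed=0)
    print("labels:", labels)
    print("number of clusters:", len(set(labels)))
```

Because the number of occupied clusters is itself random under this prior, a sampler built on it can, in principle, infer the number of clusters from the data instead of requiring it as an input.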

Notes

1. Since only one observation changes the cluster index, one can explicitly calculate the difference between the old values of (3.5) and (3.6) and the new values in O(1) steps.

Author information

Authors and Affiliations

Florida State University, Tallahassee, FL, USA
Zhengwu Zhang, Debdeep Pati & Anuj Srivastava

Corresponding author

Correspondence to Debdeep Pati.

Editor information

Editors and Affiliations

Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky, USA
Riten Mitra
Department of Mathematics, University of Texas, Austin, Texas, USA
Peter Müller

About this chapter

Cite this chapter

Zhang, Z., Pati, D., Srivastava, A. (2015). Bayesian Shape Clustering. In: Mitra, R., Müller, P. (eds) Nonparametric Bayesian Inference in Biostatistics. Frontiers in Probability and the Statistical Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-19518-6_3

DOI: https://doi.org/10.1007/978-3-319-19518-6_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19517-9
Online ISBN: 978-3-319-19518-6
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)