
Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions

  • Conference paper
Advanced Data Mining and Applications (ADMA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10086)

Abstract

In recent years, finite mixtures of skew distributions have gained popularity as a flexible tool for modelling data with asymmetric distributional features. Estimating the parameters of these mixture models with the traditional EM algorithm requires the number of components to be specified a priori. In this paper, we consider unsupervised learning of skew mixture models, in which the optimal number of components is estimated as part of the parameter estimation process. We adopt a component-wise EM algorithm and use the minimum message length (MML) criterion. For illustrative purposes, we focus on finite mixtures of multivariate skew t-distributions. The performance of the approach is demonstrated on a real flow cytometry dataset, where our mixture model was used to provide an automated segmentation of cell populations.
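
For orientation, a widely used form of the MML criterion for mixtures is that of Figueiredo and Jain (2002), which pairs naturally with component-wise EM. The block below assumes that form; the exact criterion used in the paper may differ, and the notation is ours: \(n\) observations, \(k\) current components, \(k_{nz}\) components with nonzero mixing proportion \(\pi_h\), \(N\) free parameters per component, \(\tau_{hj}\) the posterior probability that observation \(j\) belongs to component \(h\), and \(L(\Psi)\) the mixture likelihood.

```latex
\begin{align}
\mathcal{L}(\Psi;\mathbf{y}) &= \frac{N}{2}\sum_{h:\,\pi_h>0}\log\!\Big(\frac{n\pi_h}{12}\Big)
  + \frac{k_{nz}}{2}\log\frac{n}{12} + \frac{k_{nz}(N+1)}{2} - \log L(\Psi), \\
\hat{\pi}_h &= \frac{\max\big\{0,\ \sum_{j=1}^{n}\tau_{hj} - N/2\big\}}
                    {\sum_{l=1}^{k}\max\big\{0,\ \sum_{j=1}^{n}\tau_{lj} - N/2\big\}}.
\end{align}
```

The penalised weight update is what lets the number of components be learned: a component whose expected support falls to \(N/2\) observations or fewer gets \(\hat{\pi}_h = 0\) and is dropped. As a worked count, assuming a \(p\)-variate restricted skew \(t\) component (location \(p\), skewness \(p\), scale matrix \(p(p+1)/2\), degrees of freedom 1), \(N = 2p + p(p+1)/2 + 1\); for a hypothetical \(p = 4\) this gives \(N = 19\), so the annihilation threshold is 9.5 observations.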

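The component-wise EM loop with MML pruning can be sketched in a few dozen lines. This is a minimal, hypothetical illustration and not the authors' implementation: it swaps in univariate normal components for the multivariate skew t components (whose E- and M-steps are considerably more involved), and the names `cem_mml`, `mml_cost` and `normal_pdf` are ours. In the spirit of the Figueiredo-Jain scheme, each component in turn gets its own E-step, a penalised weight update that can annihilate it, and its own M-step; after convergence the weakest component is forced out and the search repeats, keeping the fit with the smallest message length.

```python
import math
from statistics import NormalDist


def normal_pdf(y, mean, var):
    """Univariate normal density: a stand-in for a skew t component density."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mml_cost(loglik, alphas, n, n_par):
    """Figueiredo-Jain MML criterion for the current mixture; smaller is better."""
    k = len(alphas)
    return (n_par / 2.0 * sum(math.log(n * a / 12.0) for a in alphas)
            + k / 2.0 * math.log(n / 12.0) + k * (n_par + 1) / 2.0 - loglik)


def cem_mml(data, k_max=6, k_min=1, tol=1e-6, max_sweeps=200):
    """Component-wise EM with MML pruning for a 1-D Gaussian mixture (sketch).

    Returns (alphas, means, variances) for the fit with the smallest message
    length seen while pruning from k_max down to k_min components.
    """
    n, n_par = len(data), 2                   # free parameters per component: mean, variance
    srt = sorted(data)
    mean_all = sum(data) / n
    var0 = sum((y - mean_all) ** 2 for y in data) / n
    var_floor = 1e-3 * var0                   # guards against collapse onto a single point
    alphas = [1.0 / k_max] * k_max
    means = [srt[(2 * j + 1) * n // (2 * k_max)] for j in range(k_max)]  # spread over quantiles
    varis = [var0 / 10.0] * k_max
    pdf = [[normal_pdf(y, mu, v) for y in data] for mu, v in zip(means, varis)]
    best, best_cost = None, math.inf

    def mix_density(i):
        # Mixture density at observation i, floored to avoid division by zero
        return max(sum(a * p[i] for a, p in zip(alphas, pdf)), 1e-300)

    while True:
        prev_cost = math.inf
        for _ in range(max_sweeps):
            m = 0
            while m < len(alphas):
                # E-step restricted to component m
                w = [alphas[m] * pdf[m][i] / mix_density(i) for i in range(n)]
                s = sum(w)
                # MML-penalised weight: expected support <= n_par/2 removes component m
                alphas[m] = max(0.0, s - n_par / 2.0) / n
                total = sum(alphas)
                alphas = [a / total for a in alphas]
                if alphas[m] == 0.0:
                    for lst in (alphas, means, varis, pdf):
                        del lst[m]
                    continue
                # M-step for component m only, then refresh its cached densities
                means[m] = sum(wi * y for wi, y in zip(w, data)) / s
                varis[m] = max(var_floor,
                               sum(wi * (y - means[m]) ** 2 for wi, y in zip(w, data)) / s)
                pdf[m] = [normal_pdf(y, means[m], varis[m]) for y in data]
                m += 1
            loglik = sum(math.log(mix_density(i)) for i in range(n))
            cost = mml_cost(loglik, alphas, n, n_par)
            if abs(prev_cost - cost) < tol * abs(cost):
                break
            prev_cost = cost
        if cost < best_cost:
            best_cost, best = cost, (list(alphas), list(means), list(varis))
        if len(alphas) <= k_min:
            return best
        # Force out the weakest component, renormalise, and rerun CEM on the rest
        m = alphas.index(min(alphas))
        for lst in (alphas, means, varis, pdf):
            del lst[m]
        total = sum(alphas)
        alphas = [a / total for a in alphas]


if __name__ == "__main__":
    # Hypothetical, well-separated data: normal quantiles centred at 0 and at 8
    nd = NormalDist()
    data = [nd.inv_cdf((i + 0.5) / 100) + c for c in (0.0, 8.0) for i in range(100)]
    print(cem_mml(data, k_max=4))
```

Initialising the means at evenly spaced sample quantiles, rather than at randomly chosen observations, is a choice made here so the sketch is deterministic; replacing `normal_pdf` and the two M-step lines with skew t versions is where the real model would differ.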
Author information

Correspondence to Sharon X. Lee.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Lee, S.X., McLachlan, G.J. (2016). Unsupervised Component-Wise EM Learning for Finite Mixtures of Skew t-distributions. In: Li, J., Li, X., Wang, S., Li, J., Sheng, Q. (eds) Advanced Data Mining and Applications. ADMA 2016. Lecture Notes in Computer Science, vol 10086. Springer, Cham. https://doi.org/10.1007/978-3-319-49586-6_49

  • DOI: https://doi.org/10.1007/978-3-319-49586-6_49

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49585-9

  • Online ISBN: 978-3-319-49586-6

  • eBook Packages: Computer Science, Computer Science (R0)
