Sparsest factor analysis for clustering variables: a matrix decomposition approach

  • Regular Article
Advances in Data Analysis and Classification

Abstract

We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. The loading matrix therefore has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors, and for this reason the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading on the same common factor can be classified into one cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using the QR decomposition in order to estimate the factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
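
The two structural ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' SSFA algorithm: it only shows how a loading matrix with one nonzero element per row induces a clustering of variables, and how the cross-product of a factor score matrix can be recovered from the triangular factor of its QR decomposition. The function names, the example matrix `LOADINGS`, and the scaling by the number of observations are assumptions made for illustration.

```python
import numpy as np


def clusters_from_loadings(loadings, tol=1e-12):
    """Group variable indices by the single factor each variable loads on.

    Hypothetical helper: assumes `loadings` is (variables x factors) and
    already has exactly one nonzero entry in every row.
    """
    loadings = np.asarray(loadings, dtype=float)
    nonzero = np.abs(loadings) > tol
    if not np.all(nonzero.sum(axis=1) == 1):
        raise ValueError("each row must have exactly one nonzero loading")
    labels = nonzero.argmax(axis=1)  # column index of the nonzero entry
    n_factors = loadings.shape[1]
    return {k: np.flatnonzero(labels == k).tolist() for k in range(n_factors)}


def factor_crossproduct_via_qr(scores):
    """Return F'F / n computed from the R factor of F = QR.

    Since Q has orthonormal columns, F'F = R'Q'QR = R'R, so the
    inter-factor cross-product depends on R alone.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    _, r = np.linalg.qr(scores)  # reduced mode: r is (factors x factors)
    return (r.T @ r) / n


# Illustrative sparsest loading matrix: 5 variables, 2 common factors.
LOADINGS = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, -0.9],
    [0.5, 0.0],
])

if __name__ == "__main__":
    print(clusters_from_loadings(LOADINGS))
    rng = np.random.default_rng(0)
    F = rng.standard_normal((50, 2))
    print(factor_crossproduct_via_qr(F))
```

In this toy matrix, variables 0, 1 and 4 form one cluster and variables 2 and 3 the other. How SSFA actually estimates the position and value of each nonzero loading is described in the full article and is not reproduced here.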

Acknowledgements

Funding was provided by the Japan Society for the Promotion of Science (Grant No. (C)-26330039) and The Leverhulme Trust, UK (Grant No. RPG-2013-211). The authors thank the Editors, the anonymous Associate Editor, and the anonymous reviewers for their useful comments and suggestions, which considerably improved the manuscript.

Author information

Corresponding author

Correspondence to Kohei Adachi.

About this article

Cite this article

Adachi, K., Trendafilov, N.T. Sparsest factor analysis for clustering variables: a matrix decomposition approach. Adv Data Anal Classif 12, 559–585 (2018). https://doi.org/10.1007/s11634-017-0284-z

  • DOI: https://doi.org/10.1007/s11634-017-0284-z
