Abstract
We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. The loading matrix therefore has a single nonzero element in each row and zeros elsewhere. Such a loading matrix is the sparsest possible for a given number of variables and common factors; for this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading on the same common factor can be classified into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using the QR decomposition in order to estimate factor correlations efficiently. A simulation study shows that the proposed procedure can exactly identify the true sparsest models. Real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
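As a rough illustration of the structure described in the abstract, the following sketch builds a small loading matrix with exactly one nonzero element per row and reads the variable clusters off it. It also checks the general identity behind a QR re-parameterization, namely that the cross-product of a score matrix equals the cross-product of its triangular QR factor. The function name `cluster_labels`, the matrix values, and the random score matrix `F` are hypothetical and chosen only for illustration; this is not the SSFA estimation algorithm itself.

```python
import numpy as np

def cluster_labels(loadings):
    """Return, for each variable (row), the index of the one factor it loads on."""
    loadings = np.asarray(loadings, dtype=float)
    nonzero_per_row = np.count_nonzero(loadings, axis=1)
    if not np.all(nonzero_per_row == 1):
        raise ValueError("each row must have exactly one nonzero loading")
    # With a single nonzero per row, the largest absolute value marks its column.
    return np.argmax(np.abs(loadings), axis=1)

# Hypothetical sparsest loading matrix: 6 variables, 2 common factors.
A = np.array([
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.9],
    [0.6, 0.0],
    [0.0, -0.5],
    [0.0, 0.4],
])
labels = cluster_labels(A)  # cluster (factor) index of each variable

# Generic QR identity (illustrative data, not the paper's parameterization):
# if F = Q R with orthonormal columns in Q, then F'F = R'R.
rng = np.random.default_rng(0)
F = rng.standard_normal((50, 2))   # stand-in for a factor score matrix
Q, R = np.linalg.qr(F)             # reduced QR: Q is 50 x 2, R is 2 x 2
gram_from_R = R.T @ R
```

Here variables sharing a label form one cluster, and the cross-product of the scores depends on the data only through the small triangular factor `R`, which is one reason a QR form can be convenient when working with factor correlations.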
Acknowledgements
Funding was provided by the Japan Society for the Promotion of Science (Grant No. (C)-26330039) and The Leverhulme Trust, UK (Grant No. RPG-2013-211). The authors thank the Editors, the anonymous Associate Editor, and the anonymous reviewers for their useful comments and suggestions, which considerably improved the manuscript.
About this article
Cite this article
Adachi, K., Trendafilov, N.T. Sparsest factor analysis for clustering variables: a matrix decomposition approach. Adv Data Anal Classif 12, 559–585 (2018). https://doi.org/10.1007/s11634-017-0284-z
DOI: https://doi.org/10.1007/s11634-017-0284-z
Keywords
- Exploratory factor analysis
- Sparsest loadings
- Matrix decomposition factor analysis
- Variable clustering
- QR re-parameterization