Summary
Several techniques for exploring an n×p data set are considered in the light of the statistical framework: data = structure + noise. The first application is to Principal Component Analysis (PCA), or more precisely generalized PCA with an arbitrary metric M on the unit space ℝ^p. A natural model for supporting this analysis is the fixed-effect model, in which the expectation of each unit is assumed to belong to some q-dimensional linear manifold defining the structure, while the variance describes the noise. The best estimate of the structure is obtained for a proper choice of the metric M and the dimensionality q: guidelines for both choices are provided in section 2. The second application is to Projection Pursuit, which aims to reveal structure in the original data by means of suitable low-dimensional projections. We suggest the use of generalized PCA with a suitable metric M as a Projection Pursuit technique. Two such metrics are proposed in section 3, depending on the kind of structure sought. Finally, the analysis of n×p contingency tables is considered in section 4. Since the data are frequencies, we assume a multinomial or Poisson model for the noise. Several models may be considered for the structural part: Correspondence Analysis rests on one of them and spherical factor analysis on another, while Goodman's association models provide an alternative. These different approaches are discussed and compared from several points of view.
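As an illustration of what generalized PCA with a metric M involves, the following is a minimal NumPy sketch, not the author's implementation. It assumes uniform weights on the n units, a symmetric positive-definite M, and the usual formulation in which the principal axes are the M-orthonormal eigenvectors of VM, where V is the empirical covariance matrix. The function name `generalized_pca` and its arguments are hypothetical.

```python
import numpy as np

def generalized_pca(X, M, q):
    """Sketch of generalized PCA of an n x p array X with metric M (p x p, SPD).

    Returns the q largest eigenvalues of V M, the corresponding M-orthonormal
    principal axes (p x q), and the principal components of the units (n x q).
    Uniform unit weights 1/n are an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    Xc = X - X.mean(axis=0)              # centre the units
    n = Xc.shape[0]
    V = Xc.T @ Xc / n                    # empirical covariance matrix

    # Factor M = L L^T so that the problem becomes a symmetric eigenproblem.
    L = np.linalg.cholesky(M)
    S = L.T @ V @ L
    eigvals, W = np.linalg.eigh(S)       # eigh returns ascending eigenvalues

    # Keep the q largest eigenvalues and their eigenvectors.
    order = np.argsort(eigvals)[::-1][:q]
    eigvals, W = eigvals[order], W[:, order]

    # Map back: axes = L^{-T} W are eigenvectors of V M and satisfy
    # axes^T M axes = I.
    axes = np.linalg.solve(L.T, W)

    # Coordinates of each unit on the axes, using the M inner product.
    scores = Xc @ M @ axes
    return eigvals, axes, scores
```

With M equal to the identity this reduces to ordinary PCA on the covariance matrix; other choices of M change which directions are treated as structure and which as noise.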
References
Art D., Gnanadesikan R., Kettenring J. R. (1982), Data-based metrics for cluster analysis, Utilitas Math., 21.A, 75–99.
Besse Ph. (1990), PCA Stability and choice of dimensionality. Preprint, Laboratoire de Statistique et Probabilités, Toulouse.
Besse P., Caussinus H., Ferre L., Fine J. (1986), Some guidelines for Principal Components Analysis, COMPSTAT 86, Physica-Verlag, Heidelberg, 23–30.
Besse P., Caussinus H., Ferre L., Fine J. (1988), Principal Components Analysis and optimization of graphical displays, Statistics, 19, 2, 301–312.
Caussinus H. (1986a), Models and uses of Principal Component Analysis (with discussion), Multidimensional Data Analysis, J. de Leeuw et al. (eds.), 149–178, DSWO Press, Leiden.
Caussinus H. (1986b), Quelques réflexions sur la part des modèles probabilistes en analyse des données, Data Analysis and Informatics IV, Diday et al. (eds.), 151–165, Amsterdam, North-Holland.
Caussinus H. (1992), A simple technique for producing interesting projections of multidimensional data, Preprint, Laboratoire de Statistique et Probabilités, Toulouse.
Caussinus H., Ferre L. (1989), «Analyse en Composantes Principales et individus définis par les paramètres d'un modèle», Statistique et Analyse des Données, 14, 3, 19–28.
Caussinus H., Ferre L. (1992), Comparing the parameters of a model for several units by means of Principal Component Analysis, Computational Statistics and Data Analysis, to appear.
Caussinus H., Ruiz A. (1990), Interesting projections of multidimensional data by means of generalized principal component analysis, COMPSTAT 90, Physica-Verlag, Heidelberg, 121–126.
Daudin J. J., Duby C., Trecourt P. (1988), Stability of Principal Component Analysis Studied by the Bootstrap Method, Statistics, 19, 2, 241–258.
Daudin J. J., Duby C., Trecourt P. (1989), Stability studies by the Bootstrap and the Infinitesimal Jackknife Method, Statistics, 20, 2, 255–270.
Domenges D., Volle M. (1979), Analyse factorielle sphérique: une exploration, Annales de l'INSEE, 35, 3–84.
Ferre L. (1989), Choix de la dimension optimale pour certains types d'analyses en Composantes principales, C.R.Ac.Sc., Paris, 308, 1, 959–964.
Ferre L. (1990), A mean square error criterion to determine the number of components in generalized principal component analysis. Preprint, Laboratoire de Statistique et Probabilités, Toulouse.
Fine J., Pousse A. (1989), Asymptotic study of functional models. Application to the metric choice in Principal Component Analysis, Preprint, to appear in Statistics.
Friedman J. H. (1987), Exploratory projection pursuit, J. Amer. Statist. Assoc., 82, 249–266.
Gilula Z. (1984), On some similarities between canonical correlation models and latent class models for two-way contingency tables, Biometrika, 71, 523–529.
Gilula Z., Haberman S. J. (1986), Canonical analysis of two-way contingency tables by maximum likelihood, J. Amer. Statist. Assoc., 81, 780–788.
Goodman L. A. (1981), Association models and the bivariate normal for contingency tables with ordered categories, Biometrika, 68, 347–355.
Goodman L. A. (1986), Some useful extensions of the usual correspondence analysis approach and the usual log-linear models approach in the analysis of contingency tables (with discussion), Intern. Statist. Review, 54, 3, 243–309.
Goodman L. A. (1991), Measures, models and graphical displays in the analysis of cross-classified data (with discussion), J. Amer. Statist. Assoc., 86, 1085–1124.
Huber P. J. (1985), Projection Pursuit (with discussion), Ann. Statist., 13, 435–525.
Jones M. C., Sibson R. (1987), What is Projection Pursuit? (with discussion), J. R. Stat. Soc. A, 150, 1, 1–36.
Linhart H., Zucchini W. (1986), Model Selection, Wiley, New York.
Mallows C. L. (1973), Some comments on Cp, Technometrics, 15, 661–675.
Rijckevorsel J. van (1987), The application of fuzzy coding and horseshoes in Multiple Correspondence Analysis, Leiden: DSWO Press.
Sibson R. (1984), Present position and potential developments: Some personal views. Multivariate Analysis, J. R. Statist. Soc. A, 147, 198–207.
Yenyukov I. S. (1988), Detecting structures by means of Projection Pursuit, COMPSTAT 88, Physica-Verlag, Heidelberg, 47–58.
Cite this article
Caussinus, H. The use of probabilistic models to produce optimal graphical displays of high-dimensional data sets. J. It. Statist. Soc. 1, 51–65 (1992). https://doi.org/10.1007/BF02589049
DOI: https://doi.org/10.1007/BF02589049