The use of probabilistic models to produce optimal graphical displays of high-dimensional data sets

Journal of the Italian Statistical Society

Summary

Several techniques for exploring an n×p data set are considered in the light of the statistical framework: data = structure + noise. The first application is to Principal Component Analysis (PCA), more precisely to generalized PCA with an arbitrary metric M on the unit space ℝ^p. A natural model supporting this analysis is the fixed-effect model, in which the expectation of each unit is assumed to belong to some q-dimensional linear manifold defining the structure, while the variance describes the noise. The best estimate of the structure is obtained for a proper choice of the metric M and the dimensionality q; guidelines for both choices are provided in section 2. The second application is to Projection Pursuit, which aims to reveal structure in the original data by means of suitable low-dimensional projections. We suggest the use of generalized PCA with a suitable metric M as a Projection Pursuit technique. Depending on the kind of structure sought, two such metrics are proposed in section 3. Finally, the analysis of n×p contingency tables is considered in section 4. Since the data are frequencies, we assume a multinomial or Poisson model for the noise. Several models may be considered for the structural part: Correspondence Analysis rests on one of them and spherical factor analysis on another, while Goodman's association models provide an alternative. These different approaches are discussed and compared from several points of view.
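As an illustration of the generalized PCA mentioned in the summary, here is a minimal sketch under stated assumptions. The summary gives no formulas, so the following choices are assumptions and not the paper's own: units carry uniform weights 1/n, the metric M is a symmetric positive definite p×p matrix, and the principal axes are taken as the M-orthonormal eigenvectors of VM, where V is the empirical covariance matrix. The function name `generalized_pca` is hypothetical, and the sketch does not address how M or q should be chosen.

```python
import numpy as np

def generalized_pca(X, M, q):
    """Sketch of PCA with a metric M on the unit space (assumed uniform weights).

    X : (n, p) data matrix, one unit per row.
    M : (p, p) symmetric positive definite metric (assumption).
    q : number of principal axes to keep.
    Returns (eigenvalues, axes, scores).
    """
    X = np.asarray(X, dtype=float)
    M = np.asarray(M, dtype=float)
    n = X.shape[0]

    # Center the units and form the empirical covariance matrix V.
    Xc = X - X.mean(axis=0)
    V = Xc.T @ Xc / n

    # Factor M = L L^T so that the non-symmetric eigenproblem V M u = lambda u
    # becomes the symmetric problem (L^T V L) w = lambda w with w = L^T u.
    L = np.linalg.cholesky(M)
    S = L.T @ V @ L
    eigvals, W = np.linalg.eigh(S)  # eigenvalues in ascending order

    # Keep the q largest eigenvalues and their eigenvectors.
    order = np.argsort(eigvals)[::-1][:q]
    eigvals, W = eigvals[order], W[:, order]

    # Map back: u = L^{-T} w, which gives M-orthonormal axes (u^T M u = 1).
    axes = np.linalg.solve(L.T, W)

    # Coordinates of the units on the axes, using the M inner product.
    scores = Xc @ M @ axes
    return eigvals, axes, scores
```

With M equal to the identity matrix this reduces to ordinary PCA of the covariance matrix; other choices of M change which directions are treated as structure and which as noise.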

Cite this article

Caussinus, H. The use of probabilistic models to produce optimal graphical displays of high-dimensional data sets. J. It. Statist. Soc. 1, 51–65 (1992). https://doi.org/10.1007/BF02589049

  • DOI: https://doi.org/10.1007/BF02589049
