Part of the book series: Statistics for Industry and Technology (SIT)

Abstract

Standard clustering methods do not scale to very large data sets and do not take multilevel data structures into account. This work outlines an approach to clustering that integrates the Kohonen Self-Organizing Map (SOM) with other clustering methods. To take multilevel structures into account, a statistical model is proposed in which a mixture of distributions may have mixing coefficients that depend on higher-level variables. In a first step, the SOM provides a substantial data reduction, which makes a variety of ascending (agglomerative) and divisive clustering algorithms accessible. In a second step, statistical modeling provides both a direct means of treating multilevel structures and a framework for model-based clustering. The interplay of these two steps is illustrated on nutritional data from EPIC, a multicenter study on nutrition and cancer.
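The data-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the grid size, learning-rate and neighborhood schedules, the choice of Ward's criterion as the agglomerative method, the synthetic data, and the function names (train_som, ward_on_prototypes, and so on) are all assumptions made for illustration. The sketch trains a small SOM, replaces each observation by its best-matching unit, and then runs a hierarchical clustering on the few occupied prototypes, weighted by their hit counts, instead of on the full data set.

```python
import math
import random


def sqdist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest(protos, x):
    """Index of the prototype closest to x (its best-matching unit)."""
    return min(range(len(protos)), key=lambda j: sqdist(protos[j], x))


def train_som(data, rows, cols, epochs=10, seed=0):
    """Step 1 (assumed settings): online SOM training on a rows x cols grid."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    protos = [list(rng.choice(data)) for _ in grid]  # initialise from the data
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total
            lr = 0.5 * (1.0 - frac) + 0.01  # decaying learning rate
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5  # shrinking neighborhood
            br, bc = grid[nearest(protos, x)]
            for j, (r, c) in enumerate(grid):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                protos[j] = [p + lr * h * (xi - p) for p, xi in zip(protos[j], x)]
            step += 1
    return protos


def ward_on_prototypes(protos, counts, k):
    """Step 2 (one possible choice): Ward agglomeration of the occupied units.

    Each unit is weighted by the number of observations it represents.
    Returns a dict mapping unit index -> cluster label.
    """
    clusters = [{"members": [j], "n": counts[j], "center": list(protos[j])}
                for j in range(len(protos)) if counts[j] > 0]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                A, B = clusters[a], clusters[b]
                # Increase in within-cluster sum of squares if A and B merge.
                cost = A["n"] * B["n"] / (A["n"] + B["n"]) * sqdist(A["center"], B["center"])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        A, B = clusters[a], clusters.pop(b)  # b > a, so index a stays valid
        n = A["n"] + B["n"]
        A["center"] = [(A["n"] * u + B["n"] * v) / n for u, v in zip(A["center"], B["center"])]
        A["members"] += B["members"]
        A["n"] = n
    return {j: label for label, cl in enumerate(clusters) for j in cl["members"]}


# Synthetic stand-in data (an assumption; not the EPIC data): three 2-D blobs.
rng = random.Random(1)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for cx, cy in centers for _ in range(100)]

protos = train_som(data, rows=4, cols=4)           # 300 observations -> 16 prototypes
bmus = [nearest(protos, x) for x in data]          # unit assigned to each observation
counts = [bmus.count(j) for j in range(len(protos))]
unit_label = ward_on_prototypes(protos, counts, k=3)
labels = [unit_label[b] for b in bmus]             # cluster label for each observation
print(len(data), "observations ->", sum(1 for n in counts if n),
      "occupied units ->", len(set(labels)), "clusters")
```

The pairwise search in the second step is cubic in the number of clusters, which would be prohibitive on the raw observations but is cheap on a handful of map units. That cost asymmetry is the point of the reduction step; any other agglomerative or divisive method could be substituted for the Ward criterion used here.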




Copyright information

© 2007 Birkhäuser Boston

About this chapter

Cite this chapter

Lechevallier, Y., Ciampi, A. (2007). Multilevel Clustering for Large Databases. In: Auget, JL., Balakrishnan, N., Mesbah, M., Molenberghs, G. (eds) Advances in Statistical Methods for the Health Sciences. Statistics for Industry and Technology. Birkhäuser Boston. https://doi.org/10.1007/978-0-8176-4542-7_17
