On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets

  • Conference paper
Discovery Science (DS 2006)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4265)

Abstract

Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them into 2D for visualisation. Although this fully modular combination of objectives (clustering and projection) is attractive for its conceptual simplicity, we show that for high-dimensional data a better combination is achieved by integrating both objectives into a single consistent probabilistic model. The projection step then acts as a regulariser, guarding against the curse of dimensionality. As a result, the trade-off between clustering and visualisation enhances the predictive ability of the overall model. We present results on synthetic data and on two real-world high-dimensional data sets: observed spectra of early-type galaxies and gene expression arrays.
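To make the modular PE step concrete, the following is a minimal sketch of projecting given class posteriors into 2D. It is not the integrated model proposed in this paper. It assumes the commonly described PE formulation, in which each class has a unit-variance Gaussian in the embedding space and the coordinates are chosen to minimise the summed KL divergence between the input posteriors and the posteriors induced by the embedding. The function names, the plain gradient-descent optimiser, the step sizes and the toy posteriors are all illustrative assumptions.

```python
import numpy as np


def embedding_posteriors(R, Phi, prior):
    """q(c | r_n) proportional to prior_c * exp(-0.5 * ||r_n - phi_c||^2)."""
    sq_dist = ((R[:, None, :] - Phi[None, :, :]) ** 2).sum(axis=2)  # shape (N, C)
    logits = np.log(prior)[None, :] - 0.5 * sq_dist
    logits -= logits.max(axis=1, keepdims=True)  # avoid overflow in exp
    Q = np.exp(logits)
    return Q / Q.sum(axis=1, keepdims=True)


def kl_objective(P, Q, eps=1e-12):
    """Sum over points of KL(p(.|x_n) || q(.|r_n))."""
    return float(np.sum(P * (np.log(P + eps) - np.log(Q + eps))))


def fit_embedding(P, n_iter=1000, lr=0.1, seed=0):
    """Gradient descent on point coordinates R and class centres Phi (assumed optimiser)."""
    rng = np.random.default_rng(seed)
    N, C = P.shape
    prior = P.mean(axis=0)  # class priors estimated from the posteriors (assumption)
    R = 0.5 * rng.standard_normal((N, 2))
    Phi = 0.5 * rng.standard_normal((C, 2))
    history = []
    for _ in range(n_iter):
        Q = embedding_posteriors(R, Phi, prior)
        history.append(kl_objective(P, Q))
        D = P - Q  # each row sums to zero
        # dE/dr_n = sum_c D_nc (r_n - phi_c) = -sum_c D_nc phi_c
        grad_R = -D @ Phi
        # dE/dphi_c = sum_n D_nc (phi_c - r_n)
        grad_Phi = D.sum(axis=0)[:, None] * Phi - D.T @ R
        R = R - lr * grad_R
        Phi = Phi - (lr / N) * grad_Phi  # smaller step: this gradient sums over N points
    history.append(kl_objective(P, embedding_posteriors(R, Phi, prior)))
    return R, Phi, history


# Toy input: 60 points, 3 classes, soft posteriors standing in for a clustering output.
labels = np.repeat(np.arange(3), 20)
P = np.full((60, 3), 0.1)
P[np.arange(60), labels] = 0.8
R, Phi, history = fit_embedding(P)
```

In this sketch the posteriors `P` are fixed inputs, which is the modular setting the abstract contrasts with a jointly estimated model; the rows of `R` are the 2D points one would plot, coloured by class.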

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kabán, A., Sun, J., Raychaudhury, S., Nolan, L. (2006). On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_15

  • DOI: https://doi.org/10.1007/11893318_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-46491-4

  • Online ISBN: 978-3-540-46493-8

  • eBook Packages: Computer Science, Computer Science (R0)
