On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets

Kabán, Ata; Sun, Jianyong; Raychaudhury, Somak; Nolan, Louisa

doi:10.1007/11893318_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4265)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Parametric Embedding (PE) has recently been proposed as a general-purpose algorithm for class visualisation. It takes class posteriors produced by a mixture-based clustering algorithm and projects them into 2D for visualisation. Although this fully modularised combination of objectives (clustering and projection) is attractive for its conceptual simplicity, we show that for high-dimensional data a better combination of these objectives can be achieved by integrating both into a single consistent probabilistic model. The projection step then acts as a regulariser, guarding against the curse of dimensionality. As a result, the trade-off between clustering and visualisation turns out to enhance the predictive abilities of the overall model. We present results on synthetic data and on two real-world high-dimensional data sets: observed spectra of early-type galaxies and gene expression arrays.
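
The modular PE step that the abstract takes as its starting point (class posteriors in, 2D coordinates out) can be illustrated with a minimal sketch. It is not the integrated model proposed in the paper, and it rests on several assumptions that are not stated in the abstract: unit-variance spherical Gaussians around each class centre in the 2D space, uniform class priors, a summed KL divergence between input and embedded posteriors as the objective, and plain gradient descent. The function name `embed_posteriors`, its parameters, and the toy posteriors are hypothetical.

```python
import math
import random


def embed_posteriors(posteriors, n_iter=2000, lr=0.05, seed=0):
    """Sketch of a PE-style embedding (assumed formulation, see lead-in).

    posteriors: list of N lists, each holding K class probabilities.
    Returns (coords, centres, kl_history), where coords are N 2D points,
    centres are K 2D class centres, and kl_history[t] is the summed KL
    divergence measured at the start of iteration t.
    """
    rng = random.Random(seed)
    n, k = len(posteriors), len(posteriors[0])
    coords = [[rng.gauss(0.0, 0.1) for _ in range(2)] for _ in range(n)]
    centres = [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(k)]
    kl_history = []
    for _ in range(n_iter):
        grad_r = [[0.0, 0.0] for _ in range(n)]
        grad_c = [[0.0, 0.0] for _ in range(k)]
        kl = 0.0
        for i, p in enumerate(posteriors):
            # Embedded posterior q(c | r_i) is a softmax over
            # -0.5 * squared distance to each class centre.
            logits = [
                -0.5 * sum((coords[i][d] - centres[c][d]) ** 2 for d in range(2))
                for c in range(k)
            ]
            top = max(logits)
            log_z = top + math.log(sum(math.exp(v - top) for v in logits))
            for c in range(k):
                log_q = logits[c] - log_z
                if p[c] > 0.0:
                    kl += p[c] * (math.log(p[c]) - log_q)
                diff = p[c] - math.exp(log_q)
                for d in range(2):
                    delta = coords[i][d] - centres[c][d]
                    grad_r[i][d] += diff * delta  # dKL/dr_i
                    grad_c[c][d] -= diff * delta  # dKL/dphi_c
        kl_history.append(kl)
        # Plain gradient-descent update of points and class centres.
        for i in range(n):
            for d in range(2):
                coords[i][d] -= lr * grad_r[i][d]
        for c in range(k):
            for d in range(2):
                centres[c][d] -= lr * grad_c[c][d]
    return coords, centres, kl_history


# Toy input: three confident points and one ambiguous point (made-up values).
toy = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
    [0.34, 0.33, 0.33],
]
coords, centres, kl_history = embed_posteriors(toy)
```

In this sketch the posteriors are fixed inputs, so clustering and projection remain separate stages. That separation is the modular arrangement the abstract contrasts with a single probabilistic model in which the projection also regularises the clustering.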

Author information

Authors and Affiliations

School of Computer Science,
Ata Kabán & Jianyong Sun
School of Physics and Astronomy, The University of Birmingham, Birmingham, B15 2TT, UK
Jianyong Sun, Somak Raychaudhury & Louisa Nolan

Editor information

Editors and Affiliations

Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Ljupčo Todorovski
University of Nova Gorica, Nova Gorica, Slovenia
Nada Lavrač
Meme Media Laboratory, Hokkaido University Sapporo, Kita 13, Nishi 8, Kita-ku, P.O. Box, 060-8628, Sapporo, Japan
Klaus P. Jantke

About this paper

Cite this paper

Kabán, A., Sun, J., Raychaudhury, S., Nolan, L. (2006). On Class Visualisation for High Dimensional Data: Exploring Scientific Data Sets. In: Todorovski, L., Lavrač, N., Jantke, K.P. (eds) Discovery Science. DS 2006. Lecture Notes in Computer Science, vol 4265. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893318_15

DOI: https://doi.org/10.1007/11893318_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46491-4
Online ISBN: 978-3-540-46493-8
eBook Packages: Computer Science, Computer Science (R0)
