Data Visualisation and Exploration with Prior Knowledge

Schroeder, Martin; Cornford, Dan; Nabney, Ian T.

doi:10.1007/978-3-642-03969-0_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 43)

Included in the following conference series:

International Conference on Engineering Applications of Neural Networks

Abstract

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example by finding common patterns and relationships between samples as well as between variables. Typically, visualisation methods such as principal component analysis and multi-dimensional scaling are employed. These methods are favoured for their simplicity, but they cannot cope with missing data, and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important for the high-dimensional, sparse datasets typical in geochemistry. In this paper we show how to utilise a block-structured correlation matrix within a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as on a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
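
To make the role of the block structure concrete, the following minimal sketch shows how a block-structured covariance can inform missing-value imputation under a single Gaussian. It is an illustration only and not the authors' Block GTM: the function names `block_covariance` and `impute_conditional_mean`, the block sizes, and the within-block correlation of 0.9 are all assumptions chosen for the example. The imputation uses the standard Gaussian conditional mean given the observed entries.

```python
import numpy as np

def block_covariance(block_sizes, within_corr, variance=1.0):
    """Hypothetical helper: block-diagonal covariance in which variables in
    the same block share correlation `within_corr` and variables in
    different blocks are uncorrelated."""
    d = sum(block_sizes)
    cov = np.zeros((d, d))
    start = 0
    for size in block_sizes:
        block = np.full((size, size), within_corr * variance)
        np.fill_diagonal(block, variance)
        cov[start:start + size, start:start + size] = block
        start += size
    return cov

def impute_conditional_mean(x, mean, cov):
    """Replace NaN entries of x by the Gaussian conditional mean
    E[x_missing | x_observed] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    if missing.all():
        return mean.copy()
    obs = ~missing
    cov_oo = cov[np.ix_(obs, obs)]      # observed-observed covariance
    cov_mo = cov[np.ix_(missing, obs)]  # missing-observed covariance
    filled = x.copy()
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(
        cov_oo, x[obs] - mean[obs]
    )
    return filled

# Assumed toy setting: four variables in two strongly correlated pairs.
mean = np.zeros(4)
cov_block = block_covariance([2, 2], within_corr=0.9)
cov_spherical = np.eye(4)  # no prior correlation structure

x = np.array([1.0, np.nan, -2.0, np.nan])
filled_block = impute_conditional_mean(x, mean, cov_block)
filled_spherical = impute_conditional_mean(x, mean, cov_spherical)
```

In this toy setting the block covariance lets each missing value borrow information from its observed partner in the same block, whereas the spherical covariance can only fall back on the mean. A mixture model such as the GTM would apply the same idea per mixture component, weighted by responsibilities.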

Author information

Authors and Affiliations

Aston University, NCRG, Aston Triangle, Birmingham, B4 7ET, UK
Martin Schroeder, Dan Cornford & Ian T. Nabney

Editor information

Editors and Affiliations

Faculty of Computing, London Metropolitan University, 166-220 Holloway Road, N7 8DB, London, UK
Dominic Palmer-Brown
School of Computing, IT and Engineering, University of East London, Docklands Campus, 4-6 University Way, E16 2RD, London, UK
Chrisina Draganova & Haris Mouratidis
School of Computing, IT and Engineering, University of East London, London, UK
Elias Pimenidis

About this paper

Cite this paper

Schroeder, M., Cornford, D., Nabney, I.T. (2009). Data Visualisation and Exploration with Prior Knowledge. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds) Engineering Applications of Neural Networks. EANN 2009. Communications in Computer and Information Science, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03969-0_13

DOI: https://doi.org/10.1007/978-3-642-03969-0_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03968-3
Online ISBN: 978-3-642-03969-0
eBook Packages: Computer Science (R0)
