
Data Visualisation and Exploration with Prior Knowledge

  • Conference paper
Engineering Applications of Neural Networks (EANN 2009)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 43)

Abstract

Visualising data for exploratory analysis is a major challenge in many applications. Visualisation allows scientists to gain insight into the structure and distribution of the data, for example by finding common patterns and relationships between samples as well as between variables. Typically, visualisation methods such as principal component analysis and multi-dimensional scaling are employed. These methods are favoured because of their simplicity, but they cannot cope with missing data, and it is difficult to incorporate prior knowledge about properties of the variable space into the analysis; this is particularly important in the high-dimensional, sparse datasets typical of geochemistry. In this paper we show how to utilise a block-structured correlation matrix in a modification of a well-known non-linear probabilistic visualisation model, the Generative Topographic Mapping (GTM), which can cope with missing data. The block structure supports direct modelling of strongly correlated variables. We show that by including prior structural information it is possible to improve both the data visualisation and the model fit. These benefits are demonstrated on artificial data as well as on a real geochemical dataset used for oil exploration, where the proposed modifications improved the missing data imputation results by 3 to 13%.
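
To make the idea of a block-structured correlation matrix concrete, the following is a minimal sketch and not the implementation used in the paper. It assumes a single Gaussian rather than the full GTM mixture, equal correlation within each block, and zero correlation between blocks. The function names block_correlation and impute_conditional_mean, and all numeric values, are hypothetical and chosen for illustration only.

```python
import numpy as np


def block_correlation(block_sizes, rhos):
    """Build a block-diagonal correlation matrix (illustrative assumption).

    Variables inside the same block share one correlation value rho;
    variables in different blocks are treated as uncorrelated.
    """
    d = sum(block_sizes)
    corr = np.zeros((d, d))
    start = 0
    for size, rho in zip(block_sizes, rhos):
        stop = start + size
        corr[start:stop, start:stop] = rho
        start = stop
    np.fill_diagonal(corr, 1.0)
    return corr


def impute_conditional_mean(x, mean, cov):
    """Replace NaN entries of x by their Gaussian conditional mean,
    E[x_m | x_o] = mu_m + C_mo C_oo^{-1} (x_o - mu_o)."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    observed = ~missing
    filled = x.copy()
    if not missing.any():
        return filled
    if not observed.any():
        # Nothing observed: fall back to the unconditional mean.
        filled[missing] = mean[missing]
        return filled
    cov_oo = cov[np.ix_(observed, observed)]
    cov_mo = cov[np.ix_(missing, observed)]
    residual = x[observed] - mean[observed]
    filled[missing] = mean[missing] + cov_mo @ np.linalg.solve(cov_oo, residual)
    return filled


# Two blocks of two variables each (hypothetical values).
corr = block_correlation([2, 2], [0.9, 0.5])
mean = np.zeros(4)
sample = np.array([1.0, np.nan, 0.3, -0.2])
filled = impute_conditional_mean(sample, mean, corr)
```

In this sketch the missing second variable is filled in using only the first variable, because the other two observed variables belong to a different block and carry no information about it under the assumed structure. This is the sense in which prior block structure can guide imputation of strongly correlated variables.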



Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schroeder, M., Cornford, D., Nabney, I.T. (2009). Data Visualisation and Exploration with Prior Knowledge. In: Palmer-Brown, D., Draganova, C., Pimenidis, E., Mouratidis, H. (eds) Engineering Applications of Neural Networks. EANN 2009. Communications in Computer and Information Science, vol 43. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03969-0_13

  • DOI: https://doi.org/10.1007/978-3-642-03969-0_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03968-3

  • Online ISBN: 978-3-642-03969-0

  • eBook Packages: Computer Science (R0)
