Using Correlation Dimension for Analysing Text Data

  • Conference paper
Artificial Neural Networks – ICANN 2010 (ICANN 2010)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6352)

Abstract

In this article, we study the scale-dependent dimensionality and overall structure of text data using a method that measures correlation dimension at different scales. As experimental results, we present an analysis of text data sets drawn from the Reuters and Europarl corpora and compare them with artificially generated point sets. A comparison is also made with speech data. The results reflect some typical properties of the data, and we discuss how the method could be used to improve various data analysis applications.
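As a rough illustration of what measuring correlation dimension at different scales can mean, the sketch below uses the standard Grassberger-Procaccia correlation integral C(r), the fraction of point pairs closer than r, and takes the local slope of log C(r) against log r as a scale-dependent dimension estimate. This is a minimal sketch under stated assumptions, not the procedure used in the paper: the function names, the Euclidean metric, and the choice of radii are all hypothetical.

```python
import numpy as np


def correlation_integral(points, radii):
    """Fraction of distinct point pairs whose Euclidean distance is <= r, for each r."""
    points = np.asarray(points, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n = len(points)
    # Squared pairwise distances from the Gram matrix (an n x n array in memory).
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (points @ points.T)
    upper = np.triu_indices(n, k=1)  # each unordered pair once, no self-pairs
    dists = np.sqrt(np.maximum(d2[upper], 0.0))  # clip tiny negative rounding errors
    dists.sort()
    counts = np.searchsorted(dists, radii, side="right")
    return counts / dists.size


def scale_dependent_dimension(points, radii):
    """Local slopes of log C(r) versus log r between consecutive radii.

    Assumes `radii` is strictly increasing. Radii with C(r) == 0 are skipped.
    Returns the geometric midpoints of the radius intervals and the slopes.
    """
    radii = np.asarray(radii, dtype=float)
    c = correlation_integral(points, radii)
    valid = c > 0
    log_r = np.log(radii[valid])
    log_c = np.log(c[valid])
    slopes = np.diff(log_c) / np.diff(log_r)
    mid_r = np.exp(0.5 * (log_r[:-1] + log_r[1:]))
    return mid_r, slopes


if __name__ == "__main__":
    # Artificial point set: a 2-D uniform square embedded in 10 dimensions.
    rng = np.random.default_rng(0)
    plane = rng.uniform(size=(1000, 2))
    data = np.hstack([plane, np.zeros((1000, 8))])
    radii = np.logspace(np.log10(0.02), np.log10(0.2), 12)
    for r, s in zip(*scale_dependent_dimension(data, radii)):
        print(f"r = {r:.3f}  local slope = {s:.2f}")
```

For an artificial set like the embedded square in the demo, the slopes should stay close to the intrinsic dimension of the square rather than the dimension of the embedding space. They drift lower at larger radii because of boundary effects. For document vectors a different distance measure may be more appropriate, and that choice is left open here.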

This work has been supported by the Academy of Finland and a grant from the Department of Mathematics and Statistics at the University of Helsinki (IK).

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kivimäki, I., Lagus, K., Nieminen, I.T., Väyrynen, J.J., Honkela, T. (2010). Using Correlation Dimension for Analysing Text Data. In: Diamantaras, K., Duch, W., Iliadis, L.S. (eds) Artificial Neural Networks – ICANN 2010. ICANN 2010. Lecture Notes in Computer Science, vol 6352. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15819-3_49

  • DOI: https://doi.org/10.1007/978-3-642-15819-3_49

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15818-6

  • Online ISBN: 978-3-642-15819-3

  • eBook Packages: Computer Science, Computer Science (R0)
