
Towards a Deeper Understanding of the Complex Behaviour Observed in the Distribution of Words in Written Texts

  • Conference paper
Proceedings of the European Conference on Complex Systems 2012

Abstract

Here we show that the recently reported long-range correlations in the distribution of words along texts are due to the complex distribution of keywords, whereas common words are not correlated. Indeed, we prove that the degree of long-range correlation of a word at long scales is a good measure of its relevance to the text. Additionally, we develop a model that reproduces the spatial distribution of a word in a text from the long-range correlations observed for that word. The model reproduces not only the complex behaviour characterized by correlations at long scales and the word's degree of relevance, but also the probability distribution of inter-occurrence distances over the whole range of scales.
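The abstract does not spell out how the correlations are measured. A common approach, assumed here and not necessarily the authors' exact procedure, is to map a word to a binary series (1 where it occurs, 0 elsewhere), estimate the detrended fluctuation analysis (DFA-1) exponent α over long scales (α ≈ 0.5 means no long-range correlations, α > 0.5 means persistent ones), and compute σ, the standard deviation of the inter-occurrence distances divided by their mean, which is close to 1 for a randomly scattered word and larger for a clustered one. All function names, the toy "common word" and "keyword" series, and the scale range below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def indicator_series(tokens: Sequence[str], word: str) -> List[int]:
    """Map a tokenized text to a 0/1 series: 1 where `word` occurs, 0 elsewhere."""
    return [1 if t == word else 0 for t in tokens]


def inter_occurrence_distances(series: Sequence[int]) -> List[int]:
    """Distances between consecutive occurrences (consecutive positions holding a 1)."""
    pos = [i for i, x in enumerate(series) if x]
    return [b - a for a, b in zip(pos, pos[1:])]


def sigma(distances: Sequence[int]) -> float:
    """Standard deviation of the distances divided by their mean.

    Close to 1 for a word scattered at random (geometric spacing),
    clearly above 1 when occurrences come in clusters.
    """
    mean = sum(distances) / len(distances)
    var = sum((d - mean) ** 2 for d in distances) / len(distances)
    return math.sqrt(var) / mean


def _linfit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def dfa_fluctuation(series: Sequence[int], n: int) -> float:
    """DFA-1 fluctuation F(n): RMS deviation of the integrated, mean-subtracted
    series from a straight-line fit in non-overlapping windows of length n."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for x in series:
        acc += x - mean
        profile.append(acc)
    xs = list(range(n))
    total, count = 0.0, 0
    for start in range(0, len(profile) - n + 1, n):
        seg = profile[start:start + n]
        a, b = _linfit(xs, seg)
        total += sum((y - (a * i + b)) ** 2 for i, y in enumerate(seg))
        count += n
    return math.sqrt(total / count)


def dfa_exponent(series: Sequence[int], scales: Sequence[int]) -> float:
    """Slope alpha of log F(n) against log n over the given scales.
    alpha ~ 0.5: no long-range correlations; alpha > 0.5: persistent correlations."""
    alpha, _ = _linfit([math.log(n) for n in scales],
                       [math.log(dfa_fluctuation(series, n)) for n in scales])
    return alpha


if __name__ == "__main__":
    rng = random.Random(0)
    N = 20_000
    # Toy "common word": each position holds it independently with p = 0.01.
    scattered = [1 if rng.random() < 0.01 else 0 for _ in range(N)]
    # Toy "keyword": similar overall count, concentrated in three 1500-token stretches.
    clustered = [0] * N
    for start in (2_000, 9_000, 15_000):
        for i in range(start, start + 1_500):
            if rng.random() < 0.045:
                clustered[i] = 1
    scales = [64, 128, 256, 512, 1024]  # hypothetical choice of "long" scales
    for name, s in (("scattered", scattered), ("clustered", clustered)):
        d = inter_occurrence_distances(s)
        print(f"{name}: sigma={sigma(d):.2f}, alpha={dfa_exponent(s, scales):.2f}")
```

On this toy data the scattered word should give σ near 1 and α near 0.5, while the clustered one should give σ well above 1 and α above 0.5. Applying the sketch to a real text would additionally require tokenization and a careful choice of which scales count as "long".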


Notes

  1. It was downloaded from the Project Gutenberg website (http://www.gutenberg.org).


Acknowledgements

This work was supported by Grant No. P07-FQM03163 from the Junta de Andalucía (Spain).


Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Carretero-Campos, C., Montemurro, M.A., Bernaola-Galván, P., Coronado, A.V., Carpena, P. (2013). Towards a Deeper Understanding of the Complex Behaviour Observed in the Distribution of Words in Written Texts. In: Gilbert, T., Kirkilionis, M., Nicolis, G. (eds) Proceedings of the European Conference on Complex Systems 2012. Springer Proceedings in Complexity. Springer, Cham. https://doi.org/10.1007/978-3-319-00395-5_34
