The method of zonal correlation text analysis

Automatic Documentation and Mathematical Linguistics

Abstract

This paper describes a method of zonal correlation text analysis, which compares the distributions of word counts in the J₁ zones of two or more texts. Each compared text is divided into J₀, J₁, and J₂ zones according to a previously developed interpretation of Bradford’s law. The parameters of the words contained in the J₁ zones are then compared, and a distance indicating the degree of semantic proximity between the texts is calculated. The method can be applied to automatic text classification and authorship attribution, and it yields adequate results with a limited number of parameters.
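
The abstract does not give the zoning rule or the distance formula, so the Python sketch below only illustrates the general shape of the procedure under explicit assumptions. It assumes a classical Bradford-style partition in which the rank-frequency list is cut into three zones that each hold roughly one third of all word tokens; the boundaries in the author’s geometric-progression interpretation may differ. The distance is a stand-in: cosine distance between the relative frequencies of the J₁ words of the two texts, chosen for illustration and not taken from the paper. The function names (word_counts, bradford_zones, j1_profile, zonal_distance) are hypothetical.

```python
from collections import Counter
import math
import re


def word_counts(text: str) -> Counter:
    # Deliberately simple tokenizer (assumption): lowercase runs of letters.
    return Counter(re.findall(r"[^\W\d_]+", text.lower()))


def bradford_zones(counts: Counter) -> tuple[list[str], list[str], list[str]]:
    """Split the rank-frequency list into (J0, J1, J2).

    ASSUMPTION: each zone holds roughly one third of all tokens; this is a
    classical reading of Bradford's law, not necessarily the paper's rule.
    """
    # Rank by descending frequency; ties broken alphabetically for stability.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(counts.values())
    zones: tuple[list[str], list[str], list[str]] = ([], [], [])
    if total == 0:
        return zones
    cumulative = 0
    for word, freq in ranked:
        # Zone chosen by the cumulative token share reached before this word.
        idx = min(2, (3 * cumulative) // total)
        zones[idx].append(word)
        cumulative += freq
    return zones


def j1_profile(counts: Counter) -> dict[str, float]:
    # Relative frequency (share of all tokens in the text) of each J1 word.
    total = sum(counts.values())
    _, j1, _ = bradford_zones(counts)
    return {w: counts[w] / total for w in j1}


def zonal_distance(text_a: str, text_b: str) -> float:
    """Cosine distance between J1 profiles (ASSUMED stand-in metric).

    0.0 means identical J1 profiles; 1.0 means no shared J1 words.
    """
    pa = j1_profile(word_counts(text_a))
    pb = j1_profile(word_counts(text_b))
    vocab = set(pa) | set(pb)
    dot = sum(pa.get(w, 0.0) * pb.get(w, 0.0) for w in vocab)
    norm_a = math.sqrt(sum(v * v for v in pa.values()))
    norm_b = math.sqrt(sum(v * v for v in pb.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty J1 zone gives no basis for comparison
    return 1.0 - dot / (norm_a * norm_b)
```

For the counts {a: 6, b: 3, c: 2, d: 1}, this partition puts "a" in J₀, "b" in J₁, and "c" and "d" in J₂. Restricting the comparison to J₁ skips both the high-frequency function words of J₀ and the sparse tail of J₂, which is consistent with the abstract’s point that a limited number of parameters can suffice.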

Author information

Correspondence to V. A. Yatsko.

Additional information

Original Russian Text © V.A. Yatsko, 2014, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2014, No. 10, pp. 26–30.

About this article

Cite this article

Yatsko, V.A. The method of zonal correlation text analysis. Autom. Doc. Math. Linguist. 48, 259–263 (2014). https://doi.org/10.3103/S0005105514050057

  • DOI: https://doi.org/10.3103/S0005105514050057